San Francisco, California, United States
5K followers
500+ connections
About
Articles by Pete
-
O'Reilly Bot Podcast: Talking about talking to machines
By Pete Skomoroch
Activity
-
Dr. Hilary Parker is an independent consultant and coach based in San Francisco, often referred to as the heart of “Cerebral Valley” for its rich…
Liked by Pete Skomoroch
-
Proud that Weights & Biases was honored as part of the Redpoint InfraRed 100! Great to be back in NYC at the Nasdaq! You can read more about the…
Liked by Pete Skomoroch
Experience & Education
Licenses & Certifications
Publications
-
Pangloss: Fast Entity Linking in Noisy Text Environments
KDD '18 Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
Entity linking is the task of mapping potentially ambiguous terms in text to their constituent entities in a knowledge base like Wikipedia. This is useful for organizing content, extracting structured data from textual documents, and in machine learning relevance applications like semantic search, knowledge graph construction, and question answering. Traditionally, this work has focused on text that has been well-formed, like news articles, but in common real world datasets such as messaging, resumes, or short-form social media, non-grammatical, loosely-structured text adds a new dimension to this problem. This paper presents Pangloss, a production system for entity disambiguation on noisy text. Pangloss combines a probabilistic linear-time key phrase identification algorithm with a semantic similarity engine based on context-dependent document embeddings to achieve better than state-of-the-art results (>5% in F1) compared to other research or commercially available systems. In addition, Pangloss leverages a local embedded database with a tiered architecture to house its statistics and metadata, which allows rapid disambiguation in streaming contexts and on-device disambiguation in low-memory environments such as mobile phones.
-
LinkedIn Skills: Large-Scale Topic Extraction and Inference
RecSys Proceedings
"Skills and Expertise" is a data-driven feature on LinkedIn, the world's largest professional online social network, which allows members to tag themselves with topics representing their areas of expertise. In this work, we present our experiences developing this large-scale topic extraction pipeline, which includes constructing a folksonomy of skills and expertise and implementing an inference and recommender system for skills. We also discuss a consequent set of applications, such as Endorsements, which allows members to tag themselves with topics representing their areas of expertise and for their connections to provide social proof, via an "endorse" action, of that member's competence in that topic.
-
Large-Scale Hierarchical Topic Models
NIPS Workshop on Big Learning - 2012
In the past decade, a number of advances in topic modeling have produced sophisticated models that are capable of generating hierarchies of topics. One challenge for these models is scalability: they are incapable of working at the massive scale of millions of documents and hundreds of thousands of terms. We address this challenge with a technique that learns a hierarchy of topics by iteratively applying topic models and processing subtrees of the hierarchy in parallel. This approach has a number of scalability advantages compared to existing techniques, and shows promising results in experiments assessing runtime and human evaluations of quality. We detail extensions to this approach that may further improve hierarchical topic modeling for large-scale applications.
-
Multisensor data analysis and aerosol background characterization
Proc. SPIE 6218, Chemical and Biological Sensing VII
A portable and extensible multisensor testbed for long-term multi-point aerosol background data collections has been developed. The primary objective of the testbed is to support investigations related to the information fusion, machine-intelligence based CB decision support architecture, now under development at MIT Lincoln Laboratory. This paper describes major design features of the testbed and concentrates on the analysis and the results of multiple indoor data collections. Specifically, two deployments of the testbed for extensive indoor data collection campaigns are described. The indoor background characterization results are presented.
-
Information fusion and uncertainty management for biological multisensor systems
Proc. SPIE 5813, Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2005
This paper investigates methods of decision-making from uncertain and disparate data. The need for such methods arises in those sensing application areas in which multiple and diverse sensing modalities are available, but the information provided can be imprecise or only indirectly related to the effects to be discerned. Biological sensing for biodefense is an important instance of such applications. Information fusion in that context is the focus of a research program now underway at MIT Lincoln Laboratory. The paper outlines a multi-level, multi-classifier recognition architecture developed within this program, and discusses its components. Information source uncertainty is quantified and exploited for improving the quality of data that constitute the input to the classification processes. Several methods of sensor uncertainty exploitation at the feature-level are proposed and their efficacy is investigated. Other aspects of the program are discussed as well. While the primary focus of the paper is on biodefense, the applicability of concepts and techniques presented here extends to other multisensor fusion application domains.
Patents
-
Skill Extraction System
Issued US US8650177
-
Skills ontology creation
Filed US US9697472B2
Disclosed in some examples are systems, methods, and machine readable mediums which allow for the automatic creation of a skills hierarchy. The skills hierarchy comprises an organization of a standardized list of skills into a hierarchy that describes category relationships between the skills in the hierarchy. The category relationships may include no relationships, parent relationships, and child relationships. A skill may be considered a parent of another skill if the parent skill describes a broader category of skill that includes the child. Other relationships such as grandparent (e.g., a parent's parent), great-grandparent, grandchild, great grandchild and so on may be defined inferentially as well. In some examples, the constructed hierarchy may be organized with broader skills at higher levels and narrower skills at lower levels.
Courses
-
Machine Learning
6.867
-
Neural Networks
9.641J
-
Real Analysis
18.100B
Projects
-
Skill & Expertise Endorsements
Interface design incorporating social proof and a lightweight endorsement action into Profile Skills. This feature leveraged earlier work on Profile Guided Editing and used the same guided UI to suggest skill endorsements to profile viewers. Recipients of an endorsement receive an email and an on-site notification, with a landing experience that suggests they endorse people they know, creating a feel-good viral loop.
-
DataFu
DataFu is a collection of user-defined functions for working with large-scale data in Hadoop and Pig. This library was born out of the need for a stable, well-tested library of UDFs for data mining and statistics. It is used at LinkedIn in many of our offline workflows for data-derived products like “People You May Know” and “Skills”.
-
LinkedIn Skills
- Present
LinkedIn Skills & Expertise is a set of tens of thousands of topic pages automatically constructed from LinkedIn profiles and external data sources. Using a variety of signals, we identify the most relevant people, places, and companies for each topic, track trends, and suggest skills users may want to add to their profiles.
-
Veterans Hackday 2011
Organized LinkedIn's first Veterans Hackday in conjunction with the White House to encourage hackers all over the country to build projects that benefit veterans. We had 44 projects submitted from around the country, 11 awesome finalists, and the celebrity judges (Tim O'Reilly, Sumit Agarwal, Jeff Weiner, Chris Vein) picked 3 amazing winners.
Honors & Awards
-
Westinghouse Science Talent Search Semi-Finalist
Westinghouse
Selected as one of 300 semi-finalists nationwide for excellence in science and engineering. The Westinghouse Science Talent Search is the nation’s oldest and most prestigious science and math competition for high school seniors.
-
Intel International Science and Engineering Fair (ISEF), 2nd Place Grand Prize - Biochemistry
Intel International Science and Engineering Fair (ISEF)
Four-time Intel International Science and Engineering Fair (ISEF) Finalist from 1993 to 1996, including a 3rd place in Physics and 2nd place in Biochemistry.
The Intel ISEF is the world’s largest international pre-college science competition representing 1,800 competitors from over 80 nations.
-
Naval National Science Award Winner
Office of Naval Research
Languages
-
Spanish
Recommendations received
8 people have recommended Pete
More activity by Pete
-
Had a great time recording this Latent Space episode with Alessio Fanelli, our lead investor at Decibel Partners. We are solving very open-ended…
Liked by Pete Skomoroch
-
Excited to share that Fiddler AI is now helping the US Navy with AI Observability for Underwater Target Threat Detection in one of our largest…
Liked by Pete Skomoroch
-
Has anyone complaining about California SB-1047 read the actual bill itself? Read this, and tell me what you could possibly disagree with here:
Shared by Pete Skomoroch
-
This is the clearest, most concise summary I’ve seen of California Senate Bill 1047. SB 1047 is proposed legislation which seeks to regulate the…
Shared by Pete Skomoroch
-
Databricks acquiring Tabular (for $1B) and OpenAI acquiring Rockset (for 9 figures in stock) both point to consolidation and an exit path for data/AI…
Liked by Pete Skomoroch
-
I’m happy to share that I’m starting a new position as Associate Product Manager Intern at Coinbase!
Liked by Pete Skomoroch
-
Databricks announced yesterday that Anomalo is their Emerging Partner of the Year!!! In our press release, Tim Ng, the data products engineering…
Liked by Pete Skomoroch
-
Applying AI & LLMs to financial research is an exciting opportunity right now and Mike Conover is the right founder to do it:
Shared by Pete Skomoroch
-
Did a fund raising workshop with a group of founders last week. Some of my messages were: 1/ Your job in that first mtg is to get the VC…
Liked by Pete Skomoroch
-
I’m super excited to announce that I’m joining OpenAI as Chief Product Officer! My entire career has been about working on big missions: connecting…
Liked by Pete Skomoroch