Pete Skomoroch

San Francisco, California, United States
5K followers 500+ connections

About

I'm a senior executive with extensive experience building and running data science, AI, &…

Experience & Education

  • Independent

Publications

  • Pangloss: Fast Entity Linking in Noisy Text Environments

    KDD '18 Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

    Entity linking is the task of mapping potentially ambiguous terms in text to their constituent entities in a knowledge base like Wikipedia. This is useful for organizing content, extracting structured data from textual documents, and in machine learning relevance applications like semantic search, knowledge graph construction, and question answering. Traditionally, this work has focused on text that has been well-formed, like news articles, but in common real world datasets such as messaging, resumes, or short-form social media, non-grammatical, loosely-structured text adds a new dimension to this problem. This paper presents Pangloss, a production system for entity disambiguation on noisy text. Pangloss combines a probabilistic linear-time key phrase identification algorithm with a semantic similarity engine based on context-dependent document embeddings to achieve better than state-of-the-art results (>5% in F1) compared to other research or commercially available systems. In addition, Pangloss leverages a local embedded database with a tiered architecture to house its statistics and metadata, which allows rapid disambiguation in streaming contexts and on-device disambiguation in low-memory environments such as mobile phones.

  • LinkedIn Skills: Large-Scale Topic Extraction and Inference

    RecSys Proceedings

    "Skills and Expertise" is a data-driven feature on LinkedIn, the world's largest professional online social network, which allows members to tag themselves with topics representing their areas of expertise. In this work, we present our experiences developing this large-scale topic extraction pipeline, which includes constructing a folksonomy of skills and expertise and implementing an inference and recommender system for skills. We also discuss a consequent set of applications, such as Endorsements, which allows members to tag themselves with topics representing their areas of expertise and for their connections to provide social proof, via an "endorse" action, of that member's competence in that topic.

  • Large-Scale Hierarchical Topic Models

    NIPS Workshop on Big Learning - 2012

    In the past decade, a number of advances in topic modeling have produced sophisticated models that are capable of generating hierarchies of topics. One challenge for these models is scalability: they are incapable of working at the massive scale of millions of documents and hundreds of thousands of terms. We address this challenge with a technique that learns a hierarchy of topics by iteratively applying topic models and processing subtrees of the hierarchy in parallel. This approach has a number of scalability advantages compared to existing techniques, and shows promising results in experiments assessing runtime and human evaluations of quality. We detail extensions to this approach that may further improve hierarchical topic modeling for large-scale applications.

  • Multisensor data analysis and aerosol background characterization

    Proc. SPIE 6218, Chemical and Biological Sensing VII

    A portable and extensible multisensor testbed for long-term multi-point aerosol background data collections has been developed. The primary objective of the testbed is to support investigations related to the information fusion, machine-intelligence based CB decision support architecture, now under development at MIT Lincoln Laboratory. This paper describes major design features of the testbed and concentrates on the analysis and the results of multiple indoor data collections. Specifically, two deployments of the testbed for extensive indoor data collection campaigns are described. The indoor background characterization results are presented.

  • Information fusion and uncertainty management for biological multisensor systems

    Proc. SPIE 5813, Multisensor, Multisource Information Fusion: Architectures, Algorithms, and Applications 2005

    This paper investigates methods of decision-making from uncertain and disparate data. The need for such methods arises in those sensing application areas in which multiple and diverse sensing modalities are available, but the information provided can be imprecise or only indirectly related to the effects to be discerned. Biological sensing for biodefense is an important instance of such applications. Information fusion in that context is the focus of a research program now underway at MIT Lincoln Laboratory. The paper outlines a multi-level, multi-classifier recognition architecture developed within this program, and discusses its components. Information source uncertainty is quantified and exploited for improving the quality of data that constitute the input to the classification processes. Several methods of sensor uncertainty exploitation at the feature-level are proposed and their efficacy is investigated. Other aspects of the program are discussed as well. While the primary focus of the paper is on biodefense, the applicability of concepts and techniques presented here extends to other multisensor fusion application domains.


Patents

  • Skill Extraction System

    Issued US US8650177

    Machine automated method of identifying a set of skills

  • Skills ontology creation

    Filed US US9697472B2

    Disclosed in some examples are systems, methods, and machine readable mediums which allow for the automatic creation of a skills hierarchy. The skills hierarchy comprises an organization of a standardized list of skills into a hierarchy that describes category relationships between the skills in the hierarchy. The category relationships may include no relationships, parent relationships, and child relationships. A skill may be considered a parent of another skill if the parent skill describes a broader category of skill that includes the child. Other relationships such as grandparent (e.g., a parent's parent), great-grandparent, grandchild, great grandchild and so on may be defined inferentially as well. In some examples, the constructed hierarchy may be organized with broader skills at higher levels and narrower skills at lower levels.

  • Methods & Systems for Recommending Decision Makers in an Organization

    Filed US 3080.132PRV

  • Skills Endorsements

    Filed US 13/672,377

  • Inferring and Suggesting Attribute Values For a Social Network Service

    Filed US 13/629,241

  • Skill Ranking System

    Filed US 13/357,302

  • Skill Customization System

    Filed US 13/357,360

  • Inferred Identity

    US 14/292,779

  • Methods and Systems for Exploring Career Options

    US US20120226623 A1


Courses

  • Machine Learning

    6.867

  • Neural Networks

    9.641J

  • Real Analysis

    18.100B

Projects

  • Skill & Expertise Endorsements

    Interface design that adds social proof and a lightweight endorsement action to Profile Skills. This feature leveraged earlier work on Profile Guided Editing, and used the same guided UI to suggest skill endorsements to profile viewers. Recipients of the endorsement receive an email and on-site notification, with a landing experience that suggests they endorse people they know, creating a feel-good viral loop.

  • DataFu

    DataFu is a collection of user-defined functions for working with large-scale data in Hadoop and Pig. This library was born out of the need for a stable, well-tested library of UDFs for data mining and statistics. It is used at LinkedIn in many of our offline workflows for data-derived products like “People You May Know” and “Skills”.

  • LinkedIn Skills

    - Present

    LinkedIn Skills & Expertise is a set of tens of thousands of topic pages automatically constructed from LinkedIn profiles and external data sources. Using a variety of signals, we identify the most relevant people, places, and companies for each topic, track trends, and suggest skills users may want to add to their profiles.

  • Veterans Hackday 2011

    Organized LinkedIn's first Veterans Hackday in conjunction with the White House to encourage hackers all over the country to build projects that benefit veterans. We had 44 projects submitted from around the country, 11 awesome finalists, and the celebrity judges (Tim O'Reilly, Sumit Agarwal, Jeff Weiner, Chris Vein) picked 3 amazing winners.


Honors & Awards

  • Westinghouse Science Talent Search Semi-Finalist

    Westinghouse

    Selected as one of 300 semi-finalists nationwide for excellence in science and engineering. The Westinghouse Science Talent Search is the nation’s oldest and most prestigious science and math competition for high school seniors.

  • Intel International Science and Engineering Fair (ISEF), 2nd Place Grand Prize - Biochemistry

    Intel International Science and Engineering Fair (ISEF)

    Four-time Intel International Science and Engineering Fair (ISEF) Finalist from 1993 to 1996, including a 3rd place in Physics and 2nd place in Biochemistry.

    The Intel ISEF is the world’s largest international pre-college science competition representing 1,800 competitors from over 80 nations.

  • Naval National Science Award Winner

    Office of Naval Research

Languages

  • Spanish
