-
What Causes the Failure of Explicit to Implicit Discourse Relation Recognition?
Authors:
Wei Liu,
Stephen Wan,
Michael Strube
Abstract:
We consider an unanswered question in the discourse processing community: why do relation classifiers trained on explicit examples (with connectives removed) perform poorly in real implicit scenarios? Prior work claimed this is due to linguistic dissimilarity between explicit and implicit examples but provided no empirical evidence. In this study, we show that one cause for such failure is a label shift after connectives are eliminated. Specifically, we find that the discourse relations expressed by some explicit instances will change when connectives disappear. Unlike previous work manually analyzing a few examples, we present empirical evidence at the corpus level to prove the existence of such shift. Then, we analyze why label shift occurs by considering factors such as the syntactic role played by connectives, ambiguity of connectives, and more. Finally, we investigate two strategies to mitigate the label shift: filtering out noisy data and joint learning with connectives. Experiments on PDTB 2.0, PDTB 3.0, and the GUM dataset demonstrate that classifiers trained with our strategies outperform strong baselines.
Submitted 1 April, 2024;
originally announced April 2024.
-
Graph-based Clustering for Detecting Semantic Change Across Time and Languages
Authors:
Xianghe Ma,
Michael Strube,
Wei Zhao
Abstract:
Despite the predominance of contextualized embeddings in NLP, approaches to detect semantic change relying on these embeddings and clustering methods underperform simpler counterparts based on static word embeddings. This stems from the poor quality of the clustering methods to produce sense clusters -- which struggle to capture word senses, especially those with low frequency. This issue hinders the next step in examining how changes in word senses in one language influence another. To address this issue, we propose a graph-based clustering approach to capture nuanced changes in both high- and low-frequency word senses across time and languages, including the acquisition and loss of these senses over time. Our experimental results show that our approach substantially surpasses previous approaches in the SemEval2020 binary classification task across four languages. Moreover, we showcase the ability of our approach as a versatile visualization tool to detect semantic changes in both intra-language and inter-language setups. We make our code and data publicly available.
Submitted 1 February, 2024;
originally announced February 2024.
-
Normed Spaces for Graph Embedding
Authors:
Diaaeldin Taha,
Wei Zhao,
J. Maxwell Riestenberg,
Michael Strube
Abstract:
Theoretical results from discrete geometry suggest that normed spaces can abstractly embed finite metric spaces with surprisingly low theoretical bounds on distortion in low dimensions. In this paper, inspired by this theoretical insight, we highlight normed spaces as a more flexible and computationally efficient alternative to several popular Riemannian manifolds for learning graph embeddings. Normed space embeddings significantly outperform several popular manifolds on a large range of synthetic and real-world graph reconstruction benchmark datasets while requiring significantly fewer computational resources. We also empirically verify the superiority of normed space embeddings on growing families of graphs associated with negative, zero, and positive curvature, further reinforcing the flexibility of normed spaces in capturing diverse graph structures as graph sizes increase. Lastly, we demonstrate the utility of normed space embeddings on two applied graph embedding tasks, namely, link prediction and recommender systems. Our work highlights the potential of normed spaces for geometric graph representation learning, raises new research questions, and offers a valuable tool for experimental mathematics in the field of finite metric space embeddings. We make our code and data publicly available.
Submitted 3 December, 2023;
originally announced December 2023.
-
Investigating Multilingual Coreference Resolution by Universal Annotations
Authors:
Haixia Chai,
Michael Strube
Abstract:
Multilingual coreference resolution (MCR) has been a long-standing and challenging task. With the newly proposed multilingual coreference dataset, CorefUD (Nedoluzhko et al., 2022), we conduct an investigation into the task by using its harmonized universal morphosyntactic and coreference annotations. First, we study coreference by examining the ground truth data at different linguistic levels, namely mention, entity and document levels, and across different genres, to gain insights into the characteristics of coreference across multiple languages. Second, we perform an error analysis of the most challenging cases that the SotA system fails to resolve in the CRAC 2022 shared task using the universal annotations. Last, based on this analysis, we extract features from universal morphosyntactic annotations and integrate these features into a baseline system to assess their potential benefits for the MCR task. Our results show that our best configuration of features improves the baseline by 0.9% F1 score.
Submitted 26 October, 2023;
originally announced October 2023.
-
Modeling Graphs Beyond Hyperbolic: Graph Neural Networks in Symmetric Positive Definite Matrices
Authors:
Wei Zhao,
Federico Lopez,
J. Maxwell Riestenberg,
Michael Strube,
Diaaeldin Taha,
Steve Trettel
Abstract:
Recent research has shown that alignment between the structure of graph data and the geometry of an embedding space is crucial for learning high-quality representations of the data. The uniform geometry of Euclidean and hyperbolic spaces allows for representing graphs with uniform geometric and topological features, such as grids and hierarchies, with minimal distortion. However, real-world graph data is characterized by multiple types of geometric and topological features, necessitating more sophisticated geometric embedding spaces. In this work, we utilize the Riemannian symmetric space of symmetric positive definite matrices (SPD) to construct graph neural networks that can robustly handle complex graphs. To do this, we develop an innovative library that leverages the SPD gyrocalculus tools \cite{lopez2021gyroSPD} to implement the building blocks of five popular graph neural networks in SPD. Experimental results demonstrate that our graph neural networks in SPD substantially outperform their counterparts in Euclidean and hyperbolic spaces, as well as the Cartesian product thereof, on complex graphs for node and graph classification tasks. We release the library and datasets at https://github.com/andyweizhao/SPD4GNNs.
Submitted 24 June, 2023;
originally announced June 2023.
-
Annotation-Inspired Implicit Discourse Relation Classification with Auxiliary Discourse Connective Generation
Authors:
Wei Liu,
Michael Strube
Abstract:
Implicit discourse relation classification is a challenging task due to the absence of discourse connectives. To overcome this issue, we design an end-to-end neural model to explicitly generate discourse connectives for the task, inspired by the annotation process of PDTB. Specifically, our model jointly learns to generate discourse connectives between arguments and predict discourse relations based on the arguments and the generated connectives. To prevent our relation classifier from being misled by poor connectives generated at the early stage of training while alleviating the discrepancy between training and inference, we adopt Scheduled Sampling to the joint learning. We evaluate our method on three benchmarks, PDTB 2.0, PDTB 3.0, and PCC. Results show that our joint model significantly outperforms various baselines on three datasets, demonstrating its superiority for the task.
Submitted 10 June, 2023;
originally announced June 2023.
-
Modeling Structural Similarities between Documents for Coherence Assessment with Graph Convolutional Networks
Authors:
Wei Liu,
Xiyan Fu,
Michael Strube
Abstract:
Coherence is an important aspect of text quality, and various approaches have been applied to coherence modeling. However, existing methods solely focus on a single document's coherence patterns, ignoring the underlying correlation between documents. We investigate a GCN-based coherence model that is capable of capturing structural similarities between documents. Our model first creates a graph structure for each document, from where we mine different subgraph patterns. We then construct a heterogeneous graph for the training corpus, connecting documents based on their shared subgraphs. Finally, a GCN is applied to the heterogeneous graph to model the connectivity relationships. We evaluate our method on two tasks, assessing discourse coherence and automated essay scoring. Results show that our GCN-based model outperforms all baselines, achieving a new state-of-the-art on both tasks.
Submitted 10 June, 2023;
originally announced June 2023.
-
SimCSum: Joint Learning of Simplification and Cross-lingual Summarization for Cross-lingual Science Journalism
Authors:
Mehwish Fatima,
Tim Kolber,
Katja Markert,
Michael Strube
Abstract:
Cross-lingual science journalism generates popular science stories of scientific articles different from the source language for a non-expert audience. Hence, a cross-lingual popular summary must contain the salient content of the input document, and the content should be coherent, comprehensible, and in a local language for the targeted audience. We improve these aspects of cross-lingual summary generation by joint training of two high-level NLP tasks, simplification and cross-lingual summarization. The former task reduces linguistic complexity, and the latter focuses on cross-lingual abstractive summarization. We propose a novel multi-task architecture - SimCSum consisting of one shared encoder and two parallel decoders jointly learning simplification and cross-lingual summarization. We empirically investigate the performance of SimCSum by comparing it with several strong baselines over several evaluation metrics and by human evaluation. Overall, SimCSum demonstrates statistically significant improvements over the state-of-the-art on two non-synthetic cross-lingual scientific datasets. Furthermore, we conduct an in-depth investigation into the linguistic properties of generated summaries and an error analysis.
Submitted 4 April, 2023;
originally announced April 2023.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence
Authors:
Wei Zhao,
Michael Strube,
Steffen Eger
Abstract:
Recently, there has been a growing interest in designing text generation systems from a discourse coherence perspective, e.g., modeling the interdependence between sentences. Still, recent BERT-based evaluation metrics are weak in recognizing coherence, and thus are not reliable in a way to spot the discourse-level improvements of those text generation systems. In this work, we introduce DiscoScore, a parametrized discourse metric, which uses BERT to model discourse coherence from different perspectives, driven by Centering theory. Our experiments encompass 16 non-discourse and discourse metrics, including DiscoScore and popular coherence models, evaluated on summarization and document-level machine translation (MT). We find that (i) the majority of BERT-based metrics correlate much worse with human rated coherence than early discourse metrics, invented a decade ago; (ii) the recent state-of-the-art BARTScore is weak when operated at system level -- which is particularly problematic as systems are typically compared in this manner. DiscoScore, in contrast, achieves strong system-level correlation with human ratings, not only in coherence but also in factual consistency and other aspects, and surpasses BARTScore by over 10 correlation points on average. Further, aiming to understand DiscoScore, we provide justifications to the importance of discourse coherence for evaluation metrics, and explain the superiority of one variant over another. Our code is available at https://github.com/AIPHES/DiscoScore.
Submitted 6 February, 2023; v1 submitted 26 January, 2022;
originally announced January 2022.
-
Impact of Target Word and Context on End-to-End Metonymy Detection
Authors:
Kevin Alex Mathews,
Michael Strube
Abstract:
Metonymy is a figure of speech in which an entity is referred to by another related entity. The task of metonymy detection aims to distinguish metonymic tokens from literal ones. Until now, metonymy detection methods attempt to disambiguate only a single noun phrase in a sentence, typically location names or organization names. In this paper, we disambiguate every word in a sentence by reformulating metonymy detection as a sequence labeling task. We also investigate the impact of target word and context on metonymy detection. We show that the target word is less useful for detecting metonymy in our dataset. On the other hand, the entity types that are associated with domain-specific words in their context are easier to solve. This shows that the context words are much more relevant for detecting metonymy.
Submitted 6 December, 2021;
originally announced December 2021.
-
Vector-valued Distance and Gyrocalculus on the Space of Symmetric Positive Definite Matrices
Authors:
Federico López,
Beatrice Pozzetti,
Steve Trettel,
Michael Strube,
Anna Wienhard
Abstract:
We propose the use of the vector-valued distance to compute distances and extract geometric information from the manifold of symmetric positive definite matrices (SPD), and develop gyrovector calculus, constructing analogs of vector space operations in this curved space. We implement these operations and showcase their versatility in the tasks of knowledge graph completion, item recommendation, and question answering. In experiments, the SPD models outperform their equivalents in Euclidean and hyperbolic space. The vector-valued distance allows us to visualize embeddings, showing that the models learn to disentangle representations of positive samples from negative ones.
Submitted 26 October, 2021;
originally announced October 2021.
-
Augmenting the User-Item Graph with Textual Similarity Models
Authors:
Federico López,
Martin Scholz,
Jessica Yung,
Marie Pellat,
Michael Strube,
Lucas Dixon
Abstract:
This paper introduces a simple and effective form of data augmentation for recommender systems. A paraphrase similarity model is applied to widely available textual data, such as reviews and product descriptions, yielding new semantic relations that are added to the user-item graph. This increases the density of the graph without needing further labeled data. The data augmentation is evaluated on a variety of recommendation algorithms, using Euclidean, hyperbolic, and complex spaces, and over three categories of Amazon product reviews with differing characteristics. Results show that the data augmentation technique provides significant improvements to all types of models, with the most pronounced gains for knowledge graph-based recommenders, particularly in cold-start settings, leading to state-of-the-art performance.
Submitted 20 September, 2021;
originally announced September 2021.
-
Symmetric Spaces for Graph Embeddings: A Finsler-Riemannian Approach
Authors:
Federico López,
Beatrice Pozzetti,
Steve Trettel,
Michael Strube,
Anna Wienhard
Abstract:
Learning faithful graph representations as sets of vertex embeddings has become a fundamental intermediary step in a wide range of machine learning applications. We propose the systematic use of symmetric spaces in representation learning, a class encompassing many of the previously used embedding targets. This enables us to introduce a new method, the use of Finsler metrics integrated in a Riemannian optimization scheme, that better adapts to dissimilar structures in the graph. We develop a tool to analyze the embeddings and infer structural properties of the data sets. For implementation, we choose Siegel spaces, a versatile family of symmetric spaces. Our approach outperforms competitive baselines for graph reconstruction tasks on various synthetic and real-world datasets. We further demonstrate its applicability on two downstream tasks, recommender systems and node classification.
Submitted 9 June, 2021;
originally announced June 2021.
-
A Fully Hyperbolic Neural Model for Hierarchical Multi-Class Classification
Authors:
Federico López,
Michael Strube
Abstract:
Label inventories for fine-grained entity typing have grown in size and complexity. Nonetheless, they exhibit a hierarchical structure. Hyperbolic spaces offer a mathematically appealing approach for learning hierarchical representations of symbolic data. However, it is not clear how to integrate hyperbolic components into downstream tasks. This is the first work that proposes a fully hyperbolic model for multi-class multi-label classification, which performs all operations in hyperbolic space. We evaluate the proposed model on two challenging datasets and compare to different baselines that operate under Euclidean assumptions. Our hyperbolic model infers the latent hierarchy from the class distribution, captures implicit hyponymic relations in the inventory, and shows performance on par with state-of-the-art methods on fine-grained classification with remarkable reduction of the parameter size. A thorough analysis sheds light on the impact of each component in the final prediction and showcases its ease of integration with Euclidean layers.
Submitted 5 October, 2020;
originally announced October 2020.
-
Adapting Deep Learning Methods for Mental Health Prediction on Social Media
Authors:
Ivan Sekulić,
Michael Strube
Abstract:
Mental health poses a significant challenge for an individual's well-being. Text analysis of rich resources, like social media, can contribute to deeper understanding of illnesses and provide means for their early detection. We tackle a challenge of detecting social media users' mental status through deep learning-based models, moving away from traditional approaches to the task. In a binary classification task on predicting if a user suffers from one of nine different disorders, a hierarchical attention network outperforms previously set benchmarks for four of the disorders. Furthermore, we explore the limitations of our model and analyze phrases relevant for classification by inspecting the model's word-level attention weights.
Submitted 17 March, 2020;
originally announced March 2020.
-
On the Importance of Subword Information for Morphological Tasks in Truly Low-Resource Languages
Authors:
Yi Zhu,
Benjamin Heinzerling,
Ivan Vulić,
Michael Strube,
Roi Reichart,
Anna Korhonen
Abstract:
Recent work has validated the importance of subword information for word representation learning. Since subwords increase parameter sharing ability in neural models, their value should be even more pronounced in low-data regimes. In this work, we therefore provide a comprehensive analysis focused on the usefulness of subwords for word representation learning in truly low-resource scenarios and for three representative morphological tasks: fine-grained entity typing, morphological tagging, and named entity recognition. We conduct a systematic study that spans several dimensions of comparison: 1) type of data scarcity which can stem from the lack of task-specific training data, or even from the lack of unannotated data required to train word embeddings, or both; 2) language type by working with a sample of 16 typologically diverse languages including some truly low-resource ones (e.g. Rusyn, Buryat, and Zulu); 3) the choice of the subword-informed word representation method. Our main results show that subword-informed models are universally useful across all language types, with large gains over subword-agnostic embeddings. They also suggest that the effective use of subwords largely depends on the language (type) and the task at hand, as well as on the amount of available data for training the embeddings and task-based models, where having sufficient in-task data is a more critical requirement.
Submitted 26 September, 2019;
originally announced September 2019.
-
Using Automatically Extracted Minimum Spans to Disentangle Coreference Evaluation from Boundary Detection
Authors:
Nafise Sadat Moosavi,
Leo Born,
Massimo Poesio,
Michael Strube
Abstract:
The common practice in coreference resolution is to identify and evaluate the maximum span of mentions. The use of maximum spans tangles coreference evaluation with the challenges of mention boundary detection like prepositional phrase attachment. To address this problem, minimum spans are manually annotated in smaller corpora. However, this additional annotation is costly and therefore, this solution does not scale to large corpora. In this paper, we propose the MINA algorithm for automatically extracting minimum spans to benefit from minimum span evaluation in all corpora. We show that the extracted minimum spans by MINA are consistent with those that are manually annotated by experts. Our experiments show that using minimum spans is in particular important in cross-dataset coreference evaluation, in which detected mention boundaries are noisier due to domain shift. We will integrate MINA into https://github.com/ns-moosavi/coval for reporting standard coreference scores based on both maximum and automatically detected minimum spans.
Submitted 16 June, 2019;
originally announced June 2019.
-
Fine-Grained Entity Typing in Hyperbolic Space
Authors:
Federico López,
Benjamin Heinzerling,
Michael Strube
Abstract:
How can we represent hierarchical information present in large type inventories for entity typing? We study the ability of hyperbolic embeddings to capture hierarchical relations between mentions in context and their target types in a shared vector space. We evaluate on two datasets and investigate two different techniques for creating a large hierarchical entity type inventory: from an expert-generated ontology and by automatically mining type co-occurrences. We find that the hyperbolic model yields improvements over its Euclidean counterpart in some, but not all cases. Our analysis suggests that the adequacy of this geometry depends on the granularity of the type inventory and the way hierarchical relations are inferred.
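As a rough illustration of why hyperbolic space suits hierarchies, the sketch below computes the distance between two points in the Poincaré ball, one common model of hyperbolic space. It is an assumption that this model matches the one used in the paper; the abstract does not say. The function name `poincare_distance` is hypothetical.

```python
import math


def poincare_distance(u, v):
    """Distance between two points inside the unit Poincare ball.

    Assumption: the Poincare ball model is used purely for illustration;
    the abstract above does not name a specific hyperbolic model.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Points near the boundary are far apart even when their Euclidean distance
# is modest, which leaves room for many fine-grained types near the boundary.
near_origin = poincare_distance([0.0, 0.0], [0.5, 0.0])
near_boundary = poincare_distance([0.9, 0.0], [0.0, 0.9])
```

In such a layout, coarse types would typically sit near the origin and fine-grained types near the boundary, so distances grow quickly with depth in the hierarchy.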
Submitted 6 June, 2019;
originally announced June 2019.
-
Sequence Tagging with Contextual and Non-Contextual Subword Representations: A Multilingual Evaluation
Authors:
Benjamin Heinzerling,
Michael Strube
Abstract:
Pretrained contextual and non-contextual subword embeddings have become available in over 250 languages, allowing massively multilingual NLP. However, while there is no dearth of pretrained embeddings, the distinct lack of systematic evaluations makes it difficult for practitioners to choose between them. In this work, we conduct an extensive evaluation comparing non-contextual subword embeddings, namely FastText and BPEmb, and a contextual representation method, namely BERT, on multilingual named entity recognition and part-of-speech tagging. We find that overall, a combination of BERT, BPEmb, and character representations works best across languages and tasks. A more detailed analysis reveals different strengths and weaknesses: Multilingual BERT performs well in medium- to high-resource languages, but is outperformed by non-contextual subword embeddings in a low-resource setting.
Submitted 4 June, 2019;
originally announced June 2019.
-
Transparent, Efficient, and Robust Word Embedding Access with WOMBAT
Authors:
Mark-Christoph Müller,
Michael Strube
Abstract:
We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT for accessing word embeddings is not only cleaner, more readable, and easier to reuse, but also much more efficient than code using standard in-memory methods: a Python script using WOMBAT for evaluating seven large word embedding collections (8.7M embedding vectors in total) on a simple SemEval sentence similarity task involving 250 raw sentence pairs completes in under ten seconds end-to-end on a standard notebook computer.
Submitted 2 July, 2018;
originally announced July 2018.
-
BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages
Authors:
Benjamin Heinzerling,
Michael Strube
Abstract:
We present BPEmb, a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE). In an evaluation using fine-grained entity typing as testbed, BPEmb performs competitively, and for some languages better than alternative subword approaches, while requiring vastly fewer resources and no tokenization. BPEmb is available at https://github.com/bheinzerling/bpemb
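For readers unfamiliar with Byte-Pair Encoding, the following is a minimal sketch of how BPE merges can be learned from a toy word list and then applied to segment a word. It is not the BPEmb implementation or its API; the function names (`learn_bpe`, `merge_pair`, `segment`), the toy corpus, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` by its concatenation."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge operations from a list of words."""
    vocab = Counter(tuple(word) for word in words)  # start from single characters
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pair_counts[(left, right)] += freq
        if not pair_counts:
            break
        # Most frequent pair; ties are broken lexicographically (an arbitrary
        # choice made here only to keep the sketch deterministic).
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


toy_merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
pieces = segment("lowest", toy_merges)
```

Because segmentation starts from raw characters, a scheme like this needs no language-specific tokenizer, which is the property the abstract highlights.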
Submitted 5 October, 2017;
originally announced October 2017.
-
Using Linguistic Features to Improve the Generalization Capability of Neural Coreference Resolvers
Authors:
Nafise Sadat Moosavi,
Michael Strube
Abstract:
Coreference resolution is an intermediate step for text understanding. It is used in tasks and domains for which we do not necessarily have coreference annotated corpora. Therefore, generalization is of special importance for coreference resolution. However, while recent coreference resolvers show notable improvements on the CoNLL dataset, they struggle to generalize properly to new domains or datasets. In this paper, we investigate the role of linguistic features in building more generalizable coreference resolvers. We show that generalization improves only slightly by merely using a set of additional linguistic features. However, employing features and subsets of their values that are informative for coreference resolution considerably improves generalization. Thanks to better generalization, our system achieves state-of-the-art results in out-of-domain evaluations; e.g., on WikiCoref, our system, which is trained on CoNLL, achieves on-par performance with a system designed for this dataset.
Submitted 12 October, 2018; v1 submitted 1 August, 2017;
originally announced August 2017.
-
Revisiting Selectional Preferences for Coreference Resolution
Authors:
Benjamin Heinzerling,
Nafise Sadat Moosavi,
Michael Strube
Abstract:
Selectional preferences have long been claimed to be essential for coreference resolution. However, they are mainly modeled only implicitly by current coreference resolvers. We propose a dependency-based embedding model of selectional preferences which allows fine-grained compatibility judgments with high coverage. We show that the incorporation of our model improves coreference resolution performance on the CoNLL dataset, matching the state-of-the-art results of a more complex system. However, it comes with a cost that makes it debatable how worthwhile such improvements are.
Submitted 20 July, 2017;
originally announced July 2017.
-
Lexical Features in Coreference Resolution: To be Used With Caution
Authors:
Nafise Sadat Moosavi,
Michael Strube
Abstract:
Lexical features are a major source of information in state-of-the-art coreference resolvers. Lexical features implicitly model some of the linguistic phenomena at a fine granularity level. They are especially useful for representing the context of mentions. In this paper we investigate a drawback of using many lexical features in state-of-the-art coreference resolvers. We show that if coreference resolvers mainly rely on lexical features, they can hardly generalize to unseen domains. Furthermore, we show that the current coreference resolution evaluation is clearly flawed by only evaluating on a specific split of a specific dataset in which there is a notable overlap between the training, development and test sets.
Submitted 22 April, 2017;
originally announced April 2017.
-
Use Generalized Representations, But Do Not Forget Surface Features
Authors:
Nafise Sadat Moosavi,
Michael Strube
Abstract:
Only a year ago, all state-of-the-art coreference resolvers were using an extensive amount of surface features. Recently, there was a paradigm shift towards using word embeddings and deep neural networks, where the use of surface features is very limited. In this paper, we show that a simple SVM model with surface features outperforms more complex neural models for detecting anaphoric mentions. Our analysis suggests that generalized representations and surface features have different strengths that should both be taken into account for improving coreference resolution.
Submitted 24 February, 2017;
originally announced February 2017.
-
Never Look Back: An Alternative to Centering
Authors:
Michael Strube
Abstract:
I propose a model for determining the hearer's attentional state which depends solely on a list of salient discourse entities (S-list). The ordering among the elements of the S-list also covers the function of the backward-looking center in the centering model. The ranking criteria for the S-list are based on the distinction between hearer-old and hearer-new discourse entities and incorporate preferences for inter- and intra-sentential anaphora. The model is the basis for an algorithm which operates incrementally, word by word.
Submitted 25 June, 1998;
originally announced June 1998.
-
Centering in-the-large: Computing referential discourse segments
Authors:
Udo Hahn,
Michael Strube
Abstract:
We specify an algorithm that builds up a hierarchy of referential discourse segments from local centering data. The spatial extension and nesting of these discourse segments constrain the reachability of potential antecedents of an anaphoric expression beyond the local level of adjacent center pairs. Thus, the centering model is scaled up to the level of the global referential structure of discourse. An empirical evaluation of the algorithm is supplied.
Submitted 30 April, 1997;
originally announced April 1997.
-
Incremental Centering and Center Ambiguity
Authors:
Udo Hahn,
Michael Strube
Abstract:
In this paper, we present a model of anaphor resolution within the framework of the centering model. The consideration of an incremental processing mode introduces the need to manage structural ambiguity at the center level. Hence, the centering framework is further refined to account for local and global parsing ambiguities which propagate up to the level of center representations, yielding moderately adapted data structures for the centering algorithm.
Submitted 16 May, 1996;
originally announced May 1996.
-
A Conceptual Reasoning Approach to Textual Ellipsis
Authors:
Udo Hahn,
Katja Markert,
Michael Strube
Abstract:
We present a hybrid text understanding methodology for the resolution of textual ellipsis. It integrates conceptual criteria (based on the well-formedness and conceptual strength of role chains in a terminological knowledge base) and functional constraints reflecting the utterances' information structure (based on the distinction between context-bound and unbound discourse elements). The methodological framework for text ellipsis resolution is the centering model that has been adapted to these constraints.
Submitted 15 May, 1996;
originally announced May 1996.
-
Processing Complex Sentences in the Centering Framework
Authors:
Michael Strube
Abstract:
We extend the centering model for the resolution of intra-sentential anaphora and specify how to handle complex sentences. An empirical evaluation indicates that the functional information structure guides the search for an antecedent within the sentence.
Submitted 14 May, 1996;
originally announced May 1996.
-
Functional Centering
Authors:
Michael Strube,
Udo Hahn
Abstract:
Based on empirical evidence from a free word order language (German) we propose a fundamental revision of the principles guiding the ordering of discourse entities in the forward-looking centers within the centering model. We claim that grammatical role criteria should be replaced by indicators of the functional information structure of the utterances, i.e., the distinction between context-bound and unbound discourse elements. This claim is backed up by an empirical evaluation of functional centering.
Submitted 14 May, 1996;
originally announced May 1996.
-
ParseTalk about Textual Ellipsis
Authors:
Michael Strube,
Udo Hahn
Abstract:
A hybrid methodology for the resolution of text-level ellipsis is presented in this paper. It incorporates conceptual proximity criteria applied to ontologically well-engineered domain knowledge bases and an approach to centering based on functional topic/comment patterns. We state text grammatical predicates for ellipsis and then turn to the procedural aspects of their evaluation within the framework of an actor-based implementation of a lexically distributed parser.
Submitted 28 September, 1995;
originally announced September 1995.
-
ParseTalk about Sentence- and Text-Level Anaphora
Authors:
Michael Strube,
Udo Hahn
Abstract:
We provide a unified account of sentence-level and text-level anaphora within the framework of a dependency-based grammar model. Criteria for anaphora resolution within sentence boundaries rephrase major concepts from GB's binding theory, while those for text-level anaphora incorporate an adapted version of a Grosz-Sidner-style focus model.
Submitted 3 March, 1995;
originally announced March 1995.