Showing 1–37 of 37 results for author: Faruqui, M

  1. arXiv:2407.10817  [pdf, other]

    cs.CL cs.AI cs.LG

    Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation

    Authors: Tu Vu, Kalpesh Krishna, Salaheddin Alzubi, Chris Tar, Manaal Faruqui, Yun-Hsuan Sung

    Abstract: As large language models (LLMs) advance, it becomes more challenging to reliably evaluate their output due to the high costs of human evaluation. To make progress towards better LLM autoraters, we introduce FLAMe, a family of Foundational Large Autorater Models. FLAMe is trained on our large and diverse collection of 100+ quality assessment tasks comprising 5M+ human judgments, curated and standar…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 31 pages, 5 figures, 7 tables

  2. arXiv:2404.07143  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

    Authors: Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal

    Abstract: This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. A key component in our proposed approach is a new attention technique dubbed Infini-attention. The Infini-attention incorporates a compressive memory into the vanilla attention mechanism and builds in both masked local attention and long-te…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 9 pages, 4 figures, 4 tables

  3. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  5. arXiv:2310.12963  [pdf, other]

    cs.CL cs.AI

    AutoMix: Automatically Mixing Language Models

    Authors: Pranjal Aggarwal, Aman Madaan, Ankit Anand, Srividya Pranavi Potharaju, Swaroop Mishra, Pei Zhou, Aditya Gupta, Dheeraj Rajagopal, Karthik Kappaganthu, Yiming Yang, Shyam Upadhyay, Manaal Faruqui, Mausam

    Abstract: Large language models (LLMs) are now available from cloud API providers in various sizes and configurations. While this diversity offers a broad spectrum of choices, effectively leveraging the options to optimize computational cost and performance remains challenging. In this work, we present Automix, an approach that strategically routes queries to larger LMs, based on the approximate correctness…

    Submitted 28 June, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: The first two authors contributed equally. Work started and partly done during Aman's internship at Google. This version adds results on additional models and datasets

  6. arXiv:2310.03051  [pdf, other]

    cs.CL cs.AI

    How FaR Are Large Language Models From Agents with Theory-of-Mind?

    Authors: Pei Zhou, Aman Madaan, Srividya Pranavi Potharaju, Aditya Gupta, Kevin R. McKee, Ari Holtzman, Jay Pujara, Xiang Ren, Swaroop Mishra, Aida Nematzadeh, Shyam Upadhyay, Manaal Faruqui

    Abstract: "Thinking is for Doing." Humans can infer other people's mental states from observations--an ability called Theory-of-Mind (ToM)--and subsequently act pragmatically on those inferences. Existing question answering benchmarks such as ToMi ask models questions to make inferences about beliefs of characters in a story, but do not test whether models can then use these inferences to guide their action…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: Preprint, 18 pages, 6 figures, 6 tables

  7. arXiv:2301.09244  [pdf, other]

    cs.CL cs.AI

    Efficient Encoders for Streaming Sequence Tagging

    Authors: Ayush Kaushal, Aditya Gupta, Shyam Upadhyay, Manaal Faruqui

    Abstract: A naive application of state-of-the-art bidirectional encoders for streaming sequence tagging would require encoding each token from scratch for each new token in an incremental streaming input (like transcribed speech). The lack of re-usability of previous computation leads to a higher number of Floating Point Operations (or FLOPs) and higher number of unnecessary label flips. Increased FLOPs con…

    Submitted 16 March, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

    Comments: EACL 2023 Camera-ready

  8. arXiv:2208.13322  [pdf, other]

    cs.CL cs.SD eess.AS

    Streaming Intended Query Detection using E2E Modeling for Continued Conversation

    Authors: Shuo-yiin Chang, Guru Prakash, Zelin Wu, Qiao Liang, Tara N. Sainath, Bo Li, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman

    Abstract: In voice-enabled applications, a predetermined hotword is usually used to activate a device in order to attend to the query. However, speaking queries followed by a hotword each time introduces a cognitive burden in continued conversations. To avoid repeating a hotword, we propose a streaming end-to-end (E2E) intended query detector that identifies the utterances directed towards the device and filters…

    Submitted 28 August, 2022; originally announced August 2022.

    Comments: 5 pages, Interspeech 2022

  9. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  10. arXiv:2112.05842  [pdf, other]

    cs.CL cs.LG eess.AS

    Revisiting the Boundary between ASR and NLU in the Age of Conversational Dialog Systems

    Authors: Manaal Faruqui, Dilek Hakkani-Tür

    Abstract: As more users across the world are interacting with dialog agents in their daily life, there is a need for better speech understanding that calls for renewed attention to the dynamics between research in automatic speech recognition (ASR) and natural language understanding (NLU). We briefly review these research areas and lay out the current relationship between them. In light of the observations…

    Submitted 10 December, 2021; originally announced December 2021.

    Comments: Accepted to be published at Computational Linguistics Journal 2022

  11. arXiv:2106.04571  [pdf, other]

    cs.CL

    TIMEDIAL: Temporal Commonsense Reasoning in Dialog

    Authors: Lianhui Qin, Aditya Gupta, Shyam Upadhyay, Luheng He, Yejin Choi, Manaal Faruqui

    Abstract: Everyday conversations require understanding everyday events, which in turn, requires understanding temporal commonsense concepts interwoven with those events. Despite recent progress with massive pre-trained language models (LMs) such as T5 and GPT-3, their capability of temporal reasoning in dialogs remains largely under-explored. In this paper, we present the first study to investigate pre-trai…

    Submitted 8 June, 2021; originally announced June 2021.

  12. arXiv:2106.04016  [pdf, other]

    cs.CL

    Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering

    Authors: Aditya Gupta, Jiacheng Xu, Shyam Upadhyay, Diyi Yang, Manaal Faruqui

    Abstract: Disfluencies is an under-studied topic in NLP, even though it is ubiquitous in human conversation. This is largely due to the lack of datasets containing disfluencies. In this paper, we present a new challenge question answering dataset, Disfl-QA, a derivative of SQuAD, where humans introduce contextual disfluencies in previously fluent questions. Disfl-QA contains a variety of challenging disflue…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: Findings of ACL 2021

  13. arXiv:2004.14373  [pdf, other]

    cs.CL cs.LG

    ToTTo: A Controlled Table-To-Text Generation Dataset

    Authors: Ankur P. Parikh, Xuezhi Wang, Sebastian Gehrmann, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, Dipanjan Das

    Abstract: We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To obtain generated targets that are natural but also faithful to the source table, we introduce a dataset construction process where annotators directly revis…

    Submitted 6 October, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: Accepted to EMNLP 2020

  14. arXiv:1911.09247  [pdf, ps, other]

    cs.CL

    How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions

    Authors: Zewei Chu, Mingda Chen, Jing Chen, Miaosen Wang, Kevin Gimpel, Manaal Faruqui, Xiance Si

    Abstract: We present a large-scale dataset for the task of rewriting an ill-formed natural language question to a well-formed one. Our multi-domain question rewriting MQR dataset is constructed from human contributed Stack Exchange question edit histories. The dataset contains 427,719 question pairs which come from 303 domains. We provide human annotations for a subset of the dataset as a quality estimate.…

    Submitted 20 November, 2019; originally announced November 2019.

    Comments: AAAI 2020

  15. arXiv:1909.11218  [pdf, other]

    cs.CL cs.LG

    Attention Interpretability Across NLP Tasks

    Authors: Shikhar Vashishth, Shyam Upadhyay, Gaurav Singh Tomar, Manaal Faruqui

    Abstract: The attention layer in a neural network model provides insights into the model's reasoning behind its prediction, which are usually criticized for being opaque. Recently, seemingly contradictory viewpoints have emerged about the interpretability of attention weights (Jain & Wallace, 2019; Vig & Belinkov, 2019). Amid such confusion arises the need to understand attention mechanism more systematical…

    Submitted 24 September, 2019; originally announced September 2019.

    Report number: 2019

  16. arXiv:1906.01081  [pdf, other]

    cs.CL

    Handling Divergent Reference Texts when Evaluating Table-to-Text Generation

    Authors: Bhuwan Dhingra, Manaal Faruqui, Ankur Parikh, Ming-Wei Chang, Dipanjan Das, William W. Cohen

    Abstract: Automatically constructed datasets for generating text from semi-structured data (tables), such as WikiBio, often contain reference texts that diverge from the information in the corresponding semi-structured data. We show that metrics which rely solely on the reference texts, such as BLEU and ROUGE, show poor correlation with human judgments when those references diverge. We propose a new metric,…

    Submitted 3 June, 2019; originally announced June 2019.

    Comments: To appear at ACL 2019

  17. arXiv:1904.04428  [pdf, other]

    cs.CL

    Text Generation with Exemplar-based Adaptive Decoding

    Authors: Hao Peng, Ankur P. Parikh, Manaal Faruqui, Bhuwan Dhingra, Dipanjan Das

    Abstract: We propose a novel conditioned text generation model. It draws inspiration from traditional template-based text generation techniques, where the source provides the content (i.e., what to say), and the template influences how to say it. Building on the successful encoder-decoder paradigm, it first encodes the content representation from the given input text; to produce the output, it retrieves exe…

    Submitted 10 April, 2019; v1 submitted 8 April, 2019; originally announced April 2019.

    Comments: NAACL 2019

  18. arXiv:1810.11101  [pdf, other]

    cs.CL

    UniMorph 2.0: Universal Morphology

    Authors: Christo Kirov, Ryan Cotterell, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sabrina J. Mielke, Arya D. McCarthy, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden

    Abstract: The Universal Morphology UniMorph project is a collaborative effort to improve how NLP handles complex morphology across the world's languages. The project releases annotated morphological data using a universal tagset, the UniMorph schema. Each inflected form is associated with a lemma, which typically carries its underlying lexical meaning, and a bundle of morphological features from our schema.…

    Submitted 25 February, 2020; v1 submitted 25 October, 2018; originally announced October 2018.

    Comments: LREC 2018

  19. arXiv:1808.09468  [pdf, ps, other]

    cs.CL

    Learning To Split and Rephrase From Wikipedia Edit History

    Authors: Jan A. Botha, Manaal Faruqui, John Alex, Jason Baldridge, Dipanjan Das

    Abstract: Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning. We extract a rich new dataset for this task by mining Wikipedia's edit history: WikiSplit contains one million naturally occurring sentence rewrites, providing sixty times more distinct split examples and a ninety times larger vocabulary than the WebSplit corpus introduced by Narayan…

    Submitted 28 August, 2018; originally announced August 2018.

    Journal ref: Proc. of EMNLP 2018

  20. arXiv:1808.09422  [pdf, other]

    cs.CL

    WikiAtomicEdits: A Multilingual Corpus of Wikipedia Edits for Modeling Language and Discourse

    Authors: Manaal Faruqui, Ellie Pavlick, Ian Tenney, Dipanjan Das

    Abstract: We release a corpus of 43 million atomic edits across 8 languages. These edits are mined from Wikipedia edit history and consist of instances in which a human editor has inserted a single contiguous phrase into, or deleted a single contiguous phrase from, an existing sentence. We use the collected data to show that the language generated during editing differs from the language that we observe in…

    Submitted 28 August, 2018; originally announced August 2018.

    Journal ref: Proc. of EMNLP 2018

  21. arXiv:1808.09419  [pdf, other]

    cs.CL

    Identifying Well-formed Natural Language Questions

    Authors: Manaal Faruqui, Dipanjan Das

    Abstract: Understanding search queries is a hard problem as it involves dealing with "word salad" text ubiquitously issued by users. However, if a query resembles a well-formed question, a natural language processing pipeline is able to perform more accurate interpretation, thus reducing downstream compounding errors. Hence, identifying whether or not a query is well formed can enhance query understanding.…

    Submitted 28 August, 2018; originally announced August 2018.

    Journal ref: Proc. of EMNLP 2018

  22. arXiv:1706.09031  [pdf, other]

    cs.CL

    CoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection in 52 Languages

    Authors: Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden

    Abstract: The CoNLL-SIGMORPHON 2017 shared task on supervised morphological generation required systems to be trained and tested in each of 52 typologically diverse languages. In sub-task 1, submitted systems were asked to predict a specific inflected form of a given lemma. In sub-task 2, systems were given a lemma and some of its specific inflected forms, and asked to complete the inflectional paradigm by…

    Submitted 4 July, 2017; v1 submitted 27 June, 2017; originally announced June 2017.

    Comments: CoNLL 2017

  23. arXiv:1701.03980  [pdf, other]

    stat.ML cs.CL cs.MS

    DyNet: The Dynamic Neural Network Toolkit

    Authors: Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin

    Abstract: We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its deriva…

    Submitted 14 January, 2017; originally announced January 2017.

    Comments: 33 pages

  24. arXiv:1606.06710  [pdf, ps, other]

    cs.CL

    Correlation-based Intrinsic Evaluation of Word Vector Representations

    Authors: Yulia Tsvetkov, Manaal Faruqui, Chris Dyer

    Abstract: We introduce QVEC-CCA--an intrinsic evaluation metric for word vector representations based on correlations of learned vectors with features extracted from linguistic resources. We show that QVEC-CCA scores are an effective proxy for a range of extrinsic semantic and syntactic tasks. We also show that the proposed evaluation obtains higher and more consistent correlations with downstream tasks, co…

    Submitted 21 June, 2016; originally announced June 2016.

    Comments: RepEval 2016, 5 pages

  25. arXiv:1605.03852  [pdf, other]

    cs.CL

    Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning

    Authors: Yulia Tsvetkov, Manaal Faruqui, Wang Ling, Brian MacWhinney, Chris Dyer

    Abstract: We use Bayesian optimization to learn curricula for word representation learning, optimizing performance on downstream tasks that depend on the learned representations as features. The curricula are modeled by a linear ranking function which is the scalar product of a learned weight vector and an engineered feature vector that characterizes the different aspects of the complexity of each instance…

    Submitted 21 June, 2016; v1 submitted 12 May, 2016; originally announced May 2016.

    Comments: In proceedings of ACL 2016, 10 pages

  26. arXiv:1605.03832  [pdf, other]

    cs.CL

    Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning

    Authors: Yulia Tsvetkov, Sunayana Sitaram, Manaal Faruqui, Guillaume Lample, Patrick Littell, David Mortensen, Alan W Black, Lori Levin, Chris Dyer

    Abstract: We introduce polyglot language models, recurrent neural network models trained to predict symbol sequences in many different languages using shared representations of symbols and conditioning on typological information about the language to be predicted. We apply these to the problem of modeling phone sequences---a domain in which universal symbol inventories and cross-linguistically shared featur…

    Submitted 12 May, 2016; originally announced May 2016.

    Comments: Proceedings of NAACL 2016; 10 pages

  27. arXiv:1605.02276  [pdf, ps, other]

    cs.CL

    Problems With Evaluation of Word Embeddings Using Word Similarity Tasks

    Authors: Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, Chris Dyer

    Abstract: Lacking standardized extrinsic evaluation methods for vector representations of words, the NLP community has relied heavily on word similarity tasks as a proxy for intrinsic evaluation of word vectors. Word similarity evaluation, which correlates the distance between vectors and human judgments of semantic similarity is attractive, because it is computationally inexpensive and fast. In this paper…

    Submitted 21 June, 2016; v1 submitted 8 May, 2016; originally announced May 2016.

    Comments: The First Workshop on Evaluating Vector Space Representations for NLP

  28. arXiv:1604.00425  [pdf, other]

    cs.CL

    Cross-lingual Models of Word Embeddings: An Empirical Comparison

    Authors: Shyam Upadhyay, Manaal Faruqui, Chris Dyer, Dan Roth

    Abstract: Despite interest in using cross-lingual knowledge to learn word embeddings for various tasks, a systematic comparison of the possible approaches is lacking in the literature. We perform an extensive evaluation of four popular approaches of inducing cross-lingual embeddings, each requiring a different form of supervision, on four typographically different language pairs. Our evaluation setup spans…

    Submitted 7 June, 2016; v1 submitted 1 April, 2016; originally announced April 2016.

    Comments: To appear at ACL 2016

  29. arXiv:1512.06110  [pdf, other]

    cs.CL

    Morphological Inflection Generation Using Character Sequence to Sequence Learning

    Authors: Manaal Faruqui, Yulia Tsvetkov, Graham Neubig, Chris Dyer

    Abstract: Morphological inflection generation is the task of generating the inflected form of a given lemma corresponding to a particular linguistic transformation. We model the problem of inflection generation as a character sequence to sequence learning problem and present a variant of the neural encoder-decoder model for solving it. Our model is language independent and can be trained in both supervised…

    Submitted 21 March, 2016; v1 submitted 18 December, 2015; originally announced December 2015.

    Comments: Proceedings of NAACL 2016

  30. arXiv:1512.05030  [pdf, other]

    cs.CL

    Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning

    Authors: Manaal Faruqui, Ryan McDonald, Radu Soricut

    Abstract: Morpho-syntactic lexicons provide information about the morphological and syntactic roles of words in a language. Such lexicons are not available for all languages and even when available, their coverage can be limited. We present a graph-based semi-supervised learning method that uses the morphological, syntactic and semantic relations between words to automatically construct wide coverage lexico…

    Submitted 23 January, 2016; v1 submitted 15 December, 2015; originally announced December 2015.

    Comments: Transactions of the Association for Computational Linguistics (TACL) 2016

  31. arXiv:1506.05230  [pdf, ps, other]

    cs.CL

    Non-distributional Word Vector Representations

    Authors: Manaal Faruqui, Chris Dyer

    Abstract: Data-driven representation learning for words is a technique of central importance in NLP. While indisputably useful as a source of features in downstream tasks, such vectors tend to consist of uninterpretable components whose relationship to the categories of traditional lexical semantic theories is tenuous at best. We present a method for constructing interpretable word vectors from hand-crafted…

    Submitted 17 June, 2015; originally announced June 2015.

    Comments: Proceedings of ACL 2015

  32. arXiv:1506.02004  [pdf, other]

    cs.CL

    Sparse Overcomplete Word Vector Representations

    Authors: Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, Noah Smith

    Abstract: Current distributed representations of words show little resemblance to theories of lexical semantics. The former are dense and uninterpretable, the latter largely based on familiar, discrete classes (e.g., supersenses) and relations (e.g., synonymy and hypernymy). We propose methods that transform word vectors into sparse (and optionally binary) vectors. The resulting representations are more sim…

    Submitted 5 June, 2015; originally announced June 2015.

    Comments: Proceedings of ACL 2015

  33. arXiv:1503.06450  [pdf, other]

    cs.CL

    Multilingual Open Relation Extraction Using Cross-lingual Projection

    Authors: Manaal Faruqui, Shankar Kumar

    Abstract: Open domain relation extraction systems identify relation and argument phrases in a sentence without relying on any underlying schema. However, current state-of-the-art relation extraction systems are available only for English because of their heavy reliance on linguistic tools such as part-of-speech taggers and dependency parsers. We present a cross-lingual annotation projection method for langu…

    Submitted 14 April, 2021; v1 submitted 22 March, 2015; originally announced March 2015.

    Comments: Proceedings of NAACL 2015

  34. arXiv:1411.4166  [pdf, other]

    cs.CL

    Retrofitting Word Vectors to Semantic Lexicons

    Authors: Manaal Faruqui, Jesse Dodge, Sujay K. Jauhar, Chris Dyer, Eduard Hovy, Noah A. Smith

    Abstract: Vector space word representations are learned from distributional information of words in large corpora. Although such statistics are semantically informative, they disregard the valuable information that is contained in semantic lexicons such as WordNet, FrameNet, and the Paraphrase Database. This paper proposes a method for refining vector space representations using relational information from…

    Submitted 22 March, 2015; v1 submitted 15 November, 2014; originally announced November 2014.

    Comments: Proceedings of NAACL 2015

  35. arXiv:1406.2035  [pdf, other]

    cs.CL cs.LG stat.ML

    Learning Word Representations with Hierarchical Sparse Coding

    Authors: Dani Yogatama, Manaal Faruqui, Chris Dyer, Noah A. Smith

    Abstract: We propose a new method for learning word representations using hierarchical regularization in sparse coding inspired by the linguistic study of word meanings. We show an efficient learning algorithm based on stochastic proximal methods that is significantly faster than previous approaches, making it possible to perform hierarchical sparse coding on a corpus of billions of word tokens. Experiments…

    Submitted 6 November, 2014; v1 submitted 8 June, 2014; originally announced June 2014.

  36. arXiv:1405.0701  [pdf, other]

    cs.CL

    "Translation can't change a name": Using Multilingual Data for Named Entity Recognition

    Authors: Manaal Faruqui

    Abstract: Named Entities (NEs) are often written with no orthographic changes across different languages that share a common alphabet. We show that this can be leveraged so as to improve named entity recognition (NER) by using unsupervised word clusters from secondary languages as features in state-of-the-art discriminative NER systems. We observe significant increases in performance, finding that person an…

    Submitted 4 May, 2014; originally announced May 2014.

  37. arXiv:1306.2091  [pdf, other]

    cs.CL

    A framework for (under)specifying dependency syntax without overloading annotators

    Authors: Nathan Schneider, Brendan O'Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Noah A. Smith, Chris Dyer, Jason Baldridge

    Abstract: We introduce a framework for lightweight dependency syntax annotation. Our formalism builds upon the typical representation for unlabeled dependencies, permitting a simple notation and annotation workflow. Moreover, the formalism encourages annotators to underspecify parts of the syntax if doing so would streamline the annotation process. We demonstrate the efficacy of this annotation on three lan…

    Submitted 14 June, 2013; v1 submitted 9 June, 2013; originally announced June 2013.

    Comments: This is an expanded version of a paper appearing in Proceedings of the 7th Linguistic Annotation Workshop & Interoperability with Discourse, Sofia, Bulgaria, August 8-9, 2013