Showing 1–46 of 46 results for author: Richardson, K

  1. arXiv:2406.04784 [pdf, other]

    cs.CL cs.AI

    SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals

    Authors: Ruihan Yang, Jiangjie Chen, Yikai Zhang, Siyu Yuan, Aili Chen, Kyle Richardson, Yanghua Xiao, Deqing Yang

    Abstract: Language agents powered by large language models (LLMs) are increasingly valuable as decision-making tools in domains such as gaming and programming. However, these agents often face challenges in achieving high-level goals without detailed instructions and in adapting to environments where feedback is delayed. In this paper, we present SelfGoal, a novel automatic approach designed to enhance agen…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Preprint

  2. arXiv:2402.05733 [pdf, other]

    cs.CL

    TimeArena: Shaping Efficient Multitasking Language Agents in a Time-Aware Simulation

    Authors: Yikai Zhang, Siyu Yuan, Caiyu Hu, Kyle Richardson, Yanghua Xiao, Jiangjie Chen

    Abstract: Despite remarkable advancements in emulating human-like behavior through Large Language Models (LLMs), current textual simulations do not adequately address the notion of time. To this end, we introduce TimeArena, a novel textual simulated environment that incorporates complex temporal dynamics and constraints that better reflect real-life planning scenarios. In TimeArena, agents are asked to comp…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Work in progress

  3. arXiv:2402.00838 [pdf, other]

    cs.CL

    OLMo: Accelerating the Science of Language Models

    Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, et al. (18 additional authors not shown)

    Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models…

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  4. arXiv:2402.00159 [pdf, other]

    cs.CL

    Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

    Authors: Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, et al. (11 additional authors not shown)

    Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training dat…

    Submitted 6 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024; Dataset: https://hf.co/datasets/allenai/dolma; Code: https://github.com/allenai/dolma

  5. arXiv:2312.10523 [pdf, other]

    cs.CL cs.AI cs.LG

    Paloma: A Benchmark for Evaluating Language Model Fit

    Authors: Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Ananya Harsh Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hannaneh Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge

    Abstract: Language models (LMs) commonly report perplexity on monolithic data held out from training. Implicitly or explicitly, this data is composed of domains–varying distributions of language. Rather than assuming perplexity on one distribution extrapolates to others, Perplexity Analysis for Language Model Assessment (Paloma), measures LM fit to 585 text domains, ranging from nytimes.com…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: Project Page: https://paloma.allen.ai/

  6. arXiv:2312.10253 [pdf, other]

    cs.CL

    Catwalk: A Unified Language Model Evaluation Framework for Many Datasets

    Authors: Dirk Groeneveld, Anas Awadalla, Iz Beltagy, Akshita Bhagia, Ian Magnusson, Hao Peng, Oyvind Tafjord, Pete Walsh, Kyle Richardson, Jesse Dodge

    Abstract: The success of large language models has shifted the evaluation paradigms in natural language processing (NLP). The community's interest has drifted towards comparing NLP models across many tasks, domains, and datasets, often at an extreme scale. This imposes new engineering challenges: efforts in constructing datasets and models have been fragmented, and their formats and interfaces are incompati…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: technical report, work in progress

  7. arXiv:2310.05746 [pdf, other]

    cs.CL cs.AI

    Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena

    Authors: Jiangjie Chen, Siyu Yuan, Rong Ye, Bodhisattwa Prasad Majumder, Kyle Richardson

    Abstract: Recent advancements in Large Language Models (LLMs) showcase advanced reasoning, yet NLP evaluations often depend on static benchmarks. Evaluating this necessitates environments that test strategic reasoning in dynamic, competitive scenarios requiring long-term planning. We introduce AucArena, a novel evaluation suite that simulates auctions, a setting chosen for being highly unpredictable and inv…

    Submitted 2 April, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Preprint

  8. arXiv:2305.14250 [pdf, other]

    cs.CL cs.AI

    Language Models with Rationality

    Authors: Nora Kassner, Oyvind Tafjord, Ashish Sabharwal, Kyle Richardson, Hinrich Schuetze, Peter Clark

    Abstract: While large language models (LLMs) are proficient at question-answering (QA), it is not always clear how (or even if) an answer follows from their latent "beliefs". This lack of interpretability is a growing impediment to widespread use of LLMs. To address this, our goals are to make model beliefs and their inferential relationships explicit, and to resolve inconsistencies that may exist, so that…

    Submitted 29 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  9. arXiv:2212.10534 [pdf, other]

    cs.CL

    DISCO: Distilling Counterfactuals with Large Language Models

    Authors: Zeming Chen, Qiyue Gao, Antoine Bosselut, Ashish Sabharwal, Kyle Richardson

    Abstract: Models trained with counterfactually augmented data learn representations of the causal structure of tasks, enabling robust generalization. However, high-quality counterfactual data is scarce for most tasks and not easily generated at scale. When crowdsourced, such data is typically limited in scale and diversity; when generated using supervised methods, it is computationally expensive to extend t…

    Submitted 5 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL 2023 camera ready, final title change

  10. arXiv:2211.07950 [pdf, other]

    cs.CL

    Breakpoint Transformers for Modeling and Tracking Intermediate Beliefs

    Authors: Kyle Richardson, Ronen Tamari, Oren Sultan, Reut Tsarfaty, Dafna Shahaf, Ashish Sabharwal

    Abstract: Can we teach natural language understanding models to track their beliefs through intermediate points in text? We propose a representation learning framework called breakpoint modeling that allows for learning of this type. Given any text encoder and data marked with intermediate states (breakpoints) along with corresponding textual queries viewed as true/false propositions (i.e., the candidate be…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022

  11. arXiv:2210.16865 [pdf, other]

    cs.CL

    Learning to Decompose: Hypothetical Question Decomposition Based on Comparable Texts

    Authors: Ben Zhou, Kyle Richardson, Xiaodong Yu, Dan Roth

    Abstract: Explicit decomposition modeling, which involves breaking down complex tasks into more straightforward and often more interpretable sub-tasks, has long been a central theme in developing robust and interpretable NLU systems. However, despite the many datasets and resources built as part of this effort, the majority have small-scale annotations and limited scope, which is insufficient to solve gener…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: Accepted at EMNLP 2022

  12. arXiv:2210.02406 [pdf, other]

    cs.CL

    Decomposed Prompting: A Modular Approach for Solving Complex Tasks

    Authors: Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal

    Abstract: Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn, especially when embedded in more complex tasks. To address this, we propose Decomposed Prompting, a new approach to solve complex tasks by deco…

    Submitted 11 April, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: ICLR'23 Camera Ready

  13. arXiv:2206.04615 [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  14. arXiv:2204.09148 [pdf, other]

    cs.CL cs.AI

    What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment

    Authors: Matthew Finlayson, Kyle Richardson, Ashish Sabharwal, Peter Clark

    Abstract: The instruction learning paradigm -- where a model learns to perform new tasks from task descriptions alone -- has become popular in general-purpose model research. The capabilities of large transformer models as instruction learners, however, remain poorly understood. We use a controlled synthetic environment to characterize such capabilities. Specifically, we use the task of deciding whether a g…

    Submitted 24 May, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: Typos corrected, rewordings

    MSC Class: 68T50 ACM Class: I.2.7

  15. arXiv:2112.09054 [pdf, other]

    cs.CL

    Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability

    Authors: Kyle Richardson, Ashish Sabharwal

    Abstract: Investigating the reasoning abilities of transformer models, and discovering new challenging tasks for them, has been a topic of much interest. Recent studies have found these models to be surprisingly strong at performing deductive reasoning over formal logical theories expressed in natural language. A shortcoming of these studies, however, is that they do not take into account that logical theor…

    Submitted 16 December, 2021; originally announced December 2021.

    Comments: Accepted to AAAI-2022, AAAI preprint

  16. arXiv:2112.08348 [pdf, other]

    cs.CL

    Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts

    Authors: Daniel Khashabi, Shane Lyu, Sewon Min, Lianhui Qin, Kyle Richardson, Sean Welleck, Hannaneh Hajishirzi, Tushar Khot, Ashish Sabharwal, Sameer Singh, Yejin Choi

    Abstract: Fine-tuning continuous prompts for target tasks has recently emerged as a compact alternative to full model fine-tuning. Motivated by these promising results, we investigate the feasibility of extracting a discrete (textual) interpretation of continuous prompts that is faithful to the problem they solve. In practice, we observe a "wayward" behavior between the task solved by continuous prompts and…

    Submitted 4 May, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: NAACL 2022

  17. arXiv:2112.00086 [pdf, other]

    cs.CL cs.AI

    Dyna-bAbI: unlocking bAbI's potential with dynamic synthetic benchmarking

    Authors: Ronen Tamari, Kyle Richardson, Aviad Sar-Shalom, Noam Kahlon, Nelson Liu, Reut Tsarfaty, Dafna Shahaf

    Abstract: While neural language models often perform surprisingly well on natural language understanding (NLU) tasks, their strengths and limitations remain poorly understood. Controlled synthetic tasks are thus an increasingly important resource for diagnosing model behavior. In this work we focus on story understanding, a core competency for NLU systems. However, the main synthetic resource for story unde…

    Submitted 30 November, 2021; originally announced December 2021.

    Comments: Code and data will be made available at project page: https://tiny.one/8wjxwd7z

  18. arXiv:2110.08542 [pdf, other]

    cs.CL

    Hey AI, Can You Solve Complex Tasks by Talking to Agents?

    Authors: Tushar Khot, Kyle Richardson, Daniel Khashabi, Ashish Sabharwal

    Abstract: Training giant models from scratch for each complex task is resource- and data-inefficient. To help develop models that can leverage existing systems, we propose a new challenge: Learning to solve complex tasks by communicating with existing agents (or models) in natural language. We design a synthetic benchmark, CommaQA, with three complex reasoning tasks (explicit, implicit, numeric) designed to…

    Submitted 9 May, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

    Comments: Accepted to Findings of ACL 2022

  19. arXiv:2110.01509 [pdf, other]

    cs.CL cs.AI

    DeepA2: A Modular Framework for Deep Argument Analysis with Pretrained Neural Text2Text Language Models

    Authors: Gregor Betz, Kyle Richardson

    Abstract: In this paper, we present and implement a multi-dimensional, modular framework for performing deep argument analysis (DeepA2) using current pre-trained language models (PTLMs). ArgumentAnalyst -- a T5 model (Raffel et al. 2020) set up and trained within DeepA2 -- reconstructs argumentative texts, which advance an informal argumentation, as valid arguments: It inserts, e.g., missing premises and co…

    Submitted 1 July, 2022; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: A Demo is available at https://huggingface.co/spaces/debatelab/deepa2-demo , the model can be downloaded from https://huggingface.co/debatelab/argument-analyst , and the datasets can be accessed at https://huggingface.co/datasets/debatelab/aaac

    Journal ref: *SEM 2022

  20. arXiv:2106.03983 [pdf, other]

    cs.CL cs.AI

    Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference

    Authors: Hai Hu, He Zhou, Zuoyu Tian, Yiwen Zhang, Yina Ma, Yanting Li, Yixin Nie, Kyle Richardson

    Abstract: Multilingual transformers (XLM, mT5) have been shown to have remarkable transfer skills in zero-shot settings. Most transfer studies, however, rely on automatically translated resources (XNLI, XQuAD), making it hard to discern the particular linguistic knowledge that is being transferred, and the role of expert annotated monolingual datasets when developing task-specific models. We investigate the…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: accepted to ACL Findings 2021

  21. arXiv:2103.13033 [pdf, other]

    cs.CL

    Thinking Aloud: Dynamic Context Generation Improves Zero-Shot Reasoning Performance of GPT-2

    Authors: Gregor Betz, Kyle Richardson, Christian Voigt

    Abstract: Thinking aloud is an effective meta-cognitive strategy human reasoners apply to solve difficult problems. We suggest to improve the reasoning ability of pre-trained neural language models in a similar way, namely by expanding a task's context with problem elaborations that are dynamically generated by the language model itself. Our main result is that dynamic problem elaboration significantly impr…

    Submitted 24 March, 2021; originally announced March 2021.

  22. arXiv:2102.03315 [pdf, other]

    cs.CL cs.AI

    Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge

    Authors: Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Peter Clark

    Abstract: We present the ARC-DA dataset, a direct-answer ("open response", "freeform") version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset. While ARC has been influential in the community, its multiple-choice format is unrepresentative of real-world questions, and multiple choice formats can be particularly susceptible to artifacts. The ARC-DA dataset addresses these concerns by converting…

    Submitted 5 February, 2021; originally announced February 2021.

  23. arXiv:2102.01761 [pdf]

    physics.optics cs.AI

    Deep Convolutional Neural Networks to Predict Mutual Coupling Effects in Metasurfaces

    Authors: Sensong An, Bowen Zheng, Mikhail Y. Shalaginov, Hong Tang, Hang Li, Li Zhou, Yunxi Dong, Mohammad Haerinia, Anuradha Murthy Agarwal, Clara Rivero-Baleine, Myungkoo Kang, Kathleen A. Richardson, Tian Gu, Juejun Hu, Clayton Fowler, Hualiang Zhang

    Abstract: Metasurfaces have provided a novel and promising platform for the realization of compact and large-scale optical devices. The conventional metasurface design approach assumes periodic boundary conditions for each element, which is inaccurate in most cases since the near-field coupling effects between elements will change when surrounded by non-identical structures. In this paper, we propose a deep…

    Submitted 2 February, 2021; originally announced February 2021.

    Comments: 16 pages, 10 figures

  24. arXiv:2011.08092 [pdf, other]

    cs.CL

    A Dataset for Tracking Entities in Open Domain Procedural Text

    Authors: Niket Tandon, Keisuke Sakaguchi, Bhavana Dalvi Mishra, Dheeraj Rajagopal, Peter Clark, Michal Guerquin, Kyle Richardson, Eduard Hovy

    Abstract: We present the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. For example, in a text describing fog removal using potatoes, a car window may transition between being foggy, sticky, opaque, and clear. Previous formulations of this task provide the text and entities involved, and ask how those entities change for just a sm…

    Submitted 30 October, 2020; originally announced November 2020.

    Comments: To appear in EMNLP 2020

  25. arXiv:2010.13778 [pdf]

    physics.ed-ph cs.ET cs.GL quant-ph

    Achieving a quantum smart workforce

    Authors: Clarice D. Aiello, D. D. Awschalom, Hannes Bernien, Tina Brower-Thomas, Kenneth R. Brown, Todd A. Brun, Justin R. Caram, Eric Chitambar, Rosa Di Felice, Michael F. J. Fox, Stephan Haas, Alexander W. Holleitner, Eric R. Hudson, Jeffrey H. Hunt, Robert Joynt, Scott Koziol, H. J. Lewandowski, Douglas T. McClure, Jens Palsberg, Gina Passante, Kristen L. Pudenz, Christopher J. K. Richardson, Jessica L. Rosenberg, R. S. Ross, Mark Saffman, et al. (7 additional authors not shown)

    Abstract: Interest in building dedicated Quantum Information Science and Engineering (QISE) education programs has greatly expanded in recent years. These programs are inherently convergent, complex, often resource intensive and likely require collaboration with a broad variety of stakeholders. In order to address this combination of challenges, we have captured ideas from many members in the community. Thi…

    Submitted 23 October, 2020; originally announced October 2020.

    Comments: 18 pages, 2 figures, 1 table

    Journal ref: Quantum Sci. Technol. 6 030501 (2021)

  26. arXiv:2010.12753 [pdf, other]

    cs.CL

    Temporal Reasoning on Implicit Events from Distant Supervision

    Authors: Ben Zhou, Kyle Richardson, Qiang Ning, Tushar Khot, Ashish Sabharwal, Dan Roth

    Abstract: We propose TRACIE, a novel temporal reasoning dataset that evaluates the degree to which systems understand implicit events -- events that are not mentioned explicitly in natural language text but can be inferred from it. This introduces a new challenge in temporal reasoning research, where prior work has focused on explicitly mentioned events. Human readers can infer implicit events via commonsen…

    Submitted 7 May, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Accepted at NAACL 2021

  27. arXiv:2010.05444 [pdf, other]

    cs.CL

    OCNLI: Original Chinese Natural Language Inference

    Authors: Hai Hu, Kyle Richardson, Liang Xu, Lu Li, Sandra Kuebler, Lawrence S. Moss

    Abstract: Despite the tremendous recent progress on natural language inference (NLI), driven largely by large-scale investment in new datasets (e.g., SNLI, MNLI) and advances in modeling, most progress has been limited to English due to a lack of reliable datasets for most of the world's languages. In this paper, we present the first large-scale NLI dataset (consisting of ~56,000 annotated sentence pairs) f…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: Findings of EMNLP 2020

  28. arXiv:2009.07185 [pdf, other]

    cs.CL cs.AI

    Critical Thinking for Language Models

    Authors: Gregor Betz, Christian Voigt, Kyle Richardson

    Abstract: This paper takes a first step towards a critical thinking curriculum for neural auto-regressive language models. We introduce a synthetic corpus of deductively valid arguments, and generate artificial argumentative texts to train and evaluate GPT-2. Significant transfer learning effects can be observed: Training a model on three simple core schemes allows it to accurately complete conclusions of d…

    Submitted 17 December, 2020; v1 submitted 15 September, 2020; originally announced September 2020.

  29. arXiv:2009.00751 [pdf, other]

    cs.CL cs.AI

    Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models

    Authors: Tushar Khot, Daniel Khashabi, Kyle Richardson, Peter Clark, Ashish Sabharwal

    Abstract: We propose a general framework called Text Modular Networks (TMNs) for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models. To ensure solvability of simpler tasks, TMNs learn the textual input-output behavior (i.e., language) of existing models through their datasets. This differs from prior decomposition-based approache…

    Submitted 12 April, 2021; v1 submitted 1 September, 2020; originally announced September 2020.

    Comments: Accepted to NAACL 2021

  30. arXiv:2006.07510 [pdf, ps, other]

    cs.CL cs.AI

    Do Dogs have Whiskers? A New Knowledge Base of hasPart Relations

    Authors: Sumithra Bhakthavatsalam, Kyle Richardson, Niket Tandon, Peter Clark

    Abstract: We present a new knowledge-base of hasPart relationships, extracted from a large corpus of generic statements. Complementary to other resources available, it is the first which is all three of: accurate (90% precision), salient (covers relationships a person may mention), and has high coverage of common terms (approximated as within a 10 year old's vocabulary), as well as having several times more…

    Submitted 12 June, 2020; originally announced June 2020.

  31. arXiv:2005.13359 [pdf, other]

    cs.CV

    NDD20: A large-scale few-shot dolphin dataset for coarse and fine-grained categorisation

    Authors: Cameron Trotter, Georgia Atkinson, Matt Sharpe, Kirsten Richardson, A. Stephen McGough, Nick Wright, Ben Burville, Per Berggren

    Abstract: We introduce the Northumberland Dolphin Dataset 2020 (NDD20), a challenging image dataset annotated for both coarse and fine-grained instance segmentation and categorisation. This dataset, the first release of the NDD, was created in response to the rapid expansion of computer vision into conservation research and the production of field-deployable systems suited to extreme environmental condition…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: 5 pages, 6 figures, download link, submitted to FGVC7 Workshop @ CVPR20

  32. arXiv:2004.14623 [pdf, ps, other]

    cs.CL

    Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation

    Authors: Atticus Geiger, Kyle Richardson, Christopher Potts

    Abstract: We address whether neural models for Natural Language Inference (NLI) can learn the compositional interactions between lexical entailment and negation, using four methods: the behavioral evaluation methods of (1) challenge test sets and (2) systematic generalization tasks, and the structural evaluation methods of (3) probes and (4) interventions. To facilitate this holistic evaluation, we present…

    Submitted 20 November, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

    Comments: In Proceedings of BlackBoxNLP 2020 at EMNLP 2020

  33. arXiv:2004.05986 [pdf, other]

    cs.CL cs.LG

    CLUE: A Chinese Language Understanding Evaluation Benchmark

    Authors: Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong, Weitang Liu, Bo Shi, Yiming Cui, Junyi Li, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, et al. (7 additional authors not shown)

    Abstract: The advent of natural language understanding (NLU) benchmarks for English, such as GLUE and SuperGLUE allows new NLU models to be evaluated across a diverse set of tasks. These comprehensive benchmarks have facilitated a broad range of research and applications in natural language processing (NLP). The problem, however, is that most such benchmarks are limited to English, which has made it difficu…

    Submitted 5 November, 2020; v1 submitted 13 April, 2020; originally announced April 2020.

    Comments: Accepted by COLING2020; 10 pages, 4 figures

  34. arXiv:2002.05867 [pdf, other]

    cs.CL cs.AI

    Transformers as Soft Reasoners over Language

    Authors: Peter Clark, Oyvind Tafjord, Kyle Richardson

    Abstract: Beginning with McCarthy's Advice Taker (1959), AI has pursued the goal of providing a system with explicit, general knowledge and having the system reason over that knowledge. However, expressing the knowledge in a formal (logical or probabilistic) representation has been a major obstacle to this research. This paper investigates a modern approach to this problem where the facts and rules are prov…

    Submitted 5 May, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

    Comments: IJCAI 2020

  35. arXiv:2001.00121 [pdf]

    physics.optics cs.LG

    A Freeform Dielectric Metasurface Modeling Approach Based on Deep Neural Networks

    Authors: Sensong An, Bowen Zheng, Mikhail Y. Shalaginov, Hong Tang, Hang Li, Li Zhou, Jun Ding, Anuradha Murthy Agarwal, Clara Rivero-Baleine, Myungkoo Kang, Kathleen A. Richardson, Tian Gu, Juejun Hu, Clayton Fowler, Hualiang Zhang

    Abstract: Metasurfaces have shown promising potentials in shaping optical wavefronts while remaining compact compared to bulky geometric optics devices. Design of meta-atoms, the fundamental building blocks of metasurfaces, relies on trial-and-error method to achieve target electromagnetic responses. This process includes the characterization of an enormous amount of different meta-atom designs with differe…

    Submitted 31 December, 2019; originally announced January 2020.

  36. arXiv:1912.13337 [pdf, other]

    cs.CL

    What Does My QA Model Know? Devising Controlled Probes using Expert Knowledge

    Authors: Kyle Richardson, Ashish Sabharwal

    Abstract: Open-domain question answering (QA) is known to involve several underlying knowledge and reasoning challenges, but are models actually learning such knowledge when trained on benchmark tasks? To investigate this, we introduce several new challenge tasks that probe whether state-of-the-art QA models have general knowledge about word definitions and general taxonomic reasoning, both of which are fun…

    Submitted 1 September, 2020; v1 submitted 31 December, 2019; originally announced December 2019.

    Comments: TACL 2020

  37. arXiv:1910.08772 [pdf, ps, other]

    cs.CL

    MonaLog: a Lightweight System for Natural Language Inference Based on Monotonicity

    Authors: Hai Hu, Qi Chen, Kyle Richardson, Atreyee Mukherjee, Lawrence S. Moss, Sandra Kuebler

    Abstract: We present a new logic-based inference engine for natural language inference (NLI) called MonaLog, which is based on natural logic and the monotonicity calculus. In contrast to existing logic-based approaches, our system is intentionally designed to be as lightweight as possible, and operates using a small set of well-known (surface-level) monotonicity facts about quantifiers, lexical items and to…

    Submitted 19 October, 2019; originally announced October 2019.

    Comments: accepted to SCIL 2020

  38. arXiv:1909.07521 [pdf, other]

    cs.CL

    Probing Natural Language Inference Models through Semantic Fragments

    Authors: Kyle Richardson, Hai Hu, Lawrence S. Moss, Ashish Sabharwal

    Abstract: Do state-of-the-art models for language understanding already have, or can they easily learn, abilities such as boolean coordination, quantification, conditionals, comparatives, and monotonicity reasoning (i.e., reasoning about word substitutions in sentential contexts)? While such phenomena are involved in natural language inference (NLI) and go beyond basic linguistic understanding, it is unclea…

    Submitted 1 December, 2019; v1 submitted 16 September, 2019; originally announced September 2019.

    Comments: AAAI camera-ready version

  39. arXiv:1909.01958  [pdf, other]

    cs.CL cs.AI

    From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project

    Authors: Peter Clark, Oren Etzioni, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Niket Tandon, Sumithra Bhakthavatsalam, Dirk Groeneveld, Michal Guerquin, Michael Schmitz

    Abstract: AI has achieved remarkable mastery over games such as Chess, Go, and Poker, and even Jeopardy, but the rich variety of standardized exams has remained a landmark challenge. Even in 2016, the best AI system achieved merely 59.3% on an 8th Grade science exam challenge. This paper reports unprecedented success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more…

    Submitted 1 February, 2021; v1 submitted 4 September, 2019; originally announced September 2019.

    Comments: AI Magazine 41 (4) Winter 2020. New analysis sections added

  40. arXiv:1906.03387  [pdf]

    physics.optics cs.LG

    A Novel Modeling Approach for All-Dielectric Metasurfaces Using Deep Neural Networks

    Authors: Sensong An, Clayton Fowler, Bowen Zheng, Mikhail Y. Shalaginov, Hong Tang, Hang Li, Li Zhou, Jun Ding, Anuradha Murthy Agarwal, Clara Rivero-Baleine, Kathleen A. Richardson, Tian Gu, Juejun Hu, Hualiang Zhang

    Abstract: Metasurfaces have become a promising means for manipulating optical wavefronts in flat and high-performance optical devices. Conventional metasurface device design relies on trial-and-error methods to obtain target electromagnetic (EM) response, an approach that demands significant efforts to investigate the enormous number of possible meta-atom structures. In this paper, a deep neural network app…

    Submitted 8 June, 2019; originally announced June 2019.

    Comments: 18 pages, 8 figures

  41. arXiv:1804.00987  [pdf, ps, other]

    cs.CL cs.AI cs.PL

    A Language for Function Signature Representations

    Authors: Kyle Richardson

    Abstract: Recent work by (Richardson and Kuhn, 2017a,b; Richardson et al., 2018) looks at semantic parser induction and question answering in the domain of source code libraries and APIs. In this brief note, we formalize the representations being learned in these studies and introduce a simple domain specific language and a systematic translation from this language to first-order logic. By recasting the tar…

    Submitted 18 April, 2018; v1 submitted 31 March, 2018; originally announced April 2018.

    Comments: short note

  42. Polyglot Semantic Parsing in APIs

    Authors: Kyle Richardson, Jonathan Berant, Jonas Kuhn

    Abstract: Traditional approaches to semantic parsing (SP) work by training individual models for each available parallel dataset of text-meaning pairs. In this paper, we explore the idea of polyglot semantic translation, or learning semantic parsing models that are trained on multiple datasets and natural languages. In particular, we focus on translating text to code signature representations using the soft…

    Submitted 18 April, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: accepted for NAACL-2018 (camera ready version)

  43. The Code2Text Challenge: Text Generation in Source Code Libraries

    Authors: Kyle Richardson, Sina Zarrieß, Jonas Kuhn

    Abstract: We propose a new shared task for tactical data-to-text generation in the domain of source code libraries. Specifically, we focus on text generation of function descriptions from example software projects. Data is drawn from existing resources used for studying the related problem of semantic parser induction (Richardson and Kuhn, 2017b; Richardson and Kuhn, 2017a), and spans a wide variety of both…

    Submitted 31 July, 2017; originally announced August 2017.

    Comments: Proceedings of INLG 2017, shared task track

  44. Function Assistant: A Tool for NL Querying of APIs

    Authors: Kyle Richardson, Jonas Kuhn

    Abstract: In this paper, we describe Function Assistant, a lightweight Python-based toolkit for querying and exploring source code repositories using natural language. The toolkit is designed to help end-users of a target API quickly find information about functions through high-level natural language queries and descriptions. For a given text query and background API, the tool finds candidate functions by…

    Submitted 15 September, 2017; v1 submitted 1 June, 2017; originally announced June 2017.

    Comments: in Proceedings of EMNLP-2017 (system demonstrations)

  45. Learning Semantic Correspondences in Technical Documentation

    Authors: Kyle Richardson, Jonas Kuhn

    Abstract: We consider the problem of translating high-level textual descriptions to formal representations in technical documentation as part of an effort to model the meaning of such documentation. We focus specifically on the problem of learning translational correspondences between text descriptions and grounded representations in the target documentation, such as formal representation of functions or co…

    Submitted 13 May, 2017; originally announced May 2017.

    Comments: accepted to ACL-2017

  46. arXiv:0904.3927  [pdf, ps, other]

    cs.CC

    A Critique of "Solving the P/NP Problem Under Intrinsic Uncertainty", arXiv:0811.0463

    Authors: Andrew Keenan Richardson, Cole Arthur Brown

    Abstract: Although whether P equals NP is an important, open problem in computer science, and although Jaeger's 2008 paper, "Solving the P/NP Problem Under Intrinsic Uncertainty" (arXiv:0811.0463) presents an attempt at tackling the problem by discussing the possibility that all computation is uncertain to some degree, there are a number of logical oversights present in that paper which preclude it from s…

    Submitted 24 April, 2009; originally announced April 2009.

    Comments: 7 pages