Showing 1–50 of 109 results for author: Manning, C D

  1. arXiv:2406.12165  [pdf, other]

    cs.CL

    Statistical Uncertainty in Word Embeddings: GloVe-V

    Authors: Andrea Vallebueno, Cassandra Handan-Nader, Christopher D. Manning, Daniel E. Ho

    Abstract: Static word embeddings are ubiquitous in computational social science applications and contribute to practical decision-making in a variety of fields including law and healthcare. However, assessing the statistical uncertainty in downstream conclusions drawn from word embedding statistics has remained challenging. When using only point estimates for embeddings, researchers have no streamlined way…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2405.20362  [pdf, other]

    cs.CL cs.CY

    Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools

    Authors: Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, Daniel E. Ho

    Abstract: Legal practice has witnessed a sharp rise in products incorporating artificial intelligence (AI). Such tools are designed to assist with a wide range of core legal tasks, from search and summarization of caselaw to document drafting. But the large language models used in these tools are prone to "hallucinate," or make up false information, making their use risky in high-stakes domains. Recently, c…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Our dataset, tool outputs, and labels will be made available upon publication. This version of the manuscript (May 30, 2024) is updated to reflect an evaluation of Westlaw's AI-Assisted Research

  3. arXiv:2405.16039  [pdf, other]

    cs.LG cs.AI cs.NE

    MoEUT: Mixture-of-Experts Universal Transformers

    Authors: Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber, Christopher Potts, Christopher D. Manning

    Abstract: Previous work on Universal Transformers (UTs) has demonstrated the importance of parameter sharing across layers. By allowing recurrence in depth, UTs have advantages over standard Transformers in learning compositional generalizations, but layer-sharing comes with a practical limitation of parameter-compute ratio: it drastically reduces the parameter count compared to the non-shared model with th…

    Submitted 24 May, 2024; originally announced May 2024.

  4. arXiv:2404.16250  [pdf, other]

    cs.CL

    Semgrex and Ssurgeon, Searching and Manipulating Dependency Graphs

    Authors: John Bauer, Chloe Kiddon, Eric Yeh, Alex Shan, Christopher D. Manning

    Abstract: Searching dependency graphs and manipulating them can be a time-consuming and challenging task to get right. We document Semgrex, a system for searching dependency graphs, and introduce Ssurgeon, a system for manipulating the output of Semgrex. The compact language used by these systems allows for easy command line or API processing of dependencies. Additionally, integration with publicly released…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Georgetown University Round Table (GURT) 2023

  5. arXiv:2404.15894  [pdf, other]

    cs.CL cs.AI

    Assessing The Potential Of Mid-Sized Language Models For Clinical QA

    Authors: Elliot Bolton, Betty Xiong, Vijaytha Muralidharan, Joel Schamroth, Vivek Muralidharan, Christopher D. Manning, Roxana Daneshjou

    Abstract: Large language models, such as GPT-4 and Med-PaLM, have shown impressive performance on clinical tasks; however, they require access to compute, are closed-source, and cannot be deployed on device. Mid-size models such as BioGPT-large, BioMedLM, LLaMA 2, and Mistral 7B avoid these drawbacks, but their capacity for clinical tasks has been understudied. To help assess their potential for clinical us…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 25 pages, 8 figures

  6. arXiv:2404.03592  [pdf, other]

    cs.CL cs.AI cs.LG

    ReFT: Representation Finetuning for Language Models

    Authors: Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts

    Abstract: Parameter-efficient finetuning (PEFT) methods seek to adapt large neural models via updates to a small number of weights. However, much prior interpretability work has shown that representations encode rich semantic information, suggesting that editing representations might be a more powerful alternative. We pursue this hypothesis by developing a family of Representation Finetuning (ReFT) methods.…

    Submitted 22 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: preprint

  7. arXiv:2404.01268  [pdf, other]

    cs.CL cs.AI cs.DL cs.LG cs.SI

    Mapping the Increasing Use of LLMs in Scientific Papers

    Authors: Weixin Liang, Yaohui Zhang, Zhengxuan Wu, Haley Lepp, Wenlong Ji, Xuandong Zhao, Hancheng Cao, Sheng Liu, Siyu He, Zhi Huang, Diyi Yang, Christopher Potts, Christopher D Manning, James Y. Zou

    Abstract: Scientific publishing lays the foundation of science by disseminating research findings, fostering collaboration, encouraging reproducibility, and ensuring that scientific knowledge is accessible, verifiable, and built upon over time. Recently, there has been immense speculation about how many people are using large language models (LLMs) like ChatGPT in their academic writing, and to what extent…

    Submitted 1 April, 2024; originally announced April 2024.

  8. arXiv:2403.18421  [pdf, other]

    cs.CL cs.AI

    BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text

    Authors: Elliot Bolton, Abhinav Venigalla, Michihiro Yasunaga, David Hall, Betty Xiong, Tony Lee, Roxana Daneshjou, Jonathan Frankle, Percy Liang, Michael Carbin, Christopher D. Manning

    Abstract: Models such as GPT-4 and Med-PaLM 2 have demonstrated impressive performance on a wide variety of biomedical NLP tasks. However, these models have hundreds of billions of parameters, are computationally expensive to run, require users to send their input data over the internet, and are trained on unknown data sources. Can smaller, more targeted models compete? To address this question, we build an…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 23 pages

  9. arXiv:2403.07809  [pdf, other]

    cs.LG cs.CL

    pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

    Authors: Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts

    Abstract: Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability. To facilitate such research, we introduce pyvene, an open-source Python library that supports customizable interventions on a range of different PyTorch modules. pyvene supports complex intervention schemes with an intuiti…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 8 pages, 3 figures

  10. arXiv:2402.06155  [pdf, other]

    cs.CL

    Model Editing with Canonical Examples

    Authors: John Hewitt, Sarah Chen, Lanruo Lora Xie, Edward Adams, Percy Liang, Christopher D. Manning

    Abstract: We introduce model editing with canonical examples, a setting in which (1) a single learning example is provided per desired behavior, (2) evaluation is performed exclusively out-of-distribution, and (3) deviation from an initial model is strictly limited. A canonical example is a simple instance of good behavior, e.g., The capital of Mauritius is Port Louis, or bad behavior, e.g., An aspect of re…

    Submitted 8 February, 2024; originally announced February 2024.

  11. arXiv:2401.18059  [pdf, other]

    cs.CL cs.LG

    RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

    Authors: Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, Christopher D. Manning

    Abstract: Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic understanding of the overall document context. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree wit…

    Submitted 31 January, 2024; originally announced January 2024.

  12. arXiv:2311.15077  [pdf, other]

    cs.CL

    Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching

    Authors: Tolúlopé Ògúnrèmí, Christopher D. Manning, Dan Jurafsky

    Abstract: While many speakers of low-resource languages regularly code-switch between their languages and other regional languages or English, datasets of codeswitched speech are too small to train bespoke acoustic models from scratch or do language model rescoring. Here we propose finetuning self-supervised speech representations such as wav2vec 2.0 XLSR to recognize code-switched data. We find that finetu…

    Submitted 25 November, 2023; originally announced November 2023.

    Comments: 5 pages, 1 figure. Computational Approaches to Linguistic Code-Switching, CALCS 2023 (co-located with EMNLP 2023)

  13. arXiv:2311.08401  [pdf, other]

    cs.CL cs.AI cs.LG

    Fine-tuning Language Models for Factuality

    Authors: Katherine Tian, Eric Mitchell, Huaxiu Yao, Christopher D. Manning, Chelsea Finn

    Abstract: The fluency and creativity of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations.' These errors can inadvertently spread misinformation or harmfully perpetuate misconceptions. Further, manual…

    Submitted 14 November, 2023; originally announced November 2023.

  14. arXiv:2310.19089  [pdf, other]

    cs.CL

    Pushdown Layers: Encoding Recursive Structure in Transformer Language Models

    Authors: Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning

    Abstract: Recursion is a prominent feature of human language, and fundamentally challenging for self-attention due to the lack of an explicit recursive-state tracking mechanism. Consequently, Transformer language models poorly capture long-tail recursive structure and exhibit sample-inefficient syntactic generalization. This work introduces Pushdown Layers, a new self-attention layer that models recursive s…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: Accepted at EMNLP 2023 (Long Papers)

  15. arXiv:2310.12962  [pdf, other]

    cs.CL cs.AI cs.LG

    An Emulator for Fine-Tuning Large Language Models using Small Language Models

    Authors: Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning

    Abstract: Widely used language models (LMs) are typically built by scaling up a two-stage training pipeline: a pre-training stage that uses a very large, diverse dataset of text and a fine-tuning (sometimes, 'alignment') stage that uses targeted examples or other specifications of desired behaviors. While it has been hypothesized that knowledge and skills come from pre-training, and fine-tuning mostly filte…

    Submitted 19 October, 2023; originally announced October 2023.

  16. arXiv:2305.18741  [pdf, other]

    cs.CL

    Grokking of Hierarchical Structure in Vanilla Transformers

    Authors: Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning

    Abstract: For humans, language production and comprehension are sensitive to the hierarchical structure of sentences. In natural language processing, past work has questioned how effectively neural sequence models like transformers capture this hierarchical structure when generalizing to structurally novel inputs. We show that transformer language models can learn to generalize hierarchically after training…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  17. arXiv:2305.18290  [pdf, other]

    cs.LG cs.AI cs.CL

    Direct Preference Optimization: Your Language Model is Secretly a Reward Model

    Authors: Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn

    Abstract: While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these prefere…

    Submitted 13 December, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

  18. arXiv:2305.16765  [pdf, other]

    cs.CL

    Backpack Language Models

    Authors: John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang

    Abstract: We present Backpacks: a new neural architecture that marries strong modeling performance with an interface for interpretability and control. Backpacks learn multiple non-contextual sense vectors for each word in a vocabulary, and represent a word in a sequence as a context-dependent, non-negative linear combination of sense vectors in this sequence. We find that, after training, sense vectors spec…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: ACL 2023 Camera-Ready

  19. arXiv:2305.15076  [pdf, other]

    cs.CL

    Meta-Learning Online Adaptation of Language Models

    Authors: Nathan Hu, Eric Mitchell, Christopher D. Manning, Chelsea Finn

    Abstract: Large language models encode impressively broad world knowledge in their parameters. However, the knowledge in static language models falls out of date, limiting the model's effective "shelf life." While online fine-tuning can reduce this degradation, we find that naively fine-tuning on a stream of documents leads to a low level of information uptake. We hypothesize that online fine-tuning does no…

    Submitted 20 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 Camera Ready

  20. arXiv:2305.14975  [pdf, other]

    cs.CL

    Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback

    Authors: Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning

    Abstract: A trustworthy real-world prediction system should produce well-calibrated confidence scores; that is, its confidence in an answer should be indicative of the likelihood that the answer is correct, enabling deferral to an expert in cases of low-confidence predictions. Recent studies have shown that unsupervised pre-training produces large language models (LMs) whose conditional probabilities are re…

    Submitted 24 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 Camera Ready

  21. arXiv:2305.14795  [pdf, other]

    cs.CL

    MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions

    Authors: Zexuan Zhong, Zhengxuan Wu, Christopher D. Manning, Christopher Potts, Danqi Chen

    Abstract: The information stored in large language models (LLMs) falls out of date quickly, and retraining from scratch is often not an option. This has recently given rise to a range of techniques for injecting new facts through updating model weights. Current evaluation paradigms are extremely limited, mainly validating the recall of edited facts, but changing one fact should cause rippling changes to the…

    Submitted 29 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023. Our code and datasets are available at https://github.com/princeton-nlp/MQuAKE

  22. arXiv:2303.13716  [pdf, other]

    cs.CL

    ReCOGS: How Incidental Details of a Logical Form Overshadow an Evaluation of Semantic Interpretation

    Authors: Zhengxuan Wu, Christopher D. Manning, Christopher Potts

    Abstract: Compositional generalization benchmarks for semantic parsing seek to assess whether models can accurately compute meanings for novel sentences, but operationalize this in terms of logical form (LF) prediction. This raises the concern that semantically irrelevant details of the chosen LFs could shape model performance. We argue that this concern is realized for the COGS benchmark. COGS poses genera…

    Submitted 23 January, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: TACL 2023

  23. arXiv:2301.11305  [pdf, other]

    cs.CL cs.AI

    DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

    Authors: Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn

    Abstract: The increasing fluency and widespread usage of large language models (LLMs) highlight the desirability of corresponding tools aiding detection of LLM-generated text. In this paper, we identify a property of the structure of an LLM's probability function that is useful for such detection. Specifically, we demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the mo…

    Submitted 23 July, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: ICML 2023

  24. arXiv:2211.14946  [pdf, other]

    cs.LG

    Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models

    Authors: Peter Henderson, Eric Mitchell, Christopher D. Manning, Dan Jurafsky, Chelsea Finn

    Abstract: A growing ecosystem of large, open-source foundation models has reduced the labeled data and technical expertise necessary to apply machine learning to many new problems. Yet foundation models pose a clear dual-use risk, indiscriminately reducing the costs of building both harmful and beneficial machine learning systems. Policy tools such as restricted model access and export controls are the prim…

    Submitted 8 August, 2023; v1 submitted 27 November, 2022; originally announced November 2022.

    Comments: v1 Presented at the First Workshop of Pre-training: Perspectives, Pitfalls, and Paths Forward (ICML, 2022) and New Frontiers in Adversarial Machine Learning Workshop (ICML, 2022); v2 Presented at the Sixth AAAI/ACM Conference on AI, Ethics, and Society (AIES, 2023)

  25. arXiv:2211.11875  [pdf, other]

    cs.CL cs.AI

    Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference

    Authors: Eric Mitchell, Joseph J. Noh, Siyan Li, William S. Armstrong, Ananth Agarwal, Patrick Liu, Chelsea Finn, Christopher D. Manning

    Abstract: While large pre-trained language models are powerful, their predictions often lack logical consistency across test inputs. For example, a state-of-the-art Macaw question-answering (QA) model answers 'Yes' to 'Is a sparrow a bird?' and 'Does a bird have feet?' but answers 'No' to 'Does a sparrow have feet?'. To address this failure mode, we propose a framework, Consistency Correction through Relati…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: 16 pages. EMNLP 2022 Camera Ready. See https://ericmitchell.ai/emnlp-2022-concord/ for code and data

  26. arXiv:2211.09113  [pdf, other]

    cs.CL

    On Measuring the Intrinsic Few-Shot Hardness of Datasets

    Authors: Xinran Zhao, Shikhar Murty, Christopher D. Manning

    Abstract: While advances in pre-training have led to dramatic improvements in few-shot learning of NLP tasks, there is limited understanding of what drives successful few-shot adaptation in datasets. In particular, given a new dataset and a pre-trained model, what properties of the dataset make it few-shot learnable and are these properties independent of the specific adaptation techniques used? We c…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: EMNLP 2022 camera ready version

  27. arXiv:2211.09110  [pdf, other]

    cs.CL cs.AI cs.LG

    Holistic Evaluation of Language Models

    Authors: Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, et al. (25 additional authors not shown)

    Abstract: Language models (LMs) are becoming the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models. First, we taxonomize the vast space of potential scenarios (i.e. use cases) and metrics (i.e. desiderata) that are of interest fo…

    Submitted 1 October, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Project page: https://crfm.stanford.edu/helm/v1.0

    Journal ref: Published in Transactions on Machine Learning Research (TMLR), 2023

  28. arXiv:2211.03318  [pdf, other]

    cs.CL

    Fixing Model Bugs with Natural Language Patches

    Authors: Shikhar Murty, Christopher D. Manning, Scott Lundberg, Marco Tulio Ribeiro

    Abstract: Current approaches for fixing systematic problems in NLP models (e.g. regex patches, finetuning on more data) are either brittle, or labor-intensive and liable to shortcuts. In contrast, humans often provide corrections to each other through natural language. Taking inspiration from this, we explore natural language patches -- declarative statements that allow developers to provide corrective feed…

    Submitted 20 November, 2022; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: Accepted at EMNLP 2022 [Fixed fig-1]

  29. arXiv:2211.01288  [pdf, other]

    cs.CL

    Characterizing Intrinsic Compositionality in Transformers with Tree Projections

    Authors: Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning

    Abstract: When trained on language data, do transformers learn some arbitrary computation that utilizes the full capacity of the architecture or do they learn a simpler, tree-like computation, hypothesized to underlie compositional meaning systems like human languages? There is an apparent tension between compositional accounts of human language understanding, which are based on a restricted bottom-up compu…

    Submitted 3 November, 2022; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: Fixed title and metadata

  30. arXiv:2210.15191  [pdf, other]

    cs.CL

    Truncation Sampling as Language Model Desmoothing

    Authors: John Hewitt, Christopher D. Manning, Percy Liang

    Abstract: Long samples of text from neural language models can be of poor quality. Truncation sampling algorithms, like top-$p$ or top-$k$, address this by setting some words' probabilities to zero at each step. This work provides framing for the aim of truncation, and an improved algorithm for that aim. We propose thinking of a neural language model as a mixture of a true distribution and a smoothing dis…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP, + small fixes

  31. arXiv:2210.09338  [pdf, other]

    cs.CL cs.AI cs.LG

    Deep Bidirectional Language-Knowledge Graph Pretraining

    Authors: Michihiro Yasunaga, Antoine Bosselut, Hongyu Ren, Xikun Zhang, Christopher D Manning, Percy Liang, Jure Leskovec

    Abstract: Pretraining a language model (LM) on text has been shown to help various downstream NLP tasks. Recent works show that a knowledge graph (KG) can complement text data, offering structured background knowledge that provides a useful scaffold for reasoning. However, these works are not pretrained to learn a deep fusion of the two modalities at scale, limiting the potential to acquire fully joint repr…

    Submitted 18 October, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: Published at NeurIPS 2022. Code, data, and trained models are available at https://github.com/michiyasunaga/dragon

  32. arXiv:2208.03812  [pdf, other]

    cs.CL cs.SD eess.AS

    When can I Speak? Predicting initiation points for spoken dialogue agents

    Authors: Siyan Li, Ashwin Paranjape, Christopher D. Manning

    Abstract: Current spoken dialogue systems initiate their turns after a long period of silence (700-1000ms), which leads to little real-time feedback, sluggish responses, and an overall stilted conversational flow. Humans typically respond within 200ms and successfully predicting initiation points in advance would allow spoken dialogue agents to do the same. In this work, we predict the lead-time to initiati…

    Submitted 7 August, 2022; originally announced August 2022.

    Comments: SIGDIAL 2022

  33. arXiv:2207.12021  [pdf, other]

    cs.CL

    Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent

    Authors: Ethan A. Chi, Ashwin Paranjape, Abigail See, Caleb Chiam, Trenton Chang, Kathleen Kenealy, Swee Kiat Lim, Amelia Hardy, Chetanya Rastogi, Haojun Li, Alexander Iyabor, Yutong He, Hari Sowrirajan, Peng Qi, Kaushik Ram Sadagopan, Nguyet Minh Phu, Dilara Soylu, Jillian Tang, Avanika Narayan, Giovanni Campagna, Christopher D. Manning

    Abstract: We present Chirpy Cardinal, an open-domain social chatbot. Aiming to be both informative and conversational, our bot chats with users in an authentic, emotionally intelligent way. By integrating controlled neural generation with scaffolded, hand-written dialogue, we let both the user and bot take turns driving the conversation, producing an engaging and socially fluent experience. Deployed in the…

    Submitted 16 January, 2023; v1 submitted 25 July, 2022; originally announced July 2022.

    Comments: SIGDIAL '22

  34. arXiv:2207.00220  [pdf, other]

    cs.CL cs.CY

    Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset

    Authors: Peter Henderson, Mark S. Krass, Lucia Zheng, Neel Guha, Christopher D. Manning, Dan Jurafsky, Daniel E. Ho

    Abstract: One concern with the rise of large language models lies with their potential for significant harm, particularly from pretraining on biased, obscene, copyrighted, and private information. Emerging ethical approaches have attempted to filter pretraining material, but such approaches have been ad hoc and failed to take context into account. We offer an approach to filtering grounded in law, which has…

    Submitted 29 November, 2022; v1 submitted 1 July, 2022; originally announced July 2022.

    Comments: Presented at NeurIPS Datasets & Benchmarks (2022)

  35. arXiv:2206.06520  [pdf, other]

    cs.AI cs.CL

    Memory-Based Model Editing at Scale

    Authors: Eric Mitchell, Charles Lin, Antoine Bosselut, Christopher D. Manning, Chelsea Finn

    Abstract: Even the largest neural networks make errors, and once-correct predictions can become invalid as the world changes. Model editors make local updates to the behavior of base (pre-trained) models to inject updated knowledge or correct undesirable behaviors. Existing model editors have shown promise, but also suffer from insufficient expressiveness: they struggle to accurately model an edit's intende…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: ICML 2022. Project site at https://sites.google.com/view/serac-editing

  36. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  37. arXiv:2205.12702  [pdf, other]

    cs.CL

    Detecting Label Errors by using Pre-Trained Language Models

    Authors: Derek Chong, Jenny Hong, Christopher D. Manning

    Abstract: We show that large pre-trained language models are inherently highly capable of identifying label errors in natural language datasets: simply examining out-of-sample data points in descending order of fine-tuned task loss significantly outperforms more complex error-detection mechanisms proposed in previous work. To this end, we contribute a novel method for introducing realistic, human-originat…

    Submitted 15 December, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: 18 pages, 10 figures. Accepted to EMNLP 2022; typesetting of this version slightly differs from conference version

  38. arXiv:2202.09381  [pdf, other]

    cs.CL

    Synthetic Disinformation Attacks on Automated Fact Verification Systems

    Authors: Yibing Du, Antoine Bosselut, Christopher D. Manning

    Abstract: Automated fact-checking is a needed technology to curtail the spread of online misinformation. One current framework for such solutions proposes to verify claims by retrieving supporting or refuting evidence from related textual sources. However, the realistic use cases for fact-checkers will require verifying claims against evidence sources that could be affected by the same misinformation. Furth…

    Submitted 18 February, 2022; originally announced February 2022.

    Comments: AAAI 2022

  39. arXiv:2201.08860  [pdf, other]

    cs.CL cs.LG

    GreaseLM: Graph REASoning Enhanced Language Models for Question Answering

    Authors: Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning, Jure Leskovec

    Abstract: Answering complex questions about textual narratives requires reasoning over both stated context and the world knowledge that underlies it. However, pretrained language models (LM), the foundation of most modern QA systems, do not robustly represent latent relationships between concepts, which is necessary for reasoning. While knowledge graphs (KG) are often used to augment LMs with structured rep…

    Submitted 21 January, 2022; originally announced January 2022.

    Comments: Published at ICLR 2022. All code, data, and pretrained models are available at https://github.com/snap-stanford/GreaseLM

  40. arXiv:2112.07381  [pdf, other]

    cs.CL cs.AI

    You Only Need One Model for Open-domain Question Answering

    Authors: Haejun Lee, Akhil Kedia, Jongwon Lee, Ashwin Paranjape, Christopher D. Manning, Kyoung-Gu Woo

    Abstract: Recent approaches to Open-domain Question Answering refer to an external knowledge base using a retriever model, optionally rerank passages with a separate reranker model and generate an answer using another reader model. Despite performing related tasks, the models have separate parameters and are weakly-coupled during training. We propose casting the retriever and the reranker as internal passag…

    Submitted 28 October, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: EMNLP 2022 (main)

  41. arXiv:2110.11309  [pdf, other]

    cs.LG cs.AI cs.CL

    Fast Model Editing at Scale

    Authors: Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning

    Abstract: While large pre-trained models have enabled impressive results on a variety of downstream tasks, the largest existing models still make errors, and even accurate predictions may become outdated over time. Because detecting all such failures at training time is impossible, enabling both developers and end users of such models to correct inaccurate outputs while leaving the model otherwise intact is…

    Submitted 13 June, 2022; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: ICLR 2022. View implementation and additional project info at https://sites.google.com/view/mend-editing

  42. arXiv:2110.07752  [pdf, other]

    cs.CL cs.IR

    Hindsight: Posterior-guided training of retrievers for improved open-ended generation

    Authors: Ashwin Paranjape, Omar Khattab, Christopher Potts, Matei Zaharia, Christopher D. Manning

    Abstract: Many text generation systems benefit from using a retriever to retrieve passages from a textual knowledge corpus (e.g., Wikipedia) which are then provided as additional context to the generator. For open-ended generation tasks (like generating informative utterances in conversations) many varied passages may be equally relevant and we find that existing methods that jointly train the retriever and…

    Submitted 20 October, 2021; v1 submitted 14 October, 2021; originally announced October 2021.

  43. arXiv:2110.01799  [pdf, other]

    cs.CL cs.AI cs.LG

    ContractNLI: A Dataset for Document-level Natural Language Inference for Contracts

    Authors: Yuta Koreeda, Christopher D. Manning

    Abstract: Reviewing contracts is a time-consuming procedure that incurs large expenses to companies and social inequality to those who cannot afford it. In this work, we propose "document-level natural language inference (NLI) for contracts", a novel, real-world application of NLI that addresses such problems. In this task, a system is given a set of hypotheses (such as "Some obligations of Agreement may su…

    Submitted 4 October, 2021; originally announced October 2021.

    Comments: Accepted at the Findings of the Association for Computational Linguistics: EMNLP 2021

  44. arXiv:2109.09234  [pdf, other]

    cs.CL

    Conditional probing: measuring usable information beyond a baseline

    Authors: John Hewitt, Kawin Ethayarajh, Percy Liang, Christopher D. Manning

    Abstract: Probing experiments investigate the extent to which neural representations make properties -- like part-of-speech -- predictable. One suggests that a representation encodes a property if probing that representation produces higher accuracy than probing a baseline representation like non-contextual word embeddings. Instead of using baselines as a point of comparison, we're interested in measuring i…

    Submitted 19 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021 + typo fixes

  45. arXiv:2108.07258  [pdf, other]

    cs.LG cs.AI cs.CY

    On the Opportunities and Risks of Foundation Models

    Authors: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, et al. (89 additional authors not shown)

    Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their cap…

    Submitted 12 July, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Report page with citation guidelines: https://crfm.stanford.edu/report.html

  46. arXiv:2107.09285  [pdf, other]

    cs.CL cs.AI

    Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning

    Authors: Kaylee Burns, Christopher D. Manning, Li Fei-Fei

    Abstract: Although virtual agents are increasingly situated in environments where natural language is the most effective mode of interaction with humans, these exchanges are rarely used as an opportunity for learning. Leveraging language interactions effectively requires addressing limitations in the two most common approaches to language grounding: semantic parsers built on top of fixed object categories a…

    Submitted 20 July, 2021; originally announced July 2021.

    Comments: 17 pages, 10 figures

    ACM Class: I.2.7

  47. arXiv:2107.02331  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering

    Authors: Siddharth Karamcheti, Ranjay Krishna, Li Fei-Fei, Christopher D. Manning

    Abstract: Active learning promises to alleviate the massive data needs of supervised machine learning: it has successfully improved sample efficiency by an order of magnitude on traditional tasks like topic classification and object recognition. However, we uncover a striking contrast to this promise: across 5 models and 4 datasets on the task of visual question answering, a wide variety of active learning…

    Submitted 5 July, 2021; originally announced July 2021.

    Comments: Accepted at ACL-IJCNLP 2021. 17 pages, 16 Figures

  48. arXiv:2105.00150  [pdf, other]

    cs.CL cs.CV cs.IR

    Capturing Logical Structure of Visually Structured Documents with Multimodal Transition Parser

    Authors: Yuta Koreeda, Christopher D. Manning

    Abstract: While many NLP pipelines assume raw, clean texts, many texts we encounter in the wild, including a vast majority of legal documents, are not so clean, with many of them being visually structured documents (VSDs) such as PDFs. Conventional preprocessing tools for VSDs mainly focused on word segmentation and coarse layout analysis, whereas fine-grained logical structure analysis (such as identifying…

    Submitted 7 November, 2021; v1 submitted 30 April, 2021; originally announced May 2021.

    Comments: 11 pages, 5 figures

  49. arXiv:2104.07831  [pdf, other]

    cs.CL

    Human-like informative conversations: Better acknowledgements using conditional mutual information

    Authors: Ashwin Paranjape, Christopher D. Manning

    Abstract: This work aims to build a dialogue agent that can weave new factual content into conversations as naturally as humans. We draw insights from linguistic principles of conversational analysis and annotate human-human conversations from the Switchboard Dialog Act Corpus to examine humans' strategies for acknowledgement, transition, detail selection and presentation. When current chatbots (explicitly p…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

  50. arXiv:2012.08561  [pdf, other]

    cs.CL

    Pre-Training Transformers as Energy-Based Cloze Models

    Authors: Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning

    Abstract: We introduce Electric, an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context. We train…

    Submitted 15 December, 2020; originally announced December 2020.

    Comments: EMNLP 2020