Showing 1–50 of 58 results for author: Pavlick, E

  1. arXiv:2406.15955  [pdf, other]

    cs.CV cs.AI

    Beyond the Doors of Perception: Vision Transformers Represent Relations Between Objects

    Authors: Michael A. Lepori, Alexa R. Tartaglini, Wai Keen Vong, Thomas Serre, Brenden M. Lake, Ellie Pavlick

    Abstract: Though vision transformers (ViTs) have achieved state-of-the-art performance in a variety of settings, they exhibit surprising failures when performing tasks involving visual relations. This begs the question: how do ViTs attempt to perform tasks that require computing visual relations between objects? Prior efforts to interpret ViTs tend to focus on characterizing relevant low-level visual featur…

    Submitted 22 June, 2024; originally announced June 2024.

  2. arXiv:2406.13803  [pdf, other]

    cs.CL

    Semantic Structure-Mapping in LLM and Human Analogical Reasoning

    Authors: Sam Musker, Alex Duchnowski, Raphaël Millière, Ellie Pavlick

    Abstract: Analogical reasoning is considered core to human learning and cognition. Recent studies have compared the analogical reasoning abilities of human subjects and Large Language Models (LLMs) on abstract symbol manipulation tasks, such as letter string analogies. However, these studies largely neglect analogical reasoning over semantically meaningful symbols, such as natural language words. This abili…

    Submitted 19 June, 2024; originally announced June 2024.

  3. arXiv:2406.09519  [pdf, other]

    cs.CL cs.AI

    Talking Heads: Understanding Inter-layer Communication in Transformer Language Models

    Authors: Jack Merullo, Carsten Eickhoff, Ellie Pavlick

    Abstract: Although it is known that transformer language models (LMs) pass features from early layers to later layers, it is not well understood how this information is represented and routed by the model. By analyzing a particular mechanism LMs use to accomplish this, we find that it is also used to recall items from a list, and show that this mechanism can explain an otherwise arbitrary-seeming sensitivity…

    Submitted 13 June, 2024; originally announced June 2024.

  4. arXiv:2406.00053  [pdf, other]

    cs.CL cs.LG

    Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting

    Authors: Suraj Anand, Michael A. Lepori, Jack Merullo, Ellie Pavlick

    Abstract: Language models have the ability to perform in-context learning (ICL), allowing them to flexibly adapt their behavior based on context. This contrasts with in-weights learning, where information is statically encoded in model parameters from iterated observations of the data. Despite this apparent ability to learn in-context, language models are known to struggle when faced with unseen or rarely s…

    Submitted 1 July, 2024; v1 submitted 28 May, 2024; originally announced June 2024.

    Comments: 9 pages, 5 figures

  5. arXiv:2405.14782  [pdf, other]

    cs.CL

    Lessons from the Trenches on Reproducible Evaluation of Language Models

    Authors: Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan, et al. (5 additional authors not shown)

    Abstract: Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of reproducibility and transparency. In this paper we draw on three years of experience in evaluating large language models to provide guidance and lessons…

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  6. arXiv:2404.12444  [pdf, other]

    cs.CL cs.AI

    mOthello: When Do Cross-Lingual Representation Alignment and Cross-Lingual Transfer Emerge in Multilingual Models?

    Authors: Tianze Hua, Tian Yun, Ellie Pavlick

    Abstract: Many pretrained multilingual models exhibit cross-lingual transfer ability, which is often attributed to a learned language-neutral representation during pretraining. However, it remains unclear what factors contribute to the learning of a language-neutral representation, and whether the learned language-neutral representation suffices to facilitate cross-lingual transfer. We propose a synthetic t…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted at Findings of NAACL 2024. Project Webpage: https://multilingual-othello.github.io/

  7. arXiv:2403.05576  [pdf]

    cs.HC cs.AI

    Understanding Subjectivity through the Lens of Motivational Context in Model-Generated Image Satisfaction

    Authors: Senjuti Dutta, Sherol Chen, Sunny Mak, Amnah Ahmad, Katherine Collins, Alena Butryna, Deepak Ramachandran, Krishnamurthy Dvijotham, Ellie Pavlick, Ravi Rajakumar

    Abstract: Image generation models are poised to become ubiquitous in a range of applications. These models are often fine-tuned and evaluated using human quality judgments that assume a universal standard, failing to consider the subjectivity of such tasks. To investigate how to quantify subjectivity, and the scale of its impact, we measure how assessments differ among human annotators across different use…

    Submitted 26 February, 2024; originally announced March 2024.

  8. arXiv:2403.05534  [pdf, other]

    cs.CL

    Bayesian Preference Elicitation with Language Models

    Authors: Kunal Handa, Yarin Gal, Ellie Pavlick, Noah Goodman, Jacob Andreas, Alex Tamkin, Belinda Z. Li

    Abstract: Aligning AI systems to users' interests requires understanding and incorporating humans' complex values and preferences. Recently, language models (LMs) have been used to gather information about the preferences of human users. This preference data can be used to fine-tune or guide other LMs and/or AI systems. However, LMs have been shown to struggle with crucial aspects of preference learning: qu…

    Submitted 8 March, 2024; originally announced March 2024.

  9. arXiv:2402.08674  [pdf, other]

    cs.NE cs.LG q-bio.NC

    Human Curriculum Effects Emerge with In-Context Learning in Neural Networks

    Authors: Jacob Russin, Ellie Pavlick, Michael J. Frank

    Abstract: Human learning is sensitive to rule-like structure and the curriculum of examples used for training. In tasks governed by succinct rules, learning is more robust when related examples are blocked across trials, but in the absence of such rules, interleaving is more effective. To date, no neural model has simultaneously captured these seemingly contradictory effects. Here we show that this same tra…

    Submitted 12 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: 7 pages, 4 figures, accepted as a talk + full paper at CogSci 2024

  10. arXiv:2402.08211  [pdf, other]

    cs.AI

    Transformer Mechanisms Mimic Frontostriatal Gating Operations When Trained on Human Working Memory Tasks

    Authors: Aaron Traylor, Jack Merullo, Michael J. Frank, Ellie Pavlick

    Abstract: Models based on the Transformer neural network architecture have seen success on a wide variety of tasks that appear to require complex "cognitive branching" -- or the ability to maintain pursuit of one goal while accomplishing others. In cognitive neuroscience, success on such tasks is thought to rely on sophisticated frontostriatal mechanisms for selective gating, which enable role-addr…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: 8 pages, 4 figures

    ACM Class: I.2.6

  11. arXiv:2311.06411  [pdf, other]

    cs.CV cs.CL

    Analyzing Modular Approaches for Visual Question Decomposition

    Authors: Apoorv Khandelwal, Ellie Pavlick, Chen Sun

    Abstract: Modular neural networks without additional training have recently been shown to surpass end-to-end neural networks on challenging vision-language tasks. The latest such methods simultaneously introduce LLM-based code generation to build programs and a number of skill-specific, task-oriented modules to execute them. In this paper, we focus on ViperGPT and ask where its additional performance comes…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: Published at EMNLP 2023 (Main Conference). Source code: https://github.com/brown-palm/visual-question-decomposition

  12. arXiv:2311.04354  [pdf, other]

    cs.CL

    Uncovering Intermediate Variables in Transformers using Circuit Probing

    Authors: Michael A. Lepori, Thomas Serre, Ellie Pavlick

    Abstract: Neural network models have achieved high performance on a wide variety of complex tasks, but the algorithms that they implement are notoriously difficult to interpret. In order to understand these algorithms, it is often necessary to hypothesize intermediate variables involved in the network's computation. For example, does a language model depend on particular syntactic properties when generating…

    Submitted 17 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

  13. arXiv:2311.02171  [pdf, other]

    cs.LG cs.AI

    Emergence of Abstract State Representations in Embodied Sequence Modeling

    Authors: Tian Yun, Zilai Zeng, Kunal Handa, Ashish V. Thapliyal, Bo Pang, Ellie Pavlick, Chen Sun

    Abstract: Decision making via sequence modeling aims to mimic the success of language models, where actions taken by an embodied agent are modeled as tokens to predict. Despite their promising performance, it remains unclear if embodied sequence modeling leads to the emergence of internal representations that represent the environmental state information. A model that lacks abstract state representations wo…

    Submitted 7 November, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023). Project webpage: https://abstract-state-seqmodel.github.io/

  14. arXiv:2310.15910  [pdf, other]

    cs.CL cs.AI

    Characterizing Mechanisms for Factual Recall in Language Models

    Authors: Qinan Yu, Jack Merullo, Ellie Pavlick

    Abstract: Language Models (LMs) often must integrate facts they memorized in pretraining with new information that appears in a given context. These two sources can disagree, causing competition within the model, and it is unclear how an LM will resolve the conflict. On a dataset that queries for knowledge of world capitals, we investigate both distributional and mechanistic determinants of LM behavior in s…

    Submitted 24 October, 2023; originally announced October 2023.

  15. arXiv:2310.10899  [pdf, other]

    cs.LG cs.AI

    Instilling Inductive Biases with Subnetworks

    Authors: Enyan Zhang, Michael A. Lepori, Ellie Pavlick

    Abstract: Despite the recent success of artificial neural networks on a variety of tasks, we have little knowledge or control over the exact solutions these models implement. Instilling inductive biases -- preferences for some solutions over others -- into these models is one promising path toward understanding and controlling their behavior. Much work has been done to study the inherent inductive biases of…

    Submitted 31 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  16. arXiv:2310.09612  [pdf, other]

    cs.CV cs.AI

    Deep Neural Networks Can Learn Generalizable Same-Different Visual Relations

    Authors: Alexa R. Tartaglini, Sheridan Feucht, Michael A. Lepori, Wai Keen Vong, Charles Lovering, Brenden M. Lake, Ellie Pavlick

    Abstract: Although deep neural networks can achieve human-level performance on many object recognition benchmarks, prior work suggests that these same models fail to learn simple abstract relations, such as determining whether two objects are the same or different. Much of this prior work focuses on training convolutional neural networks to classify images of two same or two different abstract shapes, testi…

    Submitted 14 October, 2023; originally announced October 2023.

  17. arXiv:2310.08744  [pdf, other]

    cs.CL cs.LG

    Circuit Component Reuse Across Tasks in Transformer Language Models

    Authors: Jack Merullo, Carsten Eickhoff, Ellie Pavlick

    Abstract: Recent work in mechanistic interpretability has shown that behaviors in language models can be successfully reverse-engineered through circuit analysis. A common criticism, however, is that each circuit is task-specific, and thus such analysis cannot contribute to understanding the models at a higher level. In this work, we present evidence that insights (both low-level findings about specific hea…

    Submitted 6 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Accepted at ICLR 2024

  18. arXiv:2309.00244  [pdf, other]

    cs.LG cs.CL

    NeuroSurgeon: A Toolkit for Subnetwork Analysis

    Authors: Michael A. Lepori, Ellie Pavlick, Thomas Serre

    Abstract: Despite recent advances in the field of explainability, much remains unknown about the algorithms that neural networks learn to represent. Recent work has attempted to understand trained models by decomposing them into functional circuits (Csordás et al., 2020; Lepori et al., 2023). To advance this research, we developed NeuroSurgeon, a python library that can be used to discover and manipulate su…

    Submitted 1 September, 2023; originally announced September 2023.

  19. arXiv:2306.01755  [pdf, other]

    cs.CV cs.AI cs.CL

    Training Priors Predict Text-To-Image Model Performance

    Authors: Charles Lovering, Ellie Pavlick

    Abstract: Text-to-image models can often generate some relations, i.e., "astronaut riding horse", but fail to generate other relations composed of the same basic parts, i.e., "horse riding astronaut". These failures are often taken as evidence that models rely on training priors rather than constructing novel images compositionally. This paper tests this intuition on the stablediffusion 2.1 text-to-image mo…

    Submitted 24 October, 2023; v1 submitted 23 May, 2023; originally announced June 2023.

  20. arXiv:2305.16130  [pdf, other]

    cs.CL cs.LG

    Language Models Implement Simple Word2Vec-style Vector Arithmetic

    Authors: Jack Merullo, Carsten Eickhoff, Ellie Pavlick

    Abstract: A primary criticism towards language models (LMs) is their inscrutability. This paper presents evidence that, despite their size and complexity, LMs sometimes exploit a simple vector arithmetic style mechanism to solve some relational tasks using regularities encoded in the hidden space of the model (e.g., Poland:Warsaw::China:Beijing). We investigate a range of language model sizes (from 124M par…

    Submitted 3 April, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: NAACL

  21. arXiv:2305.14630  [pdf, other]

    cs.CL

    Testing Causal Models of Word Meaning in GPT-3 and -4

    Authors: Sam Musker, Ellie Pavlick

    Abstract: Large Language Models (LLMs) have driven extraordinary improvements in NLP. However, it is unclear how such models represent lexical concepts, i.e., the meanings of the words they use. This paper evaluates the lexical representations of GPT-3 and GPT-4 through the lens of HIPE theory, a theory of concept representations which focuses on representations of words describing artifacts (such as "mop",…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Unabridged version. Code available at https://github.com/smusker/Causal_Models_Of_Word_Meaning

    ACM Class: I.2.7

  22. arXiv:2303.12737  [pdf, other]

    cs.CV cs.AI cs.CL

    Comparing Trajectory and Vision Modalities for Verb Representation

    Authors: Dylan Ebert, Chen Sun, Ellie Pavlick

    Abstract: Three-dimensional trajectories, or the 3D position and rotation of objects over time, have been shown to encode key aspects of verb semantics (e.g., the meanings of roll vs. slide). However, most multimodal models in NLP use 2D images as representations of the world. Given the importance of 3D space in formal models of verb semantics, we expect that these 2D images would result in impoverished rep…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: 4 pages, 1 figure

    MSC Class: 68T50

  23. arXiv:2303.08114  [pdf, other]

    cs.LG cs.CL

    Simfluence: Modeling the Influence of Individual Training Examples by Simulating Training Runs

    Authors: Kelvin Guu, Albert Webson, Ellie Pavlick, Lucas Dixon, Ian Tenney, Tolga Bolukbasi

    Abstract: Training data attribution (TDA) methods offer to trace a model's prediction on any given example back to specific influential training examples. Existing approaches do so by assigning a scalar influence score to each training example, under a simplifying assumption that influence is additive. But in reality, we observe that training examples interact in highly non-additive ways due to factors such…

    Submitted 14 March, 2023; originally announced March 2023.

  24. arXiv:2301.10884  [pdf, other]

    cs.CL cs.AI

    Break It Down: Evidence for Structural Compositionality in Neural Networks

    Authors: Michael A. Lepori, Thomas Serre, Ellie Pavlick

    Abstract: Though modern neural networks have achieved impressive performance in both vision and language tasks, we know little about the functions that they implement. One possibility is that neural networks implicitly break down complex tasks into subroutines, implement modular solutions to these subroutines, and compose them into an overall solution to a task - a property we term structural compositionali…

    Submitted 6 November, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

  25. arXiv:2301.07085  [pdf, other]

    cs.CL cs.AI

    Are Language Models Worse than Humans at Following Prompts? It's Complicated

    Authors: Albert Webson, Alyssa Marie Loo, Qinan Yu, Ellie Pavlick

    Abstract: Prompts have been the center of progress in advancing language models' zero-shot and few-shot performance. However, recent work finds that models can perform surprisingly well when given intentionally irrelevant or misleading prompts. Such results may be interpreted as evidence that model behavior is not "human like". In this study, we challenge a central assumption in such work: that humans would…

    Submitted 11 November, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

    Comments: EMNLP 2023

  26. arXiv:2212.10537  [pdf, other]

    cs.CV cs.AI cs.CL

    Does CLIP Bind Concepts? Probing Compositionality in Large Image Models

    Authors: Martha Lewis, Nihal V. Nayak, Peilin Yu, Qinan Yu, Jack Merullo, Stephen H. Bach, Ellie Pavlick

    Abstract: Large-scale neural network models combining text and images have made incredible progress in recent years. However, it remains an open question to what extent such models encode compositional representations of the concepts over which they operate, such as correctly identifying "red cube" by reasoning over the constituents "red" and "cube". In this work, we focus on the ability of a large pr…

    Submitted 29 March, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  27. arXiv:2211.14673  [pdf, other]

    cs.AI

    Evaluation Beyond Task Performance: Analyzing Concepts in AlphaZero in Hex

    Authors: Charles Lovering, Jessica Zosa Forde, George Konidaris, Ellie Pavlick, Michael L. Littman

    Abstract: AlphaZero, an approach to reinforcement learning that couples neural networks and Monte Carlo tree search (MCTS), has produced state-of-the-art strategies for traditional board games like chess, Go, shogi, and Hex. While researchers and game commentators have suggested that AlphaZero uses concepts that humans consider important, it is unclear how these concepts are captured in the network. We inve…

    Submitted 26 November, 2022; originally announced November 2022.

    Comments: 10 pages, Neural Information Processing Systems 2022

  28. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  29. arXiv:2209.15162  [pdf, other]

    cs.CL cs.LG

    Linearly Mapping from Image to Text Space

    Authors: Jack Merullo, Louis Castricato, Carsten Eickhoff, Ellie Pavlick

    Abstract: The extent to which text-only language models (LMs) learn to represent features of the non-linguistic world is an open question. Prior work has shown that pretrained LMs can be taught to caption images when a vision model's parameters are optimized to encode images in the language space. We test a stronger hypothesis: that the conceptual representations learned by frozen text-only models and visio…

    Submitted 9 March, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: Accepted at ICLR 2023

  30. arXiv:2208.10244  [pdf, other]

    cs.CL cs.AI

    Unit Testing for Concepts in Neural Networks

    Authors: Charles Lovering, Ellie Pavlick

    Abstract: Many complex problems are naturally understood in terms of symbolic concepts. For example, our concept of "cat" is related to our concepts of "ears" and "whiskers" in a non-arbitrary way. Fodor (1998) proposes one theory of concepts, which emphasizes symbolic representations related via constituency structures. Whether neural networks are consistent with such a theory is open for debate. We propos…

    Submitted 25 November, 2022; v1 submitted 28 July, 2022; originally announced August 2022.

    Comments: TACL, In Press. 12 Pages

  31. arXiv:2207.02272  [pdf, other]

    cs.CL cs.AI

    Pretraining on Interactions for Learning Grounded Affordance Representations

    Authors: Jack Merullo, Dylan Ebert, Carsten Eickhoff, Ellie Pavlick

    Abstract: Lexical semantics and cognitive science point to affordances (i.e. the actions that objects support) as critical for understanding and representing nouns and verbs. However, study of these semantic features has not yet been integrated with the "foundation" models that currently dominate language representation research. We hypothesize that predictive modeling of object state over time will result…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: *SEM 2022

  32. arXiv:2206.11953  [pdf, other]

    cs.CL cs.AI

    Do Trajectories Encode Verb Meaning?

    Authors: Dylan Ebert, Chen Sun, Ellie Pavlick

    Abstract: Distributional models learn representations of words from text, but are criticized for their lack of grounding, or the linking of text to the non-linguistic world. Grounded language models have had success in learning to connect concrete categories like nouns and adjectives to the world via images and videos, but can struggle to isolate the meaning of the verbs themselves from the context in which…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: NAACL 2022

  33. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  34. arXiv:2203.17271  [pdf, other]

    cs.CV cs.AI

    Do Vision-Language Pretrained Models Learn Composable Primitive Concepts?

    Authors: Tian Yun, Usha Bhalla, Ellie Pavlick, Chen Sun

    Abstract: Vision-language (VL) pretrained models have achieved impressive performance on multimodal reasoning and zero-shot recognition tasks. Many of these VL models are pretrained on unlabeled image and caption pairs from the internet. In this paper, we study whether representations of primitive concepts--such as colors, shapes, or the attributes of object parts--emerge automatically within these pretrain…

    Submitted 27 May, 2023; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Published in Transactions on Machine Learning Research (TMLR) 2023

  35. arXiv:2111.05940  [pdf, other]

    cs.CL

    A Novel Corpus of Discourse Structure in Humans and Computers

    Authors: Babak Hemmatian, Sheridan Feucht, Rachel Avram, Alexander Wey, Muskaan Garg, Kate Spitalnic, Carsten Eickhoff, Ellie Pavlick, Bjorn Sandstede, Steven Sloman

    Abstract: We present a novel corpus of 445 human- and computer-generated documents, comprising about 27,000 clauses, annotated for semantic clause types and coherence relations that allow for nuanced comparison of artificial and natural discourse modes. The corpus covers both formal and informal discourse, and contains documents generated using fine-tuned GPT-2 (Zellers et al., 2019) and GPT-3 (Brown et al.,…

    Submitted 10 November, 2021; originally announced November 2021.

    Comments: In the 2nd Workshop on Computational Approaches to Discourse (CODI) at EMNLP 2021 (extended abstract). 3 pages

  36. arXiv:2109.10246  [pdf, other]

    cs.CL cs.AI cs.CV

    Does Vision-and-Language Pretraining Improve Lexical Grounding?

    Authors: Tian Yun, Chen Sun, Ellie Pavlick

    Abstract: Linguistic representations derived from text alone have been criticized for their lack of grounding, i.e., connecting words to their meanings in the physical world. Vision-and-Language (VL) models, trained jointly on text and image or video data, have been offered as a response to such criticisms. However, while VL pretraining has shown success on multimodal tasks such as visual question answering…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: Camera ready for Findings of EMNLP 2021

  37. arXiv:2109.07020  [pdf, other]

    cs.CL

    Frequency Effects on Syntactic Rule Learning in Transformers

    Authors: Jason Wei, Dan Garrette, Tal Linzen, Ellie Pavlick

    Abstract: Pre-trained language models perform well on a variety of linguistic tasks that require symbolic reasoning, raising the question of whether such models implicitly represent abstract symbols and rules. We investigate this question using the case study of BERT's performance on English subject-verb agreement. Unlike prior work, we train multiple instances of BERT from scratch, allowing us to perform a…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: Camera ready for EMNLP 2021

  38. arXiv:2109.06129  [pdf, other]

    cs.CV cs.CL

    Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color

    Authors: Mostafa Abdou, Artur Kulmizev, Daniel Hershcovich, Stella Frank, Ellie Pavlick, Anders Søgaard

    Abstract: Pretrained language models have been shown to encode relational information, such as the relations between entities or concepts in knowledge-bases -- (Paris, Capital, France). However, simple relations of this type can often be recovered heuristically and the extent to which models implicitly reflect topological structure that is grounded in world, such as perceptual structure, is unknown. To expl…

    Submitted 14 September, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: CoNLL 2021

  39. arXiv:2109.01247  [pdf, other]

    cs.CL

    Do Prompt-Based Models Really Understand the Meaning of their Prompts?

    Authors: Albert Webson, Ellie Pavlick

    Abstract: Recently, a boom of papers has shown extraordinary progress in zero-shot and few-shot learning with various prompt-based models. It is commonly argued that prompts help models to learn faster in the same way that humans learn faster when provided with task instructions expressed in natural language. In this study, we experiment with over 30 prompt templates manually written for natural language in…

    Submitted 21 April, 2022; v1 submitted 2 September, 2021; originally announced September 2021.

    Comments: NAACL 2022. Unabridged version. Code available at https://github.com/awebson/prompt_semantics

  40. arXiv:2106.16163  [pdf, other]

    cs.CL

    The MultiBERTs: BERT Reproductions for Robustness Analysis

    Authors: Thibault Sellam, Steve Yadlowsky, Jason Wei, Naomi Saphra, Alexander D'Amour, Tal Linzen, Jasmijn Bastings, Iulia Turc, Jacob Eisenstein, Dipanjan Das, Ian Tenney, Ellie Pavlick

    Abstract: Experiments with pre-trained models such as BERT are often based on a single checkpoint. While the conclusions drawn apply to the artifact tested in the experiment (i.e., the particular instance of the model), it is not always clear whether they hold for the more general procedure which includes the architecture, training data, initialization scheme, and loss function. Recent work has shown that r…

    Submitted 21 March, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: Accepted at ICLR'22. Checkpoints and example analyses: http://goo.gle/multiberts

  41. arXiv:2101.00391  [pdf, other]

    cs.CL

    Which Linguist Invented the Lightbulb? Presupposition Verification for Question-Answering

    Authors: Najoung Kim, Ellie Pavlick, Burcu Karagol Ayan, Deepak Ramachandran

    Abstract: Many Question-Answering (QA) datasets contain unanswerable questions, but their treatment in QA systems remains primitive. Our analysis of the Natural Questions (Kwiatkowski et al. 2019) dataset reveals that a substantial portion of unanswerable questions (~21%) can be explained based on the presence of unverifiable presuppositions. We discuss the shortcomings of current models in handling su…

    Submitted 3 September, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: ACL 2021 Camera-ready

  42. arXiv:2012.02705  [pdf, other]

    cs.RO cs.CL

    Spatial Language Understanding for Object Search in Partially Observed City-scale Environments

    Authors: Kaiyu Zheng, Deniz Bayazit, Rebecca Mathew, Ellie Pavlick, Stefanie Tellex

    Abstract: Humans use spatial language to naturally describe object locations and their relations. Interpreting spatial language not only adds a perceptual modality for robots, but also reduces the barrier of interfacing with humans. Previous work primarily considers spatial language as goal specification for instruction following tasks in fully observable domains, often paired with reference paths for rewar…

    Submitted 31 July, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

    Comments: 11 pages, 12 figures, 3 tables; Added acknowledgements. 30th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2021

  43. arXiv:2010.15225  [pdf, other]

    cs.CL

    A Visuospatial Dataset for Naturalistic Verb Learning

    Authors: Dylan Ebert, Ellie Pavlick

    Abstract: We introduce a new dataset for training and evaluating grounded language models. Our data is collected within a virtual reality environment and is designed to emulate the quality of language data to which a pre-verbal child is likely to have access: That is, naturalistic, spontaneous speech paired with richly grounded visuospatial context. We use the collected data to compare several distributiona…

    Submitted 28 October, 2020; originally announced October 2020.

    Comments: 9 pages, 3 figures, starsem 2020

    ACM Class: I.2.7

  44. arXiv:2010.06032  [pdf, other]

    cs.CL

    Measuring and Reducing Gendered Correlations in Pre-trained Models

    Authors: Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beutel, Emily Pitler, Ellie Pavlick, Jilin Chen, Ed Chi, Slav Petrov

    Abstract: Pre-trained models have revolutionized natural language understanding. However, researchers have found they can encode artifacts undesired in many applications, such as professions correlating with one gender more than another. We explore such gendered correlations as a case study for how to address unintended correlations in pre-trained models. We define metrics and reveal that it is possible for…

    Submitted 2 March, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

  45. arXiv:2010.04872  [pdf, other]

    cs.CL

    Self-play for Data Efficient Language Acquisition

    Authors: Charles Lovering, Ellie Pavlick

    Abstract: When communicating, people behave consistently across conversational roles: People understand the words they say and are able to produce the words they hear. To date, artificial agents developed for language tasks have lacked such symmetry, meaning agents trained to produce language are unable to understand it and vice-versa. In this work, we exploit the symmetric nature of communication in order…

    Submitted 9 October, 2020; originally announced October 2020.

  46. arXiv:2010.02976  [pdf, other]

    cs.CL

    Are "Undocumented Workers" the Same as "Illegal Aliens"? Disentangling Denotation and Connotation in Vector Spaces

    Authors: Albert Webson, Zhizhong Chen, Carsten Eickhoff, Ellie Pavlick

    Abstract: In politics, neologisms are frequently invented for partisan objectives. For example, "undocumented workers" and "illegal aliens" refer to the same group of people (i.e., they have the same denotation), but they carry clearly different connotations. Examples like these have traditionally posed a challenge to reference-based semantic theories and led to increasing acceptance of alternative theories…

    Submitted 25 October, 2020; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: Published at EMNLP 2020. Recorded talk available at https://youtu.be/V2pdS6Y_8n0 . Code and data available at https://github.com/awebson/congressional_adversary

  47. arXiv:2006.13253  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV

    Robot Object Retrieval with Contextual Natural Language Queries

    Authors: Thao Nguyen, Nakul Gopalan, Roma Patel, Matt Corsaro, Ellie Pavlick, Stefanie Tellex

    Abstract: Natural language object retrieval is a highly useful yet challenging task for robots in human-centric environments. Previous work has primarily focused on commands specifying the desired object's type such as "scissors" and/or visual attributes such as "red," thus limiting the robot to only known object classes. We develop a model to retrieve objects based on descriptions of their usage. The model…

    Submitted 23 June, 2020; originally announced June 2020.

  48. arXiv:2004.15012  [pdf, other]

    cs.CL

    Does Data Augmentation Improve Generalization in NLP?

    Authors: Rohan Jha, Charles Lovering, Ellie Pavlick

    Abstract: Neural models often exploit superficial features to achieve good performance, rather than deriving more general features. Overcoming this tendency is a central challenge in areas such as representation learning and ML fairness. Recent work has proposed using data augmentation, i.e., generating training examples where the superficial features fail, as a means of encouraging models to prefer the str…

    Submitted 9 October, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

  49. arXiv:2004.14448  [pdf, other]

    cs.CL

    What Happens To BERT Embeddings During Fine-tuning?

    Authors: Amil Merchant, Elahe Rahimtoroghi, Ellie Pavlick, Ian Tenney

    Abstract: While there has been much recent work studying how linguistic information is encoded in pre-trained sentence representations, comparatively little is understood about how these models change when adapted to solve downstream tasks. Using a suite of analysis techniques (probing classifiers, Representational Similarity Analysis, and model ablations), we investigate how fine-tuning affects the represe…

    Submitted 29 April, 2020; originally announced April 2020.

    Comments: 9 pages (not including references), 5 figures

  50. arXiv:1905.12096  [pdf, other]

    cs.RO cs.AI

    Planning with State Abstractions for Non-Markovian Task Specifications

    Authors: Yoonseon Oh, Roma Patel, Thao Nguyen, Baichuan Huang, Ellie Pavlick, Stefanie Tellex

    Abstract: Oftentimes, we specify tasks for a robot using temporal language that can also span different levels of abstraction. The example command "go to the kitchen before going to the second floor" contains spatial abstraction, given that "floor" consists of individual rooms that can also be referred to in isolation ("kitchen", for example). There is also a temporal ordering of events, defined by the…

    Submitted 28 May, 2019; originally announced May 2019.