Showing 1–50 of 56 results for author: Sap, M

  1. arXiv:2407.07950

    cs.CL cs.AI cs.HC

    Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance

    Authors: Kaitlyn Zhou, Jena D. Hwang, Xiang Ren, Nouha Dziri, Dan Jurafsky, Maarten Sap

    Abstract: The reconfiguration of human-LM interactions from simple sentence completions to complex, multi-domain, humanlike engagements necessitates new methodologies to understand how humans choose to rely on LMs. In our work, we contend that reliance is influenced by numerous factors within the interactional context of a generation, a departure from prior work that used verbalized confidence (e.g., "I'm c…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Preprint

  2. arXiv:2406.18510

    cs.CL

    WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

    Authors: Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, Niloofar Mireshghallah, Ximing Lu, Maarten Sap, Yejin Choi, Nouha Dziri

    Abstract: We introduce WildTeaming, an automatic LLM safety red-teaming framework that mines in-the-wild user-chatbot interactions to discover 5.7K unique clusters of novel jailbreak tactics, and then composes multiple tactics for systematic exploration of novel jailbreaks. Compared to prior work that performed red-teaming via recruited human workers, gradient-based optimization, or iterative revision with…

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2405.17633

    cs.CL

    HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs

    Authors: Jocelyn Shen, Joel Mire, Hae Won Park, Cynthia Breazeal, Maarten Sap

    Abstract: Empathy serves as a cornerstone in enabling prosocial behaviors, and can be evoked through sharing of personal experiences in stories. While empathy is influenced by narrative content, intuitively, people respond to the way a story is told as well, through narrative style. Yet the relationship between empathy and narrative style is not fully understood. In this work, we empirically examine and qua…

    Submitted 27 May, 2024; originally announced May 2024.

  4. arXiv:2405.09373

    cs.CL

    PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models

    Authors: Devansh Jain, Priyanshu Kumar, Samuel Gehman, Xuhui Zhou, Thomas Hartvigsen, Maarten Sap

    Abstract: Recent advances in large language models (LLMs) have led to their extensive global deployment, and ensuring their safety calls for comprehensive and multilingual toxicity evaluations. However, existing toxicity benchmarks are overwhelmingly focused on English, posing serious risks to deploying LLMs in other languages. We address this by introducing PolygloToxicityPrompts (PTP), the first large-sca…

    Submitted 20 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  5. arXiv:2405.08760

    cs.CL cs.AI

    Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Non-Literal Intent Resolution in LLMs

    Authors: Akhila Yerukola, Saujas Vaduguru, Daniel Fried, Maarten Sap

    Abstract: Humans often express their communicative intents indirectly or non-literally, which requires their interlocutors -- human or AI -- to understand beyond the literal meaning of words. While most existing work has focused on discriminative evaluations, we present a new approach to generatively evaluate large language models' (LLMs') intention understanding by examining their responses to non-literal…

    Submitted 19 June, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  6. arXiv:2404.12464

    cs.CL

    NormAd: A Benchmark for Measuring the Cultural Adaptability of Large Language Models

    Authors: Abhinav Rao, Akhila Yerukola, Vishwa Shah, Katharina Reinecke, Maarten Sap

    Abstract: The integration of large language models (LLMs) into various global cultures fundamentally presents a challenge: LLMs must navigate interactions, respect social norms, and avoid transgressing cultural boundaries. However, it is still unclear if LLMs can adapt their outputs to diverse cultural norms. Our study focuses on this aspect. We introduce NormAd, a novel dataset, which includes 2.6k stories…

    Submitted 11 July, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Preprint. In Review

  7. arXiv:2403.14791

    cs.CY cs.AI

    Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits

    Authors: Jimin Mun, Liwei Jiang, Jenny Liang, Inyoung Cheong, Nicole DeCario, Yejin Choi, Tadayoshi Kohno, Maarten Sap

    Abstract: General purpose AI, such as ChatGPT, seems to have lowered the barriers for the public to use AI and harness its power. However, the governance and development of AI still remain in the hands of a few, and the pace of development is accelerating without proper assessment of risks. As a first step towards democratic governance and risk assessment of AI, we introduce Particip-AI, a framework to gath…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 35 pages, 4 figures, 23 tables

  8. arXiv:2403.08715

    cs.CL

    SOTOPIA-$π$: Interactive Learning of Socially Intelligent Language Agents

    Authors: Ruiyi Wang, Haofei Yu, Wenxin Zhang, Zhengyang Qi, Maarten Sap, Graham Neubig, Yonatan Bisk, Hao Zhu

    Abstract: Humans learn social skills through both imitation and social interaction. This social learning process is largely understudied by existing research on building language agents. Motivated by this gap, we propose an interactive learning method, SOTOPIA-$π$, improving the social intelligence of language agents. This method leverages behavior cloning and self-reinforcement training on filtered social…

    Submitted 25 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  9. arXiv:2403.05020

    cs.CL cs.AI

    Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs

    Authors: Xuhui Zhou, Zhe Su, Tiwalayo Eisape, Hyunwoo Kim, Maarten Sap

    Abstract: Recent advances in large language models (LLM) have enabled richer social simulations, allowing for the study of various social phenomena. However, most recent work has used a more omniscient perspective on these simulations (e.g., single LLM to generate all interlocutors), which is fundamentally at odds with the non-omniscient, information asymmetric interactions that involve humans and AI agents…

    Submitted 18 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  10. arXiv:2403.00179

    cs.HC

    Counterspeakers' Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate

    Authors: Jimin Mun, Cathy Buerger, Jenny T. Liang, Joshua Garland, Maarten Sap

    Abstract: Counterspeech, i.e., direct responses against hate speech, has become an important tool to address the increasing amount of hate online while avoiding censorship. Although AI has been proposed to help scale up counterspeech efforts, this raises questions of how exactly AI could assist in this process, since counterspeech is a deeply empathetic and agentic process for those involved. In this work,…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: To appear in CHI 2024. 22 pages, 3 figures, 7 tables

  11. arXiv:2401.06730

    cs.CL cs.AI cs.HC

    Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty

    Authors: Kaitlyn Zhou, Jena D. Hwang, Xiang Ren, Maarten Sap

    Abstract: As natural language becomes the default interface for human-AI interaction, there is a need for LMs to appropriately communicate uncertainties in downstream applications. In this work, we investigate how LMs incorporate confidence in responses via natural language and how downstream users behave in response to LM-articulated uncertainties. We examine publicly deployed models and find that LMs are…

    Submitted 9 July, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: ACL 2024 (Camera Ready)

  12. Riveter: Measuring Power and Social Dynamics Between Entities

    Authors: Maria Antoniak, Anjalie Field, Jimin Mun, Melanie Walsh, Lauren F. Klein, Maarten Sap

    Abstract: Riveter provides a complete easy-to-use pipeline for analyzing verb connotations associated with entities in text corpora. We prepopulate the package with connotation frames of sentiment, power, and agency, which have demonstrated usefulness for capturing social phenomena, such as gender bias, in a broad range of corpora. For decades, lexical frameworks have been foundational tools in computationa…

    Submitted 15 December, 2023; originally announced December 2023.

    Journal ref: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Volume 3: System Demonstrations, 2023, pages 377-388

  13. arXiv:2311.09675

    cs.CL

    Where Do People Tell Stories Online? Story Detection Across Online Communities

    Authors: Maria Antoniak, Joel Mire, Maarten Sap, Elliott Ash, Andrew Piper

    Abstract: Story detection in online communities is a challenging task as stories are scattered across communities and interwoven with non-storytelling spans within a single text. We address this challenge by building and releasing the StorySeeker toolkit, including a richly annotated dataset of 502 Reddit posts and comments, a detailed codebook adapted to the social media context, and models to predict stor…

    Submitted 26 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  14. arXiv:2311.00161

    cs.CL cs.AI

    Beyond Denouncing Hate: Strategies for Countering Implied Biases and Stereotypes in Language

    Authors: Jimin Mun, Emily Allaway, Akhila Yerukola, Laura Vianna, Sarah-Jane Leslie, Maarten Sap

    Abstract: Counterspeech, i.e., responses to counteract potential harms of hateful speech, has become an increasingly popular solution to address online hate speech without censorship. However, properly countering hateful language requires countering and dispelling the underlying inaccurate stereotypes implied by such language. In this work, we draw from psychology and philosophy literature to craft six psyc…

    Submitted 31 October, 2023; originally announced November 2023.

    Comments: EMNLP 2023 Findings, 19 pages

  15. arXiv:2310.17884

    cs.AI cs.CL cs.CR

    Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory

    Authors: Niloofar Mireshghallah, Hyunwoo Kim, Xuhui Zhou, Yulia Tsvetkov, Maarten Sap, Reza Shokri, Yejin Choi

    Abstract: The interactive use of large language models (LLMs) in AI assistants (at work, home, etc.) introduces a new set of inference-time privacy risks: LLMs are fed different types of information from multiple sources in their inputs and are expected to reason about what to share in their outputs, for what purpose and with whom, within a given context. In this work, we draw attention to the highly critic…

    Submitted 28 June, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

    Comments: 2024 ICLR Spotlight. The dataset and code can be found at https://confaide.github.io

  16. arXiv:2310.15421

    cs.CL cs.AI

    FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

    Authors: Hyunwoo Kim, Melanie Sclar, Xuhui Zhou, Ronan Le Bras, Gunhee Kim, Yejin Choi, Maarten Sap

    Abstract: Theory of mind (ToM) evaluations currently focus on testing models using passive narratives that inherently lack interactivity. We introduce FANToM, a new benchmark designed to stress-test ToM within information-asymmetric conversational contexts via question answering. Our benchmark draws upon important theoretical requisites from psychology and necessary empirical considerations when evaluating…

    Submitted 31 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023. Code and dataset can be found here: https://hyunw.kim/fantom

  17. arXiv:2310.11667

    cs.AI cs.CL cs.LG

    SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

    Authors: Xuhui Zhou, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency, Yonatan Bisk, Daniel Fried, Graham Neubig, Maarten Sap

    Abstract: Humans are social beings; we pursue social goals in our daily interactions, which is a crucial aspect of social intelligence. Yet, AI systems' abilities in this realm remain elusive. We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and evaluate their social intelligence. In our environment, agents role-play and interact under a wide va…

    Submitted 22 March, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Preprint, 43 pages. The first two authors contribute equally

  18. Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

    Authors: Taylor Sorensen, Liwei Jiang, Jena Hwang, Sydney Levine, Valentina Pyatkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, Yejin Choi

    Abstract: Human values are crucial to human decision-making. Value pluralism is the view that multiple correct values may be held in tension with one another (e.g., when considering lying to a friend to protect their feelings, how does one balance honesty with friendship?). As statistical learners, AI systems fit to averages by default, washing out these potentially irreducible value conflicts. To improve A…

    Submitted 2 April, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: Proceedings of the AAAI Conference on Artificial Intelligence, 38

    Journal ref: Vol. 38 No. 18: AAAI-24 Technical Tracks 18; 2024; 19937-19947

  19. arXiv:2306.01985

    cs.CL

    COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements

    Authors: Xuhui Zhou, Hao Zhu, Akhila Yerukola, Thomas Davidson, Jena D. Hwang, Swabha Swayamdipta, Maarten Sap

    Abstract: Warning: This paper contains content that may be offensive or upsetting. Understanding the harms and offensiveness of statements requires reasoning about the social and situational context in which statements are made. For example, the utterance "your English is very good" may implicitly signal an insult when uttered by a white man to a non-white colleague, but uttered by an ESL teacher to their s…

    Submitted 8 June, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Accepted to Findings of ACL 2023

  20. arXiv:2306.01943

    cs.CL cs.CY cs.HC

    NLPositionality: Characterizing Design Biases of Datasets and Models

    Authors: Sebastin Santy, Jenny T. Liang, Ronan Le Bras, Katharina Reinecke, Maarten Sap

    Abstract: Design biases in NLP systems, such as performance differences for different populations, often stem from their creator's positionality, i.e., views and lived experiences shaped by identity and background. Despite the prevalence and risks of design biases, they are hard to quantify because researcher, system, and dataset positionality is often unobserved. We introduce NLPositionality, a framework f…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: ACL 2023

  21. arXiv:2305.17174

    cs.CL cs.CY

    From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models

    Authors: Julia Mendelsohn, Ronan Le Bras, Yejin Choi, Maarten Sap

    Abstract: Dogwhistles are coded expressions that simultaneously convey one meaning to a broad audience and a second one, often hateful or provocative, to a narrow in-group; they are deployed to evade both political repercussions and algorithmic content moderation. For example, in the sentence 'we need to end the cosmopolitan experiment,' the word 'cosmopolitan' likely means 'worldly' to many, but secretly m…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: ACL 2023, see https://dogwhistles.allen.ai/ for the glossary and other materials

  22. arXiv:2305.14763

    cs.CL

    Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models

    Authors: Natalie Shapira, Mosh Levy, Seyed Hossein Alavi, Xuhui Zhou, Yejin Choi, Yoav Goldberg, Maarten Sap, Vered Shwartz

    Abstract: The escalating debate on AI's capabilities warrants developing reliable metrics to assess machine "intelligence". Recently, many anecdotal examples were used to suggest that newer large language models (LLMs) like ChatGPT and GPT-4 exhibit Neural Theory-of-Mind (N-ToM); however, prior work reached conflicting conclusions regarding those abilities. We investigate the extent of LLMs' N-ToM through a…

    Submitted 24 May, 2023; originally announced May 2023.

  23. arXiv:2305.14755

    cs.CL cs.AI

    Don't Take This Out of Context! On the Need for Contextual Models and Evaluations for Stylistic Rewriting

    Authors: Akhila Yerukola, Xuhui Zhou, Elizabeth Clark, Maarten Sap

    Abstract: Most existing stylistic text rewriting methods and evaluation metrics operate on a sentence level, but ignoring the broader context of the text can lead to preferring generic, ambiguous, and incoherent rewrites. In this paper, we investigate integrating the preceding textual context into both the $\textit{rewriting}$ and $\textit{evaluation}$ stages of stylistic text rewriting, and introduce a new…

    Submitted 23 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 main camera ready

  24. arXiv:2305.14718

    cs.CL

    Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models

    Authors: Ashutosh Baheti, Ximing Lu, Faeze Brahman, Ronan Le Bras, Maarten Sap, Mark Riedl

    Abstract: Reinforcement Learning with Human Feedback (RLHF) is the most prominent method for Language Model (LM) alignment. However, RLHF is an unstable and data-hungry process that continually requires new high-quality LM-generated data for finetuning. We introduce Advantage-Leftover Lunch RL (A-LoL), a new class of offline policy gradient algorithms that enable RL training on any pre-existing data. By ass…

    Submitted 19 April, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: published at ICLR 2024

  25. arXiv:2305.14246

    cs.CL

    Modeling Empathic Similarity in Personal Narratives

    Authors: Jocelyn Shen, Maarten Sap, Pedro Colon-Hernandez, Hae Won Park, Cynthia Breazeal

    Abstract: The most meaningful connections between people are often fostered through expression of shared vulnerability and emotional experiences in personal narratives. We introduce a new task of identifying similarity in personal stories based on empathic resonance, i.e., the extent to which two people empathize with each others' experiences, as opposed to raw semantic or lexical similarity, as has predomi…

    Submitted 6 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Published at EMNLP 2023

  26. arXiv:2305.13589

    cs.CL

    BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases

    Authors: Yiming Zhang, Sravani Nanduri, Liwei Jiang, Tongshuang Wu, Maarten Sap

    Abstract: Toxicity annotators and content moderators often default to mental shortcuts when making decisions. This can lead to subtle toxicity being missed, and seemingly toxic but harmless content being over-detected. We introduce BiasX, a framework that enhances content moderation setups with free-text explanations of statements' implied social biases, and explore its effectiveness through a large-scale c…

    Submitted 22 May, 2023; originally announced May 2023.

  27. Queer In AI: A Case Study in Community-Led Participatory AI

    Authors: Organizers Of QueerInAI: Anaelia Ovalle, Arjun Subramonian, Ashwin Singh, Claas Voelcker, Danica J. Sutherland, Davide Locatelli, Eva Breznik, Filip Klubička, Hang Yuan, Hetvi J, Huan Zhang, Jaidev Shriram, Kruno Lehman, Luca Soldaini, Maarten Sap, Marc Peter Deisenroth, Maria Leonor Pacheco, Maria Ryskina, Martin Mundt, Milind Agarwal, Nyx McLean, Pan Xu, A Pranav, et al. (26 additional authors not shown)

    Abstract: We present Queer in AI as a case study for community-led participatory design in AI. We examine how participatory design and intersectional tenets started and shaped this community's programs over the years. We discuss different challenges that emerged in the process, look at ways this organization has fallen short of operationalizing participatory and intersectional principles, and then assess th…

    Submitted 8 June, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: To appear at FAccT 2023

    Journal ref: 2023 ACM Conference on Fairness, Accountability, and Transparency

  28. arXiv:2303.16173

    cs.CL

    Towards Countering Essentialism through Social Bias Reasoning

    Authors: Emily Allaway, Nina Taneja, Sarah-Jane Leslie, Maarten Sap

    Abstract: Essentialist beliefs (i.e., believing that members of the same group are fundamentally alike) play a central role in social stereotypes and can lead to harm when left unchallenged. In our work, we conduct exploratory studies into the task of countering essentialist beliefs (e.g., "liberals are stupid"). Drawing on prior work from psychology and NLP, we construct five types of counterstatements a…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: Workshop on NLP for Positive Impact @ EMNLP 2022

  29. arXiv:2212.10543

    cs.CL cs.AI

    Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts

    Authors: Skyler Hallinan, Alisa Liu, Yejin Choi, Maarten Sap

    Abstract: Text detoxification has the potential to mitigate the harms of toxicity by rephrasing text to remove offensive meaning, but subtle toxicity remains challenging to tackle. We introduce MaRCo, a detoxification algorithm that combines controllable generation and text rewriting methods using a Product of Experts with autoencoder language models (LMs). MaRCo uses likelihoods under a non-toxic LM (exper…

    Submitted 26 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  30. arXiv:2212.10465

    cs.CL

    SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization

    Authors: Hyunwoo Kim, Jack Hessel, Liwei Jiang, Peter West, Ximing Lu, Youngjae Yu, Pei Zhou, Ronan Le Bras, Malihe Alikhani, Gunhee Kim, Maarten Sap, Yejin Choi

    Abstract: Data scarcity has been a long standing issue in the field of open-domain social dialogue. To quench this thirst, we present SODA: the first publicly available, million-scale high-quality social dialogue dataset. By contextualizing social commonsense knowledge from a knowledge graph, we are able to distill an exceptionally broad spectrum of social interactions from a large language model. Human eva…

    Submitted 23 October, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: EMNLP 2023. Dataset, model, and code can be found at https://hyunw.kim/sodaverse

  31. arXiv:2210.13312

    cs.CL cs.AI

    Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs

    Authors: Maarten Sap, Ronan LeBras, Daniel Fried, Yejin Choi

    Abstract: Social intelligence and Theory of Mind (ToM), i.e., the ability to reason about the different mental states, intents, and reactions of all people involved, allow humans to effectively navigate and understand everyday social interactions. As NLP systems are used in increasingly complex social situations, their ability to grasp social dynamics becomes crucial. In this work, we examine the open quest…

    Submitted 3 April, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: Originally published at EMNLP 2022, extended to include ChatGPT and GPT-4 models on March 30th 2023 (extension not peer reviewed)

  32. arXiv:2210.01478

    cs.CL cs.AI cs.CY cs.LG

    When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment

    Authors: Zhijing Jin, Sydney Levine, Fernando Gonzalez, Ojasv Kamal, Maarten Sap, Mrinmaya Sachan, Rada Mihalcea, Josh Tenenbaum, Bernhard Schölkopf

    Abstract: AI systems are becoming increasingly intertwined with human life. In order to effectively collaborate with humans and ensure safety, AI systems need to be able to understand, interpret and predict human moral judgments and decisions. Human moral judgments are often guided by rules, but not always. A central challenge for AI safety is capturing the flexibility of the human moral mind -- the ability…

    Submitted 27 October, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022 Oral

  33. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  34. arXiv:2205.12688

    cs.CL

    ProsocialDialog: A Prosocial Backbone for Conversational Agents

    Authors: Hyunwoo Kim, Youngjae Yu, Liwei Jiang, Ximing Lu, Daniel Khashabi, Gunhee Kim, Yejin Choi, Maarten Sap

    Abstract: Most existing dialogue systems fail to respond properly to potentially unsafe user utterances by either ignoring or passively agreeing with them. To address this issue, we introduce ProsocialDialog, the first large-scale multi-turn dialogue dataset to teach conversational agents to respond to problematic content following social norms. Covering diverse unethical, problematic, biased, and toxic sit…

    Submitted 25 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022 camera ready; Dataset and model can be found at https://hyunw.kim/prosocial-dialog/

  35. arXiv:2205.01975

    cs.CL cs.AI

    Aligning to Social Norms and Values in Interactive Narratives

    Authors: Prithviraj Ammanabrolu, Liwei Jiang, Maarten Sap, Hannaneh Hajishirzi, Yejin Choi

    Abstract: We focus on creating agents that act in alignment with socially beneficial norms and values in interactive narratives or text-based games -- environments wherein an agent perceives and interacts with a world through natural language. Such interactive agents are often trained via reinforcement learning to optimize task performance, even when such rewards may lead to agent behaviors that violate soc…

    Submitted 4 May, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: In Proceedings of NAACL-2022

  36. arXiv:2203.09509

    cs.CL

    ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

    Authors: Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, Ece Kamar

    Abstract: Toxic language detection systems often falsely flag text that contains minority group mentions as toxic, as those groups are often the targets of online hate. Such over-reliance on spurious correlations also causes systems to struggle with detecting implicitly toxic language. To help mitigate these issues, we create ToxiGen, a new large-scale and machine-generated dataset of 274k toxic and benign…

    Submitted 14 July, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

    Comments: Published as a long paper at ACL 2022. Code: https://github.com/microsoft/TOXIGEN

  37. arXiv:2201.02662

    cs.CL cs.AI

    Imagined versus Remembered Stories: Quantifying Differences in Narrative Flow

    Authors: Maarten Sap, Anna Jafarpour, Yejin Choi, Noah A. Smith, James W. Pennebaker, Eric Horvitz

    Abstract: Lifelong experiences and learned knowledge lead to shared expectations about how common situations tend to unfold. Such knowledge of narrative event flow enables people to weave together a story. However, comparable computational tools to evaluate the flow of events in narratives are limited. We quantify the differences between autobiographical and imagined stories by introducing sequentiality, a…

    Submitted 8 July, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

    Comments: Equal contribution from Sap and Jafarpour; in review; version 2

  38. arXiv:2111.07997

    cs.CL cs.HC

    Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection

    Authors: Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, Noah A. Smith

    Abstract: The perceived toxicity of language can vary based on someone's identity and beliefs, but this variation is often ignored when collecting toxic language datasets, resulting in dataset and model biases. We seek to understand the who, why, and what behind biases in toxicity annotations. In two online studies with demographically and politically diverse participants, we investigate the effect of annot…

    Submitted 9 May, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: NAACL 2022 Camera Ready

  39. arXiv:2110.07574

    cs.CL

    Can Machines Learn Morality? The Delphi Experiment

    Authors: Liwei Jiang, Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jenny Liang, Jesse Dodge, Keisuke Sakaguchi, Maxwell Forbes, Jon Borchardt, Saadia Gabriel, Yulia Tsvetkov, Oren Etzioni, Maarten Sap, Regina Rini, Yejin Choi

    Abstract: As AI systems become increasingly powerful and pervasive, there are growing concerns about machines' morality or a lack thereof. Yet, teaching morality to machines is a formidable task, as morality remains among the most intensely debated questions in humanity, let alone for AI. Existing AI systems deployed to millions of users, however, are already making decisions loaded with moral implications,…

    Submitted 12 July, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

  40. arXiv:2108.11830  [pdf, other]

    cs.CL

    Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts

    Authors: Ashutosh Baheti, Maarten Sap, Alan Ritter, Mark Riedl

    Abstract: Dialogue models trained on human conversations inadvertently learn to generate toxic responses. In addition to producing explicitly offensive utterances, these models can also implicitly insult a group or individual by aligning themselves with an offensive statement. To better understand the dynamics of contextually offensive language, we investigate the stance of dialogue model responses in offen…

    Submitted 13 September, 2021; v1 submitted 26 August, 2021; originally announced August 2021.

    Comments: Accepted at EMNLP 2021

  41. arXiv:2105.03023  [pdf, other]

    cs.CL

    DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts

    Authors: Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, Yejin Choi

    Abstract: Despite recent advances in natural language generation, it remains challenging to control attributes of generated text. We propose DExperts: Decoding-time Experts, a decoding-time method for controlled text generation that combines a pretrained language model with "expert" LMs and/or "anti-expert" LMs in a product of experts. Intuitively, under the ensemble, tokens only get high probability if the…

    Submitted 3 June, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

    Comments: ACL 2021 camera-ready

  42. arXiv:2104.08790  [pdf, other]

    cs.CL

    Misinfo Reaction Frames: Reasoning about Readers' Reactions to News Headlines

    Authors: Saadia Gabriel, Skyler Hallinan, Maarten Sap, Pemi Nguyen, Franziska Roesner, Eunsol Choi, Yejin Choi

    Abstract: Even to a simple and short news headline, readers react in a multitude of ways: cognitively (e.g. inferring the writer's intent), emotionally (e.g. feeling distrust), and behaviorally (e.g. sharing the news with their friends). Such reactions are instantaneous and yet complex, as they rely on factors that go beyond interpreting factual content of news. We propose Misinfo Reaction Frames (MRF), a p…

    Submitted 22 March, 2022; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: ACL 2022 camera-ready

  43. arXiv:2104.08758  [pdf, other]

    cs.CL cs.AI

    Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus

    Authors: Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell, Matt Gardner

    Abstract: Large language models have led to remarkable progress on many NLP tasks, and researchers are turning to ever-larger text corpora to train them. Some of the largest corpora available are made by scraping significant portions of the internet, and are frequently introduced with only minimal documentation. In this work we provide some of the first documentation for the Colossal Clean Crawled Corpus (C…

    Submitted 30 September, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021 accepted paper camera ready version

  44. arXiv:2104.06390  [pdf, other]

    cs.CL cs.LG

    Detoxifying Language Models Risks Marginalizing Minority Voices

    Authors: Albert Xu, Eshaan Pathak, Eric Wallace, Suchin Gururangan, Maarten Sap, Dan Klein

    Abstract: Language models (LMs) must be both safe and equitable to be responsibly deployed in practice. With safety in mind, numerous detoxification techniques (e.g., Dathathri et al. 2020; Krause et al. 2020) have been proposed to mitigate toxic LM generations. In this work, we show that current detoxification techniques hurt equity: they decrease the utility of LMs on language used by marginalized groups…

    Submitted 13 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

  45. arXiv:2102.00086  [pdf, other]

    cs.CL

    Challenges in Automated Debiasing for Toxic Language Detection

    Authors: Xuhui Zhou, Maarten Sap, Swabha Swayamdipta, Noah A. Smith, Yejin Choi

    Abstract: Biased associations have been a challenge in the development of classifiers for detecting toxic language, hindering both fairness and accuracy. As potential solutions, we investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection. Our focus is on lexical (e.g., swear words, slurs, identity mentions) and dialectal markers (s…

    Submitted 29 January, 2021; originally announced February 2021.

    Comments: EACL 2021

  46. arXiv:2011.00620  [pdf, other]

    cs.CL cs.AI

    Social Chemistry 101: Learning to Reason about Social and Moral Norms

    Authors: Maxwell Forbes, Jena D. Hwang, Vered Shwartz, Maarten Sap, Yejin Choi

    Abstract: Social norms -- the unspoken commonsense rules about acceptable social behavior -- are crucial in understanding the underlying causes and intents of people's actions in narratives. For example, underlying an action such as "wanting to call cops on my neighbors" are social norms that inform our conduct, such as "It is expected that you report crimes." We present Social Chemistry, a new conceptual…

    Submitted 16 August, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

    Comments: Published at EMNLP 2020

  47. arXiv:2010.13816  [pdf, other]

    cs.CL cs.AI

    PowerTransformer: Unsupervised Controllable Revision for Biased Language Correction

    Authors: Xinyao Ma, Maarten Sap, Hannah Rashkin, Yejin Choi

    Abstract: Unconscious biases continue to be prevalent in modern text and media, calling for algorithms that can assist writers with bias correction. For example, a female character in a story is often portrayed as passive and powerless ("She daydreams about being a doctor") while a man is portrayed as more proactive and powerful ("He pursues his dream of being a doctor"). We formulate *Controllable Debias…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  48. arXiv:2009.11462  [pdf, other]

    cs.CL

    RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

    Authors: Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith

    Abstract: Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment. We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration. We create and release RealToxicityPrompts, a dataset of 1…

    Submitted 25 September, 2020; v1 submitted 23 September, 2020; originally announced September 2020.

    Comments: Findings in EMNLP 2020

  49. arXiv:1911.03891  [pdf, other]

    cs.CL

    Social Bias Frames: Reasoning about Social and Power Implications of Language

    Authors: Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, Yejin Choi

    Abstract: Warning: this paper contains content that may be offensive or upsetting. Language has the power to reinforce stereotypes and project social biases onto others. At the core of the challenge is that it is rarely what is stated explicitly, but rather the implied meanings, that frame people's judgments about others. For example, given a statement that "we shouldn't lower our standards to hire more w…

    Submitted 23 April, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

    Comments: ACL 2020 Camera Ready; Data available at http://tinyurl.com/social-bias-frames

  50. arXiv:1906.05317  [pdf, other]

    cs.CL cs.AI

    COMET: Commonsense Transformers for Automatic Knowledge Graph Construction

    Authors: Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, Yejin Choi

    Abstract: We present the first comprehensive study on automatic knowledge base construction for two prevalent commonsense knowledge graphs: ATOMIC (Sap et al., 2019) and ConceptNet (Speer et al., 2017). Contrary to many conventional KBs that store knowledge with canonical templates, commonsense KBs only store loosely structured open-text descriptions of knowledge. We posit that an important step toward auto…

    Submitted 14 June, 2019; v1 submitted 12 June, 2019; originally announced June 2019.

    Comments: Accepted to ACL 2019