
Showing 1–13 of 13 results for author: Gerstenberg, T

  1. arXiv:2406.15917  [pdf, other]

    cs.RO

    To Err is Robotic: Rapid Value-Based Trial-and-Error during Deployment

    Authors: Maximilian Du, Alexander Khazatsky, Tobias Gerstenberg, Chelsea Finn

    Abstract: When faced with a novel scenario, it can be hard to succeed on the first attempt. In these challenging situations, it is important to know how to retry quickly and meaningfully. Retrying behavior can emerge naturally in robots trained on diverse data, but such robot policies will typically only exhibit undirected retrying behavior and may not terminate a suboptimal approach before an unrecoverable…

    Submitted 22 June, 2024; originally announced June 2024.

  2. arXiv:2404.14313  [pdf, other]

    cs.CL

    Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels

    Authors: Jan-Philipp Fränken, Eric Zelikman, Rafael Rafailov, Kanishk Gandhi, Tobias Gerstenberg, Noah D. Goodman

    Abstract: When prompting a language model (LM), users often expect the model to adhere to a set of behavioral principles across diverse tasks, such as producing insightful content while avoiding harmful or biased language. Instilling such principles (i.e., a constitution) into a model is resource-intensive, technically challenging, and generally requires human preference labels or examples. We introduce SAM…

    Submitted 21 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  3. arXiv:2404.10975  [pdf, other]

    cs.CL

    Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models

    Authors: Jan-Philipp Fränken, Kanishk Gandhi, Tori Qiu, Ayesha Khawaja, Noah D. Goodman, Tobias Gerstenberg

    Abstract: As AI systems like language models are increasingly integrated into decision-making processes affecting people's lives, it's critical to ensure that these systems have sound moral reasoning. To test whether they do, we need to develop systematic evaluations. We provide a framework that uses a language model to translate causal graphs that capture key aspects of moral dilemmas into prompt templates…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: CogSci 2024

  4. arXiv:2403.19154  [pdf, other]

    cs.CL cs.AI

    STaR-GATE: Teaching Language Models to Ask Clarifying Questions

    Authors: Chinmaya Andukuri, Jan-Philipp Fränken, Tobias Gerstenberg, Noah D. Goodman

    Abstract: When prompting language models to complete a task, users often leave important aspects unsaid. While asking questions could resolve this ambiguity (GATE; Li et al., 2023), models often struggle to ask good questions. We explore a language model's ability to self-improve (STaR; Zelikman et al., 2022) by rewarding the model for generating useful questions, a simple method we dub STaR-GATE. We generat…

    Submitted 29 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  5. arXiv:2310.19677  [pdf, other]

    cs.CL

    MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks

    Authors: Allen Nie, Yuhui Zhang, Atharva Amdekar, Chris Piech, Tatsunori Hashimoto, Tobias Gerstenberg

    Abstract: Human commonsense understanding of the physical and social world is organized around intuitive theories. These theories support making causal and moral judgments. When something bad happens, we naturally ask: who did what, and why? A rich literature in cognitive science has studied people's causal and moral intuitions. This work has revealed a number of factors that systematically influence people…

    Submitted 31 October, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: 34 pages, 7 figures. NeurIPS 2023

  6. arXiv:2310.17769  [pdf, other]

    cs.CL cs.AI

    Social Contract AI: Aligning AI Assistants with Implicit Group Norms

    Authors: Jan-Philipp Fränken, Sam Kwok, Peixuan Ye, Kanishk Gandhi, Dilip Arumugam, Jared Moore, Alex Tamkin, Tobias Gerstenberg, Noah D. Goodman

    Abstract: We explore the idea of aligning an AI assistant by inverting a model of users' (unknown) preferences from observed interactions. To validate our proposal, we run proof-of-concept simulations in the economic ultimatum game, formalizing user preferences as policies that guide the actions of simulated players. We find that the AI assistant accurately aligns its behavior to match standard policies fro…

    Submitted 3 December, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: SoLaR NeurIPS 2023 Workshop (https://solar-neurips.github.io/)

  7. arXiv:2306.15448  [pdf, other]

    cs.CL cs.AI cs.HC

    Understanding Social Reasoning in Language Models with Language Models

    Authors: Kanishk Gandhi, Jan-Philipp Fränken, Tobias Gerstenberg, Noah D. Goodman

    Abstract: As Large Language Models (LLMs) become increasingly integrated into our everyday lives, understanding their ability to comprehend human mental states becomes critical for ensuring effective interactions. However, despite the recent attempts to assess the Theory-of-Mind (ToM) reasoning capabilities of LLMs, the degree to which these models can align with human ToM remains a nuanced topic of explora…

    Submitted 4 December, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

  8. arXiv:2212.06823  [pdf, other]

    cs.HC cs.AI

    Explanations Can Reduce Overreliance on AI Systems During Decision-Making

    Authors: Helena Vasconcelos, Matthew Jörke, Madeleine Grunde-McLaughlin, Tobias Gerstenberg, Michael Bernstein, Ranjay Krishna

    Abstract: Prior work has identified a resilient phenomenon that threatens the performance of human-AI decision-making teams: overreliance, when people agree with an AI, even when it is incorrect. Surprisingly, overreliance does not reduce when the AI produces explanations for its predictions, compared to only providing predictions. Some have argued that overreliance results from cognitive biases or uncalibr…

    Submitted 26 January, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: CSCW 2023

  9. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  10. arXiv:2202.05983  [pdf, other]

    cs.AI cs.CV cs.HC cs.LG

    Uncalibrated Models Can Improve Human-AI Collaboration

    Authors: Kailas Vodrahalli, Tobias Gerstenberg, James Zou

    Abstract: In many practical applications of AI, an AI model is used as a decision aid for human users. The AI provides advice that a human (sometimes) incorporates into their decision-making process. The AI advice is often presented with some measure of "confidence" that the human can use to calibrate how much they depend on or trust the advice. In this paper, we present an initial exploration that suggests…

    Submitted 27 October, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

    Comments: 21 pages, 12 figures, NeurIPS 2022

  11. arXiv:2107.07015  [pdf, other]

    cs.AI cs.HC

    Do Humans Trust Advice More if it Comes from AI? An Analysis of Human-AI Interactions

    Authors: Kailas Vodrahalli, Roxana Daneshjou, Tobias Gerstenberg, James Zou

    Abstract: In decision support applications of AI, the AI algorithm's output is framed as a suggestion to a human user. The user may ignore this advice or take it into consideration to modify their decision. With the increasing prevalence of such human-AI interactions, it is important to understand how users react to AI advice. In this paper, we recruited over 1100 crowdworkers to characterize how humans use…

    Submitted 1 June, 2022; v1 submitted 14 July, 2021; originally announced July 2021.

    Comments: Conference on Artificial Intelligence, Ethics, and Society (AIES 2022)

  12. arXiv:1905.04445  [pdf, other]

    cs.AI cs.RO

    Explaining intuitive difficulty judgments by modeling physical effort and risk

    Authors: Ilker Yildirim, Basil Saeed, Grace Bennett-Pierre, Tobias Gerstenberg, Joshua Tenenbaum, Hyowon Gweon

    Abstract: The ability to estimate task difficulty is critical for many real-world decisions such as setting appropriate goals for ourselves or appreciating others' accomplishments. Here we give a computational account of how humans judge the difficulty of a range of physical construction tasks (e.g., moving 10 loose blocks from their initial configuration to their target configuration, such as a vertical to…

    Submitted 14 May, 2019; v1 submitted 11 May, 2019; originally announced May 2019.

  13. arXiv:1707.08212  [pdf, other]

    cs.AI cs.RO stat.ML

    Physical problem solving: Joint planning with symbolic, geometric, and dynamic constraints

    Authors: Ilker Yildirim, Tobias Gerstenberg, Basil Saeed, Marc Toussaint, Josh Tenenbaum

    Abstract: In this paper, we present a new task that investigates how people interact with and make judgments about towers of blocks. In Experiment 1, participants in the lab solved a series of problems in which they had to re-configure three blocks from an initial to a final configuration. We recorded whether they used one hand or two hands to do so. In Experiment 2, we asked participants online to judge wh…

    Submitted 25 July, 2017; originally announced July 2017.