Showing 1–15 of 15 results for author: Padmakumar, V

  1. arXiv:2311.08702  [pdf, other]

    cs.AI cs.CL

    Debate Helps Supervise Unreliable Experts

    Authors: Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien Dirani, Vishakh Padmakumar, Samuel R. Bowman

    Abstract: As AI systems are used to answer more difficult questions and potentially help create new knowledge, judging the truthfulness of their outputs becomes more difficult and more important. How can we supervise unreliable experts, which have access to the truth but may not accurately report it, to give answers that are systematically true and don't just superficially seem true, when the supervisor can…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: 84 pages, 13 footnotes, 5 figures, 4 tables, 28 debate transcripts; data and code at https://github.com/julianmichael/debate/tree/2023-nyu-experiments

    ACM Class: I.2.0

  2. arXiv:2309.12570  [pdf, other]

    cs.HC cs.AI cs.CL cs.CY

    Creativity Support in the Age of Large Language Models: An Empirical Study Involving Emerging Writers

    Authors: Tuhin Chakrabarty, Vishakh Padmakumar, Faeze Brahman, Smaranda Muresan

    Abstract: The development of large language models (LLMs) capable of following instructions and engaging in conversational interactions sparked increased interest in their utilization across various support tools. We investigate the utility of modern LLMs in assisting professional writers via an empirical user study (n=30). The design of our collaborative writing interface is grounded in the cognitive proce…

    Submitted 30 January, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

  3. arXiv:2309.05196  [pdf, other]

    cs.CL cs.CY cs.HC cs.LG

    Does Writing with Language Models Reduce Content Diversity?

    Authors: Vishakh Padmakumar, He He

    Abstract: Large language models (LLMs) have led to a surge in collaborative writing with model assistance. As different users incorporate suggestions from the same model, there is a risk of decreased diversity in the produced content, potentially limiting diverse perspectives in public discourse. In this work, we measure the impact of co-writing on diversity via a controlled experiment, where users write ar…

    Submitted 1 July, 2024; v1 submitted 10 September, 2023; originally announced September 2023.

    Comments: ICLR 2024

  4. arXiv:2305.15269  [pdf, other]

    cs.CL cs.AI

    Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples

    Authors: Abulhair Saparov, Richard Yuanzhe Pang, Vishakh Padmakumar, Nitish Joshi, Seyed Mehran Kazemi, Najoung Kim, He He

    Abstract: Given the intractably large size of the space of proofs, any model that is capable of general deductive reasoning must generalize to proofs of greater complexity. Recent studies have shown that large language models (LLMs) possess some abstract deductive reasoning ability given chain-of-thought prompts. However, they have primarily been tested on proofs using modus ponens or of a specific size, an…

    Submitted 3 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Published as a conference paper at NeurIPS 2023

  5. arXiv:2305.14279  [pdf, other]

    cs.CL

    Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs

    Authors: Angelica Chen, Jason Phang, Alicia Parrish, Vishakh Padmakumar, Chen Zhao, Samuel R. Bowman, Kyunghyun Cho

    Abstract: Large language models (LLMs) have achieved widespread success on a variety of in-context few-shot tasks, but this success is typically evaluated via correctness rather than consistency. We argue that self-consistency is an important criterion for valid multi-step reasoning in tasks where the solution is composed of the answers to multiple sub-steps. We propose two types of self-consistency that are…

    Submitted 2 February, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted to TMLR: https://openreview.net/forum?id=5nBqY1y96B

    Journal ref: Transactions on Machine Learning Research (2024)

  6. arXiv:2303.04562  [pdf, other]

    cs.LG cs.CL q-bio.QM

    Extrapolative Controlled Sequence Generation via Iterative Refinement

    Authors: Vishakh Padmakumar, Richard Yuanzhe Pang, He He, Ankur P. Parikh

    Abstract: We study the problem of extrapolative controlled generation, i.e., generating sequences with attribute values beyond the range seen in training. This task is of significant importance in automated design, especially drug discovery, where the goal is to design novel proteins that are better (e.g., more stable) than existing sequences. Thus, by definition, the target sequences and their att…

    Submitted 7 June, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

    Comments: ICML 2023 - Camera Ready Version

  7. arXiv:2211.08714  [pdf, other]

    cs.CL cs.AI cs.LG

    Reward Gaming in Conditional Text Generation

    Authors: Richard Yuanzhe Pang, Vishakh Padmakumar, Thibault Sellam, Ankur P. Parikh, He He

    Abstract: To align conditional text generation model outputs with desired behaviors, there has been an increasing focus on training the model using reinforcement learning (RL) with reward functions learned from human annotations. Under this framework, we identify three common cases where high rewards are incorrectly assigned to undesirable patterns: noise-induced spurious correlation, naturally occurring sp…

    Submitted 1 June, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: ACL 2023

  8. arXiv:2210.13669  [pdf, other]

    cs.CL

    Help me write a poem: Instruction Tuning as a Vehicle for Collaborative Poetry Writing

    Authors: Tuhin Chakrabarty, Vishakh Padmakumar, He He

    Abstract: Recent work in training large language models (LLMs) to follow natural language instructions has opened up exciting opportunities for natural language interface design. Building on the prior success of LLMs in the realm of computer-assisted creativity, we aim to study if LLMs can improve the quality of user-generated content through collaboration. We present CoPoet, a collaborative poetry writing…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: To appear at EMNLP 2022

  9. arXiv:2210.10860  [pdf, other]

    cs.CL

    Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension Questions

    Authors: Alicia Parrish, Harsh Trivedi, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Amanpreet Singh Saimbhi, Samuel R. Bowman

    Abstract: The use of language-model-based question-answering systems to aid humans in completing difficult tasks is limited, in part, by the unreliability of the text these systems generate. Using hard multiple-choice reading comprehension questions as a testbed, we assess whether presenting humans with arguments for two competing answer options, where one is correct and the other is incorrect, allows human…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: 12 pages, 6 figures, 7 tables

  10. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  11. arXiv:2204.11117  [pdf, other]

    cs.CL cs.LG

    Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning

    Authors: Vishakh Padmakumar, Leonard Lausen, Miguel Ballesteros, Sheng Zha, He He, George Karypis

    Abstract: Recent work has found that multi-task training with a large number of diverse tasks can uniformly improve downstream performance on unseen target tasks. In contrast, literature on task transferability has established that the choice of intermediate tasks can heavily affect downstream task performance. In this work, we aim to disentangle the effect of scale and relatedness of tasks in multi-task re…

    Submitted 12 July, 2022; v1 submitted 23 April, 2022; originally announced April 2022.

    Comments: NAACL 2022 - Camera ready version

  12. arXiv:2112.08608  [pdf, other]

    cs.CL

    QuALITY: Question Answering with Long Input Texts, Yes!

    Authors: Richard Yuanzhe Pang, Alicia Parrish, Nitish Joshi, Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, Samuel R. Bowman

    Abstract: To enable building and testing models on long-document comprehension, we introduce QuALITY, a multiple-choice QA dataset with context passages in English that have an average length of about 5,000 tokens, much longer than typical current models can process. Unlike in prior work with passages, our questions are written and validated by contributors who have read the entire passage, rather than rely…

    Submitted 11 May, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: NAACL 2022

  13. arXiv:2111.04193  [pdf, other]

    cs.CL

    Machine-in-the-Loop Rewriting for Creative Image Captioning

    Authors: Vishakh Padmakumar, He He

    Abstract: Machine-in-the-loop writing aims to enable humans to collaborate with models to complete their writing tasks more effectively. Prior work has found that providing humans a machine-written draft or sentence-level continuations has limited success since the generated text tends to deviate from humans' intention. To allow the user to retain control over the content, we train a rewriting model that, w…

    Submitted 8 May, 2022; v1 submitted 7 November, 2021; originally announced November 2021.

    Comments: To appear at NAACL 2022

  14. arXiv:2110.08193  [pdf, other]

    cs.CL

    BBQ: A Hand-Built Bias Benchmark for Question Answering

    Authors: Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, Samuel R. Bowman

    Abstract: It is well documented that NLP models learn social biases, but little work has been done on how these biases manifest in model outputs for applied tasks like question answering (QA). We introduce the Bias Benchmark for QA (BBQ), a dataset of question sets constructed by the authors that highlight attested social biases against people belonging to protected classes along nine social dimensions rele…

    Submitted 15 March, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Accepted to ACL 2022 Findings. 20 pages, 10 figures

  15. arXiv:2102.06272  [pdf, other]

    cs.CL cs.LG

    Unsupervised Extractive Summarization using Pointwise Mutual Information

    Authors: Vishakh Padmakumar, He He

    Abstract: Unsupervised approaches to extractive summarization usually rely on a notion of sentence importance defined by the semantic similarity between a sentence and the document. We propose new metrics of relevance and redundancy using pointwise mutual information (PMI) between sentences, which can be easily computed by a pre-trained language model. Intuitively, a relevant sentence allows readers to infe…

    Submitted 22 March, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: To appear at EACL 2021