Showing 1–21 of 21 results for author: Nie, A

  1. arXiv:2407.09975  [pdf, other]

    cs.CY cs.AI cs.CL stat.AP

    The GPT Surprise: Offering Large Language Model Chat in a Massive Coding Class Reduced Engagement but Increased Adopters Exam Performances

    Authors: Allen Nie, Yash Chandak, Miroslav Suzara, Malika Ali, Juliette Woodrow, Matt Peng, Mehran Sahami, Emma Brunskill, Chris Piech

    Abstract: Large language models (LLMs) are quickly being adopted in a wide range of learning experiences, especially via ubiquitous and broadly accessible chat interfaces like ChatGPT and Copilot. This type of interface is readily available to students and teachers around the world, yet relatively little research has been done to assess the impact of such generic tools on student learning. Coding education…

    Submitted 25 April, 2024; originally announced July 2024.

    Comments: 32 pages

  2. arXiv:2406.16218  [pdf, other]

    cs.AI cs.LG

    Trace is the New AutoDiff -- Unlocking Efficient Optimization of Computational Workflows

    Authors: Ching-An Cheng, Allen Nie, Adith Swaminathan

    Abstract: We study a class of optimization problems motivated by automating the design and update of AI systems like coding assistants, robots, and copilots. We propose an end-to-end optimization framework, Trace, which treats the computational workflow of an AI system as a graph akin to neural networks, based on a generalization of back-propagation. Optimization of computational workflows often involves ri…

    Submitted 23 June, 2024; originally announced June 2024.

  3. arXiv:2405.17708  [pdf, other]

    cs.LG cs.AI stat.ML

    OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators

    Authors: Allen Nie, Yash Chandak, Christina J. Yuan, Anirudhan Badrinath, Yannis Flet-Berliac, Emma Brunskill

    Abstract: Offline policy evaluation (OPE) allows us to evaluate and estimate a new sequential decision-making policy's performance by leveraging historical interaction data collected from other policies. Evaluating a new policy online without a confident estimate of its performance can lead to costly, unsafe, or hazardous outcomes, especially in education and healthcare. Several OPE estimators have been pro…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 22 pages

  4. arXiv:2405.16434  [pdf, other]

    cs.AI cs.CL cs.NE

    The Importance of Directional Feedback for LLM-based Optimizers

    Authors: Allen Nie, Ching-An Cheng, Andrey Kolobov, Adith Swaminathan

    Abstract: We study the potential of using large language models (LLMs) as an interactive optimizer for solving maximization problems in a text space using natural language and numerical feedback. Inspired by the classical optimization literature, we classify the natural language feedback into directional and non-directional, where the former is a generalization of the first-order feedback to the natural lan…

    Submitted 20 June, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

    Comments: Accepted and Presented at Foundation Models for Decision Making at NeurIPS 2023 (December 15, 2023). Work completed from June 2023 to September 2023

  5. arXiv:2312.06853  [pdf, other]

    cs.AI

    LLF-Bench: Benchmark for Interactive Learning from Language Feedback

    Authors: Ching-An Cheng, Andrey Kolobov, Dipendra Misra, Allen Nie, Adith Swaminathan

    Abstract: We introduce a new benchmark, LLF-Bench (Learning from Language Feedback Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively learn from natural language feedback and instructions. Learning from language feedback (LLF) is essential for people, largely because the rich information this feedback provides can help a learner avoid much of trial and error and the…

    Submitted 13 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

  6. arXiv:2310.19677  [pdf, other]

    cs.CL

    MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks

    Authors: Allen Nie, Yuhui Zhang, Atharva Amdekar, Chris Piech, Tatsunori Hashimoto, Tobias Gerstenberg

    Abstract: Human commonsense understanding of the physical and social world is organized around intuitive theories. These theories support making causal and moral judgments. When something bad happens, we naturally ask: who did what, and why? A rich literature in cognitive science has studied people's causal and moral intuitions. This work has revealed a number of factors that systematically influence people…

    Submitted 31 October, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: 34 pages, 7 figures. NeurIPS 2023

  7. arXiv:2306.14069  [pdf, other]

    cs.LG

    Waypoint Transformer: Reinforcement Learning via Supervised Learning with Intermediate Targets

    Authors: Anirudhan Badrinath, Yannis Flet-Berliac, Allen Nie, Emma Brunskill

    Abstract: Despite the recent advancements in offline reinforcement learning via supervised learning (RvS) and the success of the decision transformer (DT) architecture in various domains, DTs have fallen short in several challenging benchmarks. The root cause of this underperformance lies in their inability to seamlessly connect segments of suboptimal trajectories. To overcome this limitation, we present a…

    Submitted 18 November, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

    Comments: Accepted to the Conference on Neural Information Processing Systems 2023 (NeurIPS 2023)

  8. arXiv:2304.04933  [pdf, other]

    cs.AI cs.CL

    Reinforcement Learning Tutor Better Supported Lower Performers in a Math Task

    Authors: Sherry Ruan, Allen Nie, William Steenbergen, Jiayu He, JQ Zhang, Meng Guo, Yao Liu, Kyle Dang Nguyen, Catherine Y Wang, Rui Ying, James A Landay, Emma Brunskill

    Abstract: Resource limitations make it hard to provide all students with one of the most effective educational interventions: personalized instruction. Reinforcement learning could be a key tool to reduce the development cost and improve the effectiveness of intelligent tutoring software that aims to provide the right support, at the right time, to a student. Here we illustrate that deep reinforcement learn…

    Submitted 13 April, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: 23 pages. Under review

  9. arXiv:2301.11426  [pdf, other]

    cs.LG

    Model-based Offline Reinforcement Learning with Local Misspecification

    Authors: Kefan Dong, Yannis Flet-Berliac, Allen Nie, Emma Brunskill

    Abstract: We present a model-based offline reinforcement learning policy performance lower bound that explicitly captures dynamics model misspecification and distribution mismatch and we propose an empirical algorithm for optimal offline policy selection. Theoretically, we prove a novel safe policy improvement theorem by establishing pessimism approximations to the value function. Our key insight is to join…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: Accepted by AAAI-23

  10. arXiv:2211.08802  [pdf, other]

    cs.LG cs.AI stat.ML

    Giving Feedback on Interactive Student Programs with Meta-Exploration

    Authors: Evan Zheran Liu, Moritz Stephan, Allen Nie, Chris Piech, Emma Brunskill, Chelsea Finn

    Abstract: Developing interactive software, such as websites or games, is a particularly engaging way to learn computer science. However, teaching and giving feedback on such software is time-consuming -- standard approaches require instructors to manually grade student-implemented interactive programs. As a result, online platforms that serve millions, like Code.org, are unable to provide any feedback on as…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: Advances in Neural Information Processing Systems (NeurIPS 2022). Selected as Oral

  11. arXiv:2210.08642  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data

    Authors: Allen Nie, Yannis Flet-Berliac, Deon R. Jordan, William Steenbergen, Emma Brunskill

    Abstract: Offline reinforcement learning (RL) can be used to improve future performance by leveraging historical data. There exist many different algorithms for offline RL, and it is well recognized that these algorithms, and their hyperparameter settings, can lead to decision policies with substantially differing performance. This prompts the need for pipelines that allow practitioners to systematically pe…

    Submitted 12 January, 2023; v1 submitted 16 October, 2022; originally announced October 2022.

    Comments: 32 pages. Published at NeurIPS 2022. Presented at RLDM 2022

  12. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  13. arXiv:2110.14615  [pdf, other]

    cs.AI cs.CY cs.LG

    Play to Grade: Testing Coding Games as Classifying Markov Decision Process

    Authors: Allen Nie, Emma Brunskill, Chris Piech

    Abstract: Contemporary coding education often presents students with the task of developing programs that have user interaction and complex dynamic systems, such as mouse based games. While pedagogically compelling, there are no contemporary autonomous methods for providing feedback. Notably, interactive programs are impossible to grade by traditional unit tests. In this paper we formalize the challenge of…

    Submitted 14 December, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021, 16 pages, 7 figures

  14. arXiv:2108.07258  [pdf, other]

    cs.LG cs.AI cs.CY

    On the Opportunities and Risks of Foundation Models

    Authors: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, et al. (89 additional authors not shown)

    Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their cap…

    Submitted 12 July, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Report page with citation guidelines: https://crfm.stanford.edu/report.html

  15. arXiv:2004.14451  [pdf, other]

    cs.CL cs.CV

    Pragmatic Issue-Sensitive Image Captioning

    Authors: Allen Nie, Reuben Cohn-Gordon, Christopher Potts

    Abstract: Image captioning systems have recently improved dramatically, but they still tend to produce captions that are insensitive to the communicative goals that captions should meet. To address this, we propose Issue-Sensitive Image Captioning (ISIC). In ISIC, a captioning system is given a target image and an issue, which is a set of images partitioned in a way that specifies what information is releva…

    Submitted 5 October, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: 15 pages, 7 figures. EMNLP 2020 Findings Accepted

  16. arXiv:1909.10699  [pdf, other]

    cs.CL cs.IR cs.LG

    LitGen: Genetic Literature Recommendation Guided by Human Explanations

    Authors: Allen Nie, Arturo L. Pineda, Matt W. Wright, Hannah Wand, Bryan Wulf, Helio A. Costa, Ronak Y. Patel, Carlos D. Bustamante, James Zou

    Abstract: As genetic sequencing costs decrease, the lack of clinical interpretation of variants has become the bottleneck in using genetics data. A major rate limiting step in clinical interpretation is the manual curation of evidence in the genetic literature by highly trained biocurators. What makes curation particularly time-consuming is that the curator needs to identify papers that study variant pathog…

    Submitted 23 September, 2019; originally announced September 2019.

    Comments: 12 pages; 5 figures. Accepted by PSB 2020 (Pacific Symposium on Biocomputing) track: Artificial Intelligence for Enhancing Clinical Medicine

  17. arXiv:1906.01243  [pdf, other]

    cs.CL

    Learning to Explain: Answering Why-Questions via Rephrasing

    Authors: Allen Nie, Erin D. Bennett, Noah D. Goodman

    Abstract: Providing plausible responses to why questions is a challenging but critical goal for language based human-machine interaction. Explanations are challenging in that they require many different forms of abstract knowledge and reasoning. Previous work has either relied on human-curated structured knowledge bases or detailed domain representation to generate satisfactory explanations. They are also o…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: 8 pages, 5 figures. 1st ConvAI Workshop at ACL 2019

  18. arXiv:1811.11958  [pdf, other]

    cs.CL

    Large-scale Generative Modeling to Improve Automated Veterinary Disease Coding

    Authors: Yuhui Zhang, Allen Nie, James Zou

    Abstract: Supervised learning is limited both by the quantity and quality of the labeled data. In the field of medical record tagging, writing styles between hospitals vary drastically. The knowledge learned from one hospital might not transfer well to another. This problem is amplified in veterinary medicine domain because veterinary clinics rarely apply medical codes to their records. We proposed and trai…

    Submitted 28 November, 2018; originally announced November 2018.

    Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

    Report number: ML4H/2018/83

  19. arXiv:1806.10722  [pdf, other]

    cs.CL

    DeepTag: inferring all-cause diagnoses from clinical notes in under-resourced medical domain

    Authors: Allen Nie, Ashley Zehnder, Rodney L. Page, Arturo L. Pineda, Manuel A. Rivas, Carlos D. Bustamante, James Zou

    Abstract: Large scale veterinary clinical records can become a powerful resource for patient care and research. However, clinicians lack the time and resource to annotate patient records with standard medical diagnostic codes and most veterinary visits are captured in free text notes. The lack of standard coding makes it challenging to use the clinical data to improve patient care. It is also a major impedi…

    Submitted 3 September, 2018; v1 submitted 27 June, 2018; originally announced June 2018.

    Comments: 17 pages, 6 figures. Updated the text for clarity

  20. arXiv:1710.04334  [pdf, other]

    cs.CL cs.AI

    DisSent: Sentence Representation Learning from Explicit Discourse Relations

    Authors: Allen Nie, Erin D. Bennett, Noah D. Goodman

    Abstract: Learning effective representations of sentences is one of the core missions of natural language understanding. Existing models either train on a vast amount of text, or require costly, manually curated sentence relation datasets. We show that with dependency parsing and rule-based rubrics, we can curate a high quality sentence relation task by leveraging explicit discourse relations. We show that…

    Submitted 4 June, 2019; v1 submitted 11 October, 2017; originally announced October 2017.

    Comments: 13 pages, 4 figures. ACL 2019

  21. arXiv:1703.02573  [pdf, other]

    cs.LG cs.CL

    Data Noising as Smoothing in Neural Network Language Models

    Authors: Ziang Xie, Sida I. Wang, Jiwei Li, Daniel Lévy, Aiming Nie, Dan Jurafsky, Andrew Y. Ng

    Abstract: Data noising is an effective technique for regularizing neural network models. While noising is widely adopted in application domains such as vision and speech, commonly used noising primitives have not been developed for discrete sequence-level settings such as language modeling. In this paper, we derive a connection between input noising in neural network language models and smoothing in $n$-gra…

    Submitted 7 March, 2017; originally announced March 2017.

    Comments: ICLR 2017