Showing 1–50 of 168 results for author: Peng, N

  1. arXiv:2407.13248  [pdf, other]

    cs.CL

    Are Large Language Models Capable of Generating Human-Level Narratives?

    Authors: Yufei Tian, Tenghao Huang, Miri Liu, Derek Jiang, Alexander Spangher, Muhao Chen, Jonathan May, Nanyun Peng

    Abstract: This paper investigates the capability of LLMs in storytelling, focusing on narrative development and plot progression. We introduce a novel computational framework to analyze narratives through three discourse-level aspects: i) story arcs, ii) turning points, and iii) affective dimensions, including arousal and valence. By leveraging expert and automatic annotations, we uncover significant discre…

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.02511  [pdf, other]

    cs.RO cs.AI cs.CL

    LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning

    Authors: Silin Meng, Yiwei Wang, Cheng-Fu Yang, Nanyun Peng, Kai-Wei Chang

    Abstract: Path planning is a fundamental scientific problem in robotics and autonomous navigation, requiring the derivation of efficient routes from starting to destination points while avoiding obstacles. Traditional algorithms like A* and its variants are capable of ensuring path validity but suffer from significant computational and memory inefficiencies as the state space grows. Conversely, large langua…

    Submitted 19 June, 2024; originally announced July 2024.

    Comments: Submitted to The 2024 Conference on Empirical Methods in Natural Language Processing

  3. arXiv:2407.00219  [pdf, other]

    cs.CL cs.AI

    Evaluating Human Alignment and Model Faithfulness of LLM Rationale

    Authors: Mohsen Fayyaz, Fan Yin, Jiao Sun, Nanyun Peng

    Abstract: We study how well large language models (LLMs) explain their generations with rationales -- a set of tokens extracted from the input texts that reflect the decision process of LLMs. We examine LLM rationales extracted with two methods: 1) attribution-based methods that use attention or gradients to locate important tokens, and 2) prompting-based methods that guide LLMs to extract rationales using…

    Submitted 28 June, 2024; originally announced July 2024.

  4. arXiv:2406.13892  [pdf, other]

    cs.CL

    Adaptable Logical Control for Large Language Models

    Authors: Honghua Zhang, Po-Nien Kung, Masahiro Yoshida, Guy Van den Broeck, Nanyun Peng

    Abstract: Despite the success of Large Language Models (LLMs) on various tasks following human instructions, controlling model generation at inference time poses a persistent challenge. In this paper, we introduce Ctrl-G, an adaptable framework that facilitates tractable and flexible control of LLM generation to reliably follow logical constraints. Ctrl-G combines any production-ready LLM with a Hidden Mark…

    Submitted 19 June, 2024; originally announced June 2024.

  5. arXiv:2406.13692  [pdf, other]

    cs.CL

    Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation

    Authors: Di Wu, Jia-Chen Gu, Fan Yin, Nanyun Peng, Kai-Wei Chang

    Abstract: Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks. However, there are significant trustworthiness concerns as RALMs are prone to generating unfaithful outputs, including baseless information or contradictions with the retrieved context. This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decodin…

    Submitted 19 June, 2024; originally announced June 2024.

  6. arXiv:2406.13444  [pdf, other]

    cs.CL cs.CV

    VDebugger: Harnessing Execution Feedback for Debugging Visual Programs

    Authors: Xueqing Wu, Zongyu Lin, Songyan Zhao, Te-Lin Wu, Pan Lu, Nanyun Peng, Kai-Wei Chang

    Abstract: Visual programs are executable code generated by large language models to address visual reasoning problems. They decompose complex questions into multiple reasoning steps and invoke specialized models for each step to solve the problems. However, these programs are prone to logic errors, with our preliminary evaluation showing that 58% of the total errors are caused by program logic errors. Debug…

    Submitted 27 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: update reference

  7. arXiv:2406.12680  [pdf, other]

    cs.CL

    Measuring Psychological Depth in Language Models

    Authors: Fabrice Harel-Canada, Hanyu Zhou, Sreya Mupalla, Zeynep Yildiz, Amit Sahai, Nanyun Peng

    Abstract: Evaluations of creative stories generated by large language models (LLMs) often focus on objective properties of the text, such as its style, coherence, and toxicity. While these metrics are indispensable, they do not speak to a story's subjective, psychological impact from a reader's perspective. We introduce the Psychological Depth Scale (PDS), a novel framework rooted in literary theory that me…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Preprint. Under Review

  8. arXiv:2406.07735  [pdf, other]

    cs.CL cs.LG

    REAL Sampling: Boosting Factuality and Diversity of Open-Ended Generation via Asymptotic Entropy

    Authors: Haw-Shiuan Chang, Nanyun Peng, Mohit Bansal, Anil Ramakrishna, Tagyoung Chung

    Abstract: Decoding methods for large language models (LLMs) usually struggle with the tradeoff between ensuring factuality and maintaining diversity. For example, a higher p threshold in the nucleus (top-p) sampling increases the diversity but decreases the factuality, and vice versa. In this paper, we propose REAL (Residual Entropy from Asymptotic Line) sampling, a decoding method that achieves improved fa…

    Submitted 11 June, 2024; originally announced June 2024.

  9. arXiv:2406.05365  [pdf, other]

    cs.CL cs.AI cs.LG

    CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation

    Authors: I-Hung Hsu, Zifeng Wang, Long T. Le, Lesly Miculicich, Nanyun Peng, Chen-Yu Lee, Tomas Pfister

    Abstract: Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses by accurately citing verifiable sources. However, existing methods, by either feeding LMs with raw or preprocessed materials, remain prone to errors. To address this, we introduce CaLM, a novel verification framework. CaLM leverages the insight that a robust grounded response…

    Submitted 24 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Camera Ready Version

  10. arXiv:2406.02376  [pdf, other]

    cs.CL

    Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs

    Authors: Zhiwei Cao, Qian Cao, Yu Lu, Ningxin Peng, Luyang Huang, Shanbo Cheng, Jinsong Su

    Abstract: The growing popularity of Large Language Models has sparked interest in context compression for Large Language Models (LLMs). However, the performance of previous methods degrades dramatically as compression ratios increase, sometimes even falling to the closed-book level. This decline can be attributed to the loss of key information during the compression process. Our preliminary study supports t…

    Submitted 17 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  11. arXiv:2406.01495  [pdf, other]

    cs.CL

    Re-ReST: Reflection-Reinforced Self-Training for Language Agents

    Authors: Zi-Yi Dou, Cheng-Fu Yang, Xueqing Wu, Kai-Wei Chang, Nanyun Peng

    Abstract: Finetuning language agents with reasoning-action trajectories is effective, but obtaining these trajectories from human annotations or stronger models is costly and sometimes impractical. In this paper, we investigate the use of self-training in language agents, which can generate supervision from the agent itself, offering a promising alternative without relying on human or stronger model demonst…

    Submitted 7 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  12. arXiv:2405.19315  [pdf, other]

    cs.CV cs.CL cs.LG

    Matryoshka Query Transformer for Large Vision-Language Models

    Authors: Wenbo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang

    Abstract: Large Vision-Language Models (LVLMs) typically encode an image into a fixed number of visual tokens (e.g., 576) and process these tokens with a language model. Despite their strong performance, LVLMs face challenges in adapting to varying computational constraints. This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resource…

    Submitted 6 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Preprint. Our code and model are publicly available at https://github.com/gordonhu608/MQT-LLaVA

  13. arXiv:2405.04834  [pdf, other]

    cs.CV

    FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation

    Authors: Xuehai He, Jian Zheng, Jacob Zhiyuan Fang, Robinson Piramuthu, Mohit Bansal, Vicente Ordonez, Gunnar A Sigurdsson, Nanyun Peng, Xin Eric Wang

    Abstract: Controllable text-to-image (T2I) diffusion models generate images conditioned on both text prompts and semantic inputs of other modalities like edge maps. Nevertheless, current controllable T2I methods commonly face challenges related to efficiency and faithfulness, especially when conditioning on multiple inputs from either the same or diverse modalities. In this paper, we propose a novel Flexibl…

    Submitted 21 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  14. arXiv:2404.17779  [pdf, other]

    cs.CL

    Medical Vision-Language Pre-Training for Brain Abnormalities

    Authors: Masoud Monajatipoor, Zi-Yi Dou, Aichi Chien, Nanyun Peng, Kai-Wei Chang

    Abstract: Vision-language models have become increasingly powerful for tasks that require an understanding of both visual and linguistic elements, bridging the gap between these modalities. In the context of multimodal clinical AI, there is a growing need for models that possess domain-specific knowledge, as existing models often lack the expertise required for medical applications. In this paper, we take b…

    Submitted 27 April, 2024; originally announced April 2024.

  15. arXiv:2404.16792  [pdf, other]

    cs.LG cs.AI cs.CL

    Weak-to-Strong Extrapolation Expedites Alignment

    Authors: Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang, Nanyun Peng

    Abstract: The open-source community is experiencing a surge in the release of large language models (LLMs) that are trained to follow instructions and align with human preference. However, further training to improve them still requires expensive computational resources and data annotations. Is it possible to bypass additional training and cost-effectively acquire better-aligned models? Inspired by the lite…

    Submitted 22 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Add theoretical explanation and more evaluation results

  16. arXiv:2404.13874  [pdf, other]

    cs.CL cs.CV

    VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models

    Authors: Haoyi Qiu, Wenbo Hu, Zi-Yi Dou, Nanyun Peng

    Abstract: Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs, undermining their reliability. A comprehensive quantitative evaluation is necessary to identify and understand the extent of hallucinations in these models. However, existing benchmarks are often limited in scope, focusing mainly on object hallucina…

    Submitted 14 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: ACL 2024 Findings

  17. arXiv:2404.04763  [pdf, other]

    cs.CV cs.AI

    GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling

    Authors: Hritik Bansal, Po-Nien Kung, P. Jeffrey Brantingham, Kai-Wei Chang, Nanyun Peng

    Abstract: Multimodal event argument role labeling (EARL), a task that assigns a role for each event participant (object) in an image is a complex challenge. It requires reasoning over the entire image, the depicted event, and the interactions between various objects participating in the event. Existing models heavily rely on high-quality event-annotated training data to understand the event semantics and st…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 20 pages, 15 Figures, 13 figures

  18. arXiv:2404.02456  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    PhonologyBench: Evaluating Phonological Skills of Large Language Models

    Authors: Ashima Suvarna, Harshita Khandelwal, Nanyun Peng

    Abstract: Phonology, the study of speech's structure and pronunciation rules, is a critical yet often overlooked component in Large Language Model (LLM) research. LLMs are widely used in various downstream applications that leverage phonology such as educational tools and poetry generation. Moreover, LLMs can potentially learn imperfect associations between orthographic and phonological forms from the train…

    Submitted 5 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: 17 pages, 7 figures, 6 tables

  19. arXiv:2404.01679  [pdf, other]

    cs.CL cs.SI physics.soc-ph

    Event Detection from Social Media for Epidemic Prediction

    Authors: Tanmay Parekh, Anh Mac, Jiarui Yu, Yuxuan Dong, Syed Shahriar, Bonnie Liu, Eric Yang, Kuan-Hao Huang, Wei Wang, Nanyun Peng, Kai-Wei Chang

    Abstract: Social media is an easy-to-access platform providing timely updates about societal trends and events. Discussions regarding epidemic-related events such as infections, symptoms, and social interactions can be crucial for informing policymaking during epidemic outbreaks. In our work, we pioneer exploiting Event Detection (ED) for better preparedness and early warnings of any upcoming epidemic by de…

    Submitted 24 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at NAACL 2024

  20. arXiv:2404.00530  [pdf, other]

    cs.CL cs.AI cs.LG

    Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization

    Authors: Hritik Bansal, Ashima Suvarna, Gantavya Bhatt, Nanyun Peng, Kai-Wei Chang, Aditya Grover

    Abstract: A common technique for aligning large language models (LLMs) relies on acquiring human preferences by comparing multiple generations conditioned on a fixed context. This only leverages the pairwise comparisons when the generations are placed in an identical context. However, such conditional rankings often fail to capture the complex and multidimensional aspects of human preferences. In this work,…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 25 pages, 14 figures, 5 tables

  21. arXiv:2403.15097  [pdf, other]

    cs.CL cs.AI

    Argument-Aware Approach To Event Linking

    Authors: I-Hung Hsu, Zihan Xue, Nilay Pochh, Sahil Bansal, Premkumar Natarajan, Jayanth Srinivasa, Nanyun Peng

    Abstract: Event linking connects event mentions in text with relevant nodes in a knowledge base (KB). Prior research in event linking has mainly borrowed methods from entity linking, overlooking the distinct features of events. Compared to the extensively explored entity linking task, events have more complex structures and can be more effectively distinguished by examining their associated arguments. Moreo…

    Submitted 6 June, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: Paper accepted by ACL-findings 2024

  22. arXiv:2403.04656  [pdf, other]

    cs.CL

    Chain of Thought Explanation for Dialogue State Tracking

    Authors: Lin Xu, Ningxin Peng, Daquan Zhou, See-Kiong Ng, Jinlan Fu

    Abstract: Dialogue state tracking (DST) aims to record user queries and goals during a conversational interaction achieved by maintaining a predefined set of slots and their corresponding values. Current approaches decide slot values opaquely, while humans usually adopt a more deliberate approach by collecting information from relevant dialogue turns and then reasoning the appropriate values. In this work,…

    Submitted 9 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  23. arXiv:2403.02586  [pdf, other]

    cs.CL

    Improving Event Definition Following For Zero-Shot Event Detection

    Authors: Zefan Cai, Po-Nien Kung, Ashima Suvarna, Mingyu Derek Ma, Hritik Bansal, Baobao Chang, P. Jeffrey Brantingham, Wei Wang, Nanyun Peng

    Abstract: Existing approaches on zero-shot event detection usually train models on datasets annotated with known event types, and prompt them with unseen event definitions. These approaches yield sporadic successes, yet generally fall short of expectations. In this work, we aim to improve zero-shot event detection by training models to better follow event definitions. We hypothesize that a diverse set of ev…

    Submitted 4 March, 2024; originally announced March 2024.

  24. arXiv:2403.02528  [pdf, other]

    cs.CL cs.AI

    DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation

    Authors: Xueqing Wu, Rui Zheng, Jingzhen Sha, Te-Lin Wu, Hanyu Zhou, Mohan Tang, Kai-Wei Chang, Nanyun Peng, Haoran Huang

    Abstract: Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights to comprehensively answer a given user query for tabular data. In this work, we aim to propose new resources and benchmarks to inspire future research on this crucial yet challenging and under-explored task. However, collecting data analysis annotations curated by experts can be prohibitively expensi…

    Submitted 4 March, 2024; originally announced March 2024.

  25. arXiv:2401.18018  [pdf, other]

    cs.LG cs.AI cs.CL

    On Prompt-Driven Safeguarding for Large Language Models

    Authors: Chujie Zheng, Fan Yin, Hao Zhou, Fandong Meng, Jie Zhou, Kai-Wei Chang, Minlie Huang, Nanyun Peng

    Abstract: Prepending model inputs with safety prompts is a common practice for safeguarding large language models (LLMs) against queries with harmful intents. However, the underlying working mechanisms of safety prompts have not been unraveled yet, restricting the possibility of automatically optimizing them to improve LLM safety. In this work, we investigate how LLMs' behavior (i.e., complying with or refu…

    Submitted 3 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: ICML 2024

  26. arXiv:2401.13311  [pdf, other]

    cs.CV cs.AI cs.LG

    ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models

    Authors: Rohan Wadhawan, Hritik Bansal, Kai-Wei Chang, Nanyun Peng

    Abstract: Many real-world tasks require an agent to reason jointly over text and visual objects, (e.g., navigating in public spaces), which we refer to as context-sensitive text-rich visual reasoning. Specifically, these tasks require an understanding of the context in which the text interacts with visual elements within an image. However, there is a lack of existing datasets to benchmark the state-of-the-a…

    Submitted 15 July, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  27. arXiv:2401.10471  [pdf, other]

    cs.CL cs.AI

    DeepEdit: Knowledge Editing as Decoding with Constraints

    Authors: Yiwei Wang, Muhao Chen, Nanyun Peng, Kai-Wei Chang

    Abstract: How to edit the knowledge in multi-step reasoning has become the major challenge in the knowledge editing (KE) of large language models (LLMs). The difficulty arises because the hallucinations of LLMs during multi-step reasoning often lead to incorrect use of new knowledge and incorrect answers. To address this issue, we design decoding constraints to "regulate" LLMs' reasoning, enhancing logical…

    Submitted 19 June, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  28. arXiv:2401.04700  [pdf, other]

    cs.CL

    Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue

    Authors: Jia-Chen Gu, Hao-Xiang Xu, Jun-Yu Ma, Pan Lu, Zhen-Hua Ling, Kai-Wei Chang, Nanyun Peng

    Abstract: Model editing is a technique that edits the large language models (LLMs) with updated knowledge to alleviate hallucinations without resource-intensive retraining. While current model editing methods can effectively modify a model's behavior within a specific area of interest, they often overlook the potential unintended side effects on the general abilities of LLMs such as reasoning, natural langu…

    Submitted 16 June, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Propose a new regularization method

  29. arXiv:2401.00763  [pdf, other]

    cs.SE cs.AI cs.CL cs.CV cs.MM

    New Job, New Gender? Measuring the Social Bias in Image Generation Models

    Authors: Wenxuan Wang, Haonan Bai, Jen-tse Huang, Yuxuan Wan, Youliang Yuan, Haoyi Qiu, Nanyun Peng, Michael R. Lyu

    Abstract: Image generation models can generate or edit images from a given text. Recent advancements in image generation technology, exemplified by DALL-E and Midjourney, have been groundbreaking. These advanced models, despite their impressive capabilities, are often trained on massive Internet datasets, making them susceptible to generating content that perpetuates social stereotypes and biases, which can…

    Submitted 1 January, 2024; originally announced January 2024.

  30. arXiv:2311.09734  [pdf, other]

    cs.CL

    Tracking the Newsworthiness of Public Documents

    Authors: Alexander Spangher, Emilio Ferrara, Ben Welsh, Nanyun Peng, Serdar Tumgoren, Jonathan May

    Abstract: Journalists must find stories in huge amounts of textual data (e.g. leaks, bills, press releases) as part of their jobs: determining when and why text becomes news can help us understand coverage patterns and help us build assistive tools. Yet, this is challenging because very few labelled links exist, language use between corpora is very different, and text may be covered for a variety of reasons…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 9 pages, 7 pages appendix

  31. arXiv:2311.09682  [pdf, other]

    cs.CL cs.AI

    MacGyver: Are Large Language Models Creative Problem Solvers?

    Authors: Yufei Tian, Abhilasha Ravichander, Lianhui Qin, Ronan Le Bras, Raja Marjieh, Nanyun Peng, Yejin Choi, Thomas L. Griffiths, Faeze Brahman

    Abstract: We explore the creative problem-solving capabilities of modern LLMs in a novel constrained setting. To this end, we create MACGYVER, an automatically generated dataset consisting of over 1,600 real-world problems deliberately designed to trigger innovative usage of objects and necessitate out-of-the-box thinking. We then present our collection to both LLMs and humans to compare and contrast their…

    Submitted 27 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  32. arXiv:2311.09562  [pdf, other]

    cs.CL

    TextEE: Benchmark, Reevaluation, Reflections, and Future Challenges in Event Extraction

    Authors: Kuan-Hao Huang, I-Hung Hsu, Tanmay Parekh, Zhiyu Xie, Zixuan Zhang, Premkumar Natarajan, Kai-Wei Chang, Nanyun Peng, Heng Ji

    Abstract: Event extraction has gained considerable interest due to its wide-ranging applications. However, recent studies draw attention to evaluation issues, suggesting that reported scores may not accurately reflect the true performance. In this work, we identify and address evaluation challenges, including inconsistency due to varying data assumptions or preprocessing steps, the insufficiency of current…

    Submitted 6 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: Paper accepted by ACL 2024 Findings

  33. arXiv:2311.09521  [pdf, other]

    cs.CL

    AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation

    Authors: Haoyi Qiu, Kung-Hsiang Huang, Jingnong Qu, Nanyun Peng

    Abstract: Ensuring factual consistency is crucial for natural language generation tasks, particularly in abstractive summarization, where preserving the integrity of information is paramount. Prior works on evaluating factual consistency of summarization often take the entailment-based approaches that first generate perturbed (factual inconsistent) summaries and then train a classifier on the generated data…

    Submitted 2 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  34. arXiv:2311.02544  [pdf, ps, other]

    cs.LG cs.AI

    Nonlinear Multi-objective Reinforcement Learning with Provable Guarantees

    Authors: Nianli Peng, Brandon Fain

    Abstract: We describe RA-E3 (Reward-Aware Explicit Explore or Exploit), an algorithm with provable guarantees for solving a single or multi-objective Markov Decision Process (MDP) where we want to maximize the expected value of a nonlinear function over accumulated rewards. This allows us to model fairness-aware welfare optimization for multi-objective reinforcement learning as well as risk-aware reinforcem…

    Submitted 14 December, 2023; v1 submitted 4 November, 2023; originally announced November 2023.

  35. arXiv:2311.01620  [pdf, other]

    cs.CV cs.CL

    ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos

    Authors: Te-Lin Wu, Zi-Yi Dou, Qingyuan Hu, Yu Hou, Nischal Reddy Chandra, Marjorie Freedman, Ralph M. Weischedel, Nanyun Peng

    Abstract: Multimodal counterfactual reasoning is a vital yet challenging ability for AI systems. It involves predicting the outcomes of hypothetical circumstances based on vision and language inputs, which enables AI models to learn from failures and explore hypothetical scenarios. Despite its importance, there are only a few datasets targeting the counterfactual reasoning abilities of multimodal models. Am…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP)

  36. arXiv:2311.00288  [pdf, other]

    cs.CL cs.AI

    Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks

    Authors: Po-Nien Kung, Fan Yin, Di Wu, Kai-Wei Chang, Nanyun Peng

    Abstract: Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions. However, how to select new tasks to improve the performance and generalizability of IT models remains an open question. Training on all existing tasks is impractical due to prohibiting computation requirements, and randomly se…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023 Main

  37. arXiv:2310.17054  [pdf, other]

    cs.CL cs.LG

    BOOST: Harnessing Black-Box Control to Boost Commonsense in LMs' Generation

    Authors: Yufei Tian, Felix Zhang, Nanyun Peng

    Abstract: Large language models (LLMs) such as GPT-3 have demonstrated a strong capability to generate coherent and contextually relevant text. However, amidst their successes, a crucial issue persists: their generated outputs still lack commonsense at times. Moreover, fine-tuning the entire LLM towards more commonsensical outputs is computationally expensive if not infeasible. In this paper, we present a c…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023

  38. arXiv:2310.15066  [pdf, other]

    cs.CV cs.CL

    Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge

    Authors: Te-Lin Wu, Yu Zhou, Nanyun Peng

    Abstract: The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually. One important step towards this goal is to localize and track key active objects that undergo major state change as a consequence of human actions/interactions to the environment without being told exactly what/where to ground (e.g., localizing and track…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP)

  39. arXiv:2310.14542  [pdf, other]

    cs.CL

    Evaluating Large Language Models on Controlled Generation Tasks

    Authors: Jiao Sun, Yufei Tian, Wangchunshu Zhou, Nan Xu, Qian Hu, Rahul Gupta, John Frederick Wieting, Nanyun Peng, Xuezhe Ma

    Abstract: While recent studies have looked into the abilities of large language models in various benchmark tasks, including question generation, reading comprehension, multilingual and etc, there have been few studies looking into the controllability of large language models on generation tasks. We present an extensive analysis of various benchmarks including a sentence planning benchmark with different gr…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023

  40. arXiv:2310.09219  [pdf, other]

    cs.CL cs.AI

    "Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters

    Authors: Yixin Wan, George Pu, Jiao Sun, Aparna Garimella, Kai-Wei Chang, Nanyun Peng

    Abstract: Large Language Models (LLMs) have recently emerged as an effective tool to assist individuals in writing various types of content, including professional documents such as recommendation letters. Though bringing convenience, this application also introduces unprecedented fairness concerns. Model-generated reference letters might be directly used by users in professional scenarios. If underlying bi…

    Submitted 1 December, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 Findings

  41. arXiv:2310.08795  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Mitigating Bias for Question Answering Models by Tracking Bias Influence

    Authors: Mingyu Derek Ma, Jiun-Yu Kao, Arpit Gupta, Yu-Hsiang Lin, Wenbo Zhao, Tagyoung Chung, Wei Wang, Kai-Wei Chang, Nanyun Peng

    Abstract: Models of various NLP tasks have been shown to exhibit stereotypes, and the bias in the question answering (QA) models is especially harmful as the output answers might be directly consumed by the end users. There have been datasets to evaluate bias in QA models, while bias mitigation technique for the QA models is still under-explored. In this work, we propose BMBI, an approach to mitigate the bi…

    Submitted 17 June, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: To appear at NAACL 2024 main conference

  42. arXiv:2310.05280  [pdf, other]

    cs.CL cs.AI

    Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems

    Authors: Yixin Wan, Jieyu Zhao, Aman Chadha, Nanyun Peng, Kai-Wei Chang

    Abstract: Recent advancements in Large Language Models empower them to follow freeform instructions, including imitating generic or specific demographic personas in conversations. We define generic personas to represent demographic groups, such as "an Asian person", whereas specific personas may take the form of specific popular Asian names like "Yumi". While the adoption of personas enriches user experienc…

    Submitted 2 November, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

  43. arXiv:2310.02529  [pdf, other]

    cs.SI cs.AI cs.HC

    MIDDAG: Where Does Our News Go? Investigating Information Diffusion via Community-Level Information Pathways

    Authors: Mingyu Derek Ma, Alexander K. Taylor, Nuan Wen, Yanchen Liu, Po-Nien Kung, Wenna Qin, Shicheng Wen, Azure Zhou, Diyi Yang, Xuezhe Ma, Nanyun Peng, Wei Wang

    Abstract: We present MIDDAG, an intuitive, interactive system that visualizes the information propagation paths on social media triggered by COVID-19-related news articles accompanied by comprehensive insights, including user/community susceptibility level, as well as events and popular opinions raised by the crowd while propagating the information. Besides discovering information flow patterns among users,…

    Submitted 20 February, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: To appear at AAAI'24. System demo video and more info: info-pathways.github.io

  44. arXiv:2309.08943  [pdf, other]

    cs.CL

    Contextual Label Projection for Cross-Lingual Structured Prediction

    Authors: Tanmay Parekh, I-Hung Hsu, Kuan-Hao Huang, Kai-Wei Chang, Nanyun Peng

    Abstract: Label projection, which involves obtaining translated labels and texts jointly, is essential for leveraging machine translation to facilitate cross-lingual transfer in structured prediction tasks. Prior research exploring label projection often compromise translation accuracy by favoring simplified label translation or relying solely on word-level alignments. In this paper, we introduce a novel la…

    Submitted 14 April, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

    Comments: Accepted at NAACL 2024

  45. arXiv:2307.12950  [pdf, other]

    cs.CL cs.AI

    RLCD: Reinforcement Learning from Contrastive Distillation for Language Model Alignment

    Authors: Kevin Yang, Dan Klein, Asli Celikyilmaz, Nanyun Peng, Yuandong Tian

    Abstract: We propose Reinforcement Learning from Contrastive Distillation (RLCD), a method for aligning language models to follow principles expressed in natural language (e.g., to be more harmless) without using human feedback. RLCD creates preference pairs from two contrasting model outputs, one using a positive prompt designed to encourage following the given principles, and one using a negative prompt d…

    Submitted 16 March, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: ICLR 2024

  46. arXiv:2306.15774  [pdf]

    cs.HC cs.CL cs.CV cs.LG

    Next Steps for Human-Centered Generative AI: A Technical Perspective

    Authors: Xiang 'Anthony' Chen, Jeff Burke, Ruofei Du, Matthew K. Hong, Jennifer Jacobs, Philippe Laban, Dingzeyu Li, Nanyun Peng, Karl D. D. Willis, Chien-Sheng Wu, Bolei Zhou

    Abstract: Through iterative, cross-disciplinary discussions, we define and propose next-steps for Human-centered Generative AI (HGAI). We contribute a comprehensive research agenda that lays out future directions of Generative AI spanning three levels: aligning with human values; assimilating human intents; and augmenting human abilities. By identifying these next-steps, we intend to draw interdisciplinary…

    Submitted 22 December, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

  47. arXiv:2306.14060  [pdf, other]

    cs.CV cs.CL cs.LG

    DesCo: Learning Object Recognition with Rich Language Descriptions

    Authors: Liunian Harold Li, Zi-Yi Dou, Nanyun Peng, Kai-Wei Chang

    Abstract: Recent development in vision-language approaches has instigated a paradigm shift in learning visual recognition models from language supervision. These approaches align objects with language queries (e.g. "a photo of a cat") and improve the models' adaptability to identify novel objects and domains. Recently, several studies have attempted to query these models with complex language expressions th…

    Submitted 24 June, 2023; originally announced June 2023.

  48. arXiv:2306.11879  [pdf, other]

    cs.CL

    Open-Domain Text Evaluation via Contrastive Distribution Methods

    Authors: Sidi Lu, Hongyi Liu, Asli Celikyilmaz, Tianlu Wang, Nanyun Peng

    Abstract: Recent advancements in open-domain text generation, driven by the power of large pre-trained language models (LLMs), have demonstrated remarkable performance. However, assessing these models' generation quality remains a challenge. In this paper, we introduce a novel method for evaluating open-domain text generation called Contrastive Distribution Methods (CDM). Leveraging the connection between i…

    Submitted 9 June, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted to ICML 2024

  49. arXiv:2306.11825  [pdf, other]

    cs.CL

    DiNADO: Norm-Disentangled Neurally-Decomposed Oracles for Controlling Language Models

    Authors: Sidi Lu, Wenbo Zhao, Chenyang Tao, Arpit Gupta, Shanchan Wu, Tagyoung Chung, Nanyun Peng

    Abstract: NeurAlly-Decomposed Oracle (NADO) is a powerful approach for controllable generation with large language models. It is designed to avoid catastrophic forgetting while achieving guaranteed convergence to an entropy-maximized closed-form optimal solution with reasonable modeling capacity. Despite the success, several challenges arise when apply NADO to a wide range of scenarios. Vanilla NADO suffers…

    Submitted 6 June, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted to ICML 2024 (Poster). Work was done during an Amazon Internship Program

  50. arXiv:2305.19228  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Unsupervised Melody-to-Lyric Generation

    Authors: Yufei Tian, Anjali Narayan-Chen, Shereen Oraby, Alessandra Cervone, Gunnar Sigurdsson, Chenyang Tao, Wenbo Zhao, Yiwen Chen, Tagyoung Chung, Jing Huang, Nanyun Peng

    Abstract: Automatic melody-to-lyric generation is a task in which song lyrics are generated to go with a given melody. It is of significant practical interest and more challenging than unconstrained lyric generation as the music imposes additional constraints onto the lyrics. The training data is limited as most songs are copyrighted, resulting in models that underfit the complicated cross-modal relationshi…

    Submitted 22 December, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: ACL 2023. arXiv admin note: substantial text overlap with arXiv:2305.07760