Showing 1–50 of 61 results for author: Madotto, A

  1. arXiv:2403.04735  [pdf, other]

    cs.CV

    SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM

    Authors: Jielin Qiu, Andrea Madotto, Zhaojiang Lin, Paul A. Crook, Yifan Ethan Xu, Xin Luna Dong, Christos Faloutsos, Lei Li, Babak Damavandi, Seungwhan Moon

    Abstract: Vision-extended LLMs have made significant strides in Visual Question Answering (VQA). Despite these advancements, VLLMs still encounter substantial difficulties in handling queries involving long-tail entities, with a tendency to produce erroneous or hallucinated responses. In this work, we introduce a novel evaluative benchmark named SnapNTell, specifically tailored for entity-centric V…

    Submitted 7 March, 2024; originally announced March 2024.

  2. arXiv:2402.04379  [pdf, other]

    cs.LG cond-mat.mtrl-sci

    Fine-Tuned Language Models Generate Stable Inorganic Materials as Text

    Authors: Nate Gruver, Anuroop Sriram, Andrea Madotto, Andrew Gordon Wilson, C. Lawrence Zitnick, Zachary Ulissi

    Abstract: We propose fine-tuning large language models for generation of stable materials. While unorthodox, fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable, with around 90% of sampled structures obeying physical constraints on atom positions and charges. Using energy above hull calculations from both learned ML potentials and gold-standard DFT calculatio…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: ICLR 2024. Code available at: https://github.com/facebookresearch/crystal-llm

  3. arXiv:2309.16058  [pdf, other]

    cs.LG cs.CL cs.CV

    AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

    Authors: Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Tushar Nagarajan, Matt Smith, Shashank Jain, Chun-Fu Yeh, Prakash Murugesan, Peyman Heidari, Yue Liu, Kavya Srinet, Babak Damavandi, Anuj Kumar

    Abstract: We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor), and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of the state-of-the-art LLMs including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a…

    Submitted 27 September, 2023; originally announced September 2023.

  4. arXiv:2307.02768  [pdf, other]

    cs.CL

    Training Models to Generate, Recognize, and Reframe Unhelpful Thoughts

    Authors: Mounica Maddela, Megan Ung, Jing Xu, Andrea Madotto, Heather Foran, Y-Lan Boureau

    Abstract: Many cognitive approaches to well-being, such as recognizing and reframing unhelpful thoughts, have received considerable empirical support over the past decades, yet still lack truly widespread adoption in self-help format. A barrier to that adoption is a lack of adequately specific and diverse dedicated practice material. This work examines whether current language models can be leveraged to bot…

    Submitted 6 July, 2023; originally announced July 2023.

    Comments: ACL 2023

  5. arXiv:2305.13721  [pdf, other]

    cs.CL cs.AI

    Continual Dialogue State Tracking via Example-Guided Question Answering

    Authors: Hyundong Cho, Andrea Madotto, Zhaojiang Lin, Khyathi Raghavi Chandu, Satwik Kottur, Jing Xu, Jonathan May, Chinnadhurai Sankar

    Abstract: Dialogue systems are frequently updated to accommodate new services, but naively updating them by continually training with data for new services results in diminishing performance on previously learnt services. Motivated by the insight that dialogue state tracking (DST), a crucial component of dialogue systems that estimates the user's goal as a conversation proceeds, is a simple natural language underst…

    Submitted 14 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 11 pages, EMNLP 2023

  6. arXiv:2210.14395  [pdf, other]

    cs.CV cs.CL cs.LG

    IMU2CLIP: Multimodal Contrastive Learning for IMU Motion Sensors from Egocentric Videos and Text

    Authors: Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Alireza Dirafzoon, Aparajita Saraf, Amy Bearman, Babak Damavandi

    Abstract: We present IMU2CLIP, a novel pre-training approach to align Inertial Measurement Unit (IMU) motion sensor recordings with video and text, by projecting them into the joint representation space of Contrastive Language-Image Pre-training (CLIP). The proposed approach allows IMU2CLIP to translate human motions (as measured by IMU sensors) into their corresponding textual descriptions and videos -- wh…

    Submitted 25 October, 2022; originally announced October 2022.

  7. arXiv:2210.07652  [pdf, other]

    cs.CL cs.AI

    Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values

    Authors: Yejin Bang, Tiezheng Yu, Andrea Madotto, Zhaojiang Lin, Mona Diab, Pascale Fung

    Abstract: Many NLP classification tasks, such as sexism/racism detection or toxicity detection, are based on human values. Yet, human values can vary under diverse cultural conditions. Therefore, we introduce a framework for value-aligned classification that performs prediction based on explicitly written human values in the command. Along with the task, we propose a practical approach that distills value-a…

    Submitted 14 October, 2022; originally announced October 2022.

  8. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  9. arXiv:2205.05989  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards Answering Open-ended Ethical Quandary Questions

    Authors: Yejin Bang, Nayeon Lee, Tiezheng Yu, Leila Khalatbari, Yan Xu, Samuel Cahyawijaya, Dan Su, Bryan Wilie, Romain Barraud, Elham J. Barezi, Andrea Madotto, Hayden Kee, Pascale Fung

    Abstract: Considerable advancements have been made in various NLP tasks based on the impressive power of large language models (LLMs) and many NLP applications are deployed in our daily lives. In this work, we challenge the capability of LLMs with the new task of Ethical Quandary Generative Question Answering. Ethical quandary questions are more challenging to address because multiple conflicting answers ma…

    Submitted 1 February, 2023; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: 16 pages

  10. arXiv:2204.04902  [pdf, other]

    cs.CL

    NeuS: Neutral Multi-News Summarization for Mitigating Framing Bias

    Authors: Nayeon Lee, Yejin Bang, Tiezheng Yu, Andrea Madotto, Pascale Fung

    Abstract: Media news framing bias can increase political polarization and undermine civil society. The need for automatic mitigation methods is therefore growing. We propose a new task, a neutral summary generation from multiple news articles of the varying political leanings to facilitate balanced and unbiased news reading. In this paper, we first collect a new dataset, illustrate insights about framing bi…

    Submitted 3 May, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: NAACL2022 Long Paper

  11. arXiv:2203.01552  [pdf, other]

    cs.CL cs.AI

    Dialogue Summaries as Dialogue States (DS2), Template-Guided Summarization for Few-shot Dialogue State Tracking

    Authors: Jamin Shin, Hangyeol Yu, Hyeongdon Moon, Andrea Madotto, Juneyoung Park

    Abstract: Annotating task-oriented dialogues is notorious for the expensive and difficult data collection process. Few-shot dialogue state tracking (DST) is a realistic solution to this problem. In this paper, we hypothesize that dialogue summaries are essentially unstructured dialogue states; hence, we propose to reformulate dialogue state tracking as a dialogue summarization problem. To elaborate, we trai…

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: ACL 2022 (Long, Findings)

  12. arXiv:2203.00314  [pdf, other]

    cs.CL

    VScript: Controllable Script Generation with Visual Presentation

    Authors: Ziwei Ji, Yan Xu, I-Tsun Cheng, Samuel Cahyawijaya, Rita Frieske, Etsuko Ishii, Min Zeng, Andrea Madotto, Pascale Fung

    Abstract: In order to offer a customized script tool and inspire professional scriptwriters, we present VScript. It is a controllable pipeline that generates complete scripts, including dialogues and scene descriptions, as well as presents visually using video retrieval. With an interactive interface, our system allows users to select genres and input starting words that control the theme and development of…

    Submitted 13 October, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Journal ref: AACL Demo (2022)

  13. Survey of Hallucination in Natural Language Generation

    Authors: Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Delong Chen, Wenliang Dai, Ho Shu Chan, Andrea Madotto, Pascale Fung

    Abstract: Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent NLG, leading to improved development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However,…

    Submitted 14 July, 2024; v1 submitted 7 February, 2022; originally announced February 2022.

    ACM Class: A.1

    Journal ref: ACM Computing Surveys (2022)

  14. arXiv:2110.08118  [pdf, other]

    cs.CL cs.AI

    Few-Shot Bot: Prompt-Based Learning for Dialogue Systems

    Authors: Andrea Madotto, Zhaojiang Lin, Genta Indra Winata, Pascale Fung

    Abstract: Learning to converse using only a few examples is a great challenge in conversational AI. The current best conversational models, which are either good chit-chatters (e.g., BlenderBot) or goal-oriented systems (e.g., MinTL), are language models (LMs) fine-tuned on large conversational datasets. Training these models is expensive, both in terms of computational resources and time, and it is hard to…

    Submitted 15 October, 2021; originally announced October 2021.

  15. arXiv:2109.07684  [pdf, other]

    cs.CL cs.AI

    Language Models are Few-shot Multilingual Learners

    Authors: Genta Indra Winata, Andrea Madotto, Zhaojiang Lin, Rosanne Liu, Jason Yosinski, Pascale Fung

    Abstract: General-purpose language models have demonstrated impressive capabilities, performing on par with state-of-the-art approaches on a range of downstream natural language processing (NLP) tasks and benchmarks when inferring instructions from very few examples. Here, we evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages without a…

    Submitted 15 September, 2021; originally announced September 2021.

    Comments: 14 pages

  16. arXiv:2109.04655  [pdf, other]

    cs.CL

    Zero-Shot Dialogue State Tracking via Cross-Task Transfer

    Authors: Zhaojiang Lin, Bing Liu, Andrea Madotto, Seungwhan Moon, Paul Crook, Zhenpeng Zhou, Zhiguang Wang, Zhou Yu, Eunjoon Cho, Rajen Subba, Pascale Fung

    Abstract: Zero-shot transfer learning for dialogue state tracking (DST) enables us to handle a variety of task-oriented dialogue domains without the expense of collecting in-domain data. In this work, we propose to transfer the cross-task knowledge from general question answering (QA) corpora for the zero-shot DST task. Specifically, we propose TransferQA, a transferable generative QA model that se…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  17. arXiv:2108.10561  [pdf, other]

    cs.CL cs.AI cs.LG

    Taming the Beast: Learning to Control Neural Conversational Models

    Authors: Andrea Madotto

    Abstract: This thesis investigates the controllability of deep learning-based, end-to-end, generative dialogue systems in both task-oriented and chit-chat scenarios. In particular, we study the different aspects of controlling generative dialogue systems, including controlling styles and topics and continuously adding and combining dialogue skills. In the three decades since the first dialogue system was co…

    Submitted 24 August, 2021; originally announced August 2021.

    Comments: PhD thesis

  18. arXiv:2106.06157  [pdf, other]

    cs.CL cs.AI

    Assessing Political Prudence of Open-domain Chatbots

    Authors: Yejin Bang, Nayeon Lee, Etsuko Ishii, Andrea Madotto, Pascale Fung

    Abstract: Politically sensitive topics are still a challenge for open-domain chatbots. However, dealing with politically sensitive content in a responsible, non-partisan, and safe way is integral for these chatbots. Currently, the main approach to handling political sensitivity is by simply changing such a topic when it is detected. This is safe but evasive and results in a chatbot that is less eng…

    Submitted 11 June, 2021; originally announced June 2021.

    Comments: SIGDIAL 2021 - Safety for E2E Conversational AI (Camera-ready Version)

  19. arXiv:2106.03530  [pdf, ps, other]

    cs.CL cs.AI

    CAiRE in DialDoc21: Data Augmentation for Information-Seeking Dialogue System

    Authors: Etsuko Ishii, Yan Xu, Genta Indra Winata, Zhaojiang Lin, Andrea Madotto, Zihan Liu, Peng Xu, Pascale Fung

    Abstract: Information-seeking dialogue systems, including knowledge identification and response generation, aim to respond to users with fluent, coherent, and informative responses based on users' needs. To tackle this challenge, we utilize data augmentation methods and several training techniques with the pre-trained language models to learn a general pattern of the task and thus achieve promising p…

    Submitted 7 June, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: Accepted in DialDoc21 Workshop in ACL 2021. Etsuko Ishii and Yan Xu contributed equally to this work

  20. arXiv:2106.02787  [pdf, other]

    cs.CL

    BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling

    Authors: Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Peng Xu, Feijun Jiang, Yuxiang Hu, Chen Shi, Pascale Fung

    Abstract: Task-oriented dialogue (ToD) benchmarks provide an important avenue to measure progress and develop better conversational agents. However, existing datasets for end-to-end ToD modeling are limited to a single language, hindering the development of robust end-to-end ToD systems for multilingual countries and regions. Here we introduce BiToD, the first bilingual multi-domain dataset for end-to-end t…

    Submitted 4 June, 2021; originally announced June 2021.

    Comments: 22 pages

  21. arXiv:2105.06912  [pdf, other]

    cs.CL cs.AI cs.IR

    QAConv: Question Answering on Informative Conversations

    Authors: Chien-Sheng Wu, Andrea Madotto, Wenhao Liu, Pascale Fung, Caiming Xiong

    Abstract: This paper introduces QAConv, a new question answering (QA) dataset that uses conversations as a knowledge source. We focus on informative conversations, including business emails, panel discussions, and work channels. Unlike open-domain and task-oriented dialogues, these conversations are usually long, complex, asynchronous, and involve strong domain knowledge. In total, we collect 34,608 QA pair…

    Submitted 14 April, 2022; v1 submitted 14 May, 2021; originally announced May 2021.

    Comments: ACL 2022. Data and code are available at https://github.com/salesforce/QAConv

  22. arXiv:2105.06232  [pdf, other]

    cs.CL cs.AI

    Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters

    Authors: Yan Xu, Etsuko Ishii, Samuel Cahyawijaya, Zihan Liu, Genta Indra Winata, Andrea Madotto, Dan Su, Pascale Fung

    Abstract: To diversify and enrich generated dialogue responses, knowledge-grounded dialogue has been investigated in recent years. The existing methods tackle the knowledge grounding challenge by retrieving the relevant sentences over a large corpus and augmenting the dialogues with explicit extra information. Despite their success, however, the existing works have drawbacks in inference efficiency. This pa…

    Submitted 25 April, 2022; v1 submitted 13 May, 2021; originally announced May 2021.

    Comments: The first two authors contribute equally; Accepted in ACL 2022 DialDoc Workshop (Best Student Paper Award)

  23. arXiv:2105.04222  [pdf, other]

    cs.CL

    Leveraging Slot Descriptions for Zero-Shot Cross-Domain Dialogue State Tracking

    Authors: Zhaojiang Lin, Bing Liu, Seungwhan Moon, Paul Crook, Zhenpeng Zhou, Zhiguang Wang, Zhou Yu, Andrea Madotto, Eunjoon Cho, Rajen Subba

    Abstract: Zero-shot cross-domain dialogue state tracking (DST) enables us to handle task-oriented dialogue in unseen domains without the expense of collecting in-domain data. In this paper, we propose a slot description enhanced generative approach for zero-shot cross-domain DST. Specifically, our model first encodes dialogue context and slots with a pre-trained self-attentive encoder, and generates slot va…

    Submitted 10 May, 2021; originally announced May 2021.

    Comments: NAACL 2021

  24. arXiv:2104.08775  [pdf, other]

    cs.AI cs.CL cs.SI

    Dynamically Addressing Unseen Rumor via Continual Learning

    Authors: Nayeon Lee, Andrea Madotto, Yejin Bang, Pascale Fung

    Abstract: Rumors are often associated with newly emerging events, thus, an ability to deal with unseen rumors is crucial for a rumor veracity classification model. Previous works address this issue by improving the model's generalizability, with an assumption that the model will stay unchanged even after the new outbreak of an event. In this work, we propose an alternative solution to continuously update th…

    Submitted 18 April, 2021; originally announced April 2021.

  25. arXiv:2104.08455  [pdf, other]

    cs.CL

    Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding

    Authors: Nouha Dziri, Andrea Madotto, Osmar Zaiane, Avishek Joey Bose

    Abstract: Dialogue systems powered by large pre-trained language models (LM) exhibit an innate ability to deliver fluent and natural-looking responses. Despite their impressive generation performance, these models can often generate factually incorrect statements impeding their widespread adoption. In this paper, we focus on the task of improving the faithfulness -- and thus reduce hallucination -- of Neura…

    Submitted 14 September, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021 18 pages

  26. arXiv:2104.00336  [pdf, other]

    cs.CL cs.AI

    Mitigating Media Bias through Neutral Article Generation

    Authors: Nayeon Lee, Yejin Bang, Andrea Madotto, Pascale Fung

    Abstract: Media bias can lead to increased political polarization, and thus, the need for automatic mitigation methods is growing. Existing mitigation work displays articles from multiple news outlets to provide diverse news coverage, but without neutralizing the bias inherent in each of the displayed articles. Therefore, we propose a new task, a single neutralized article generation out of multiple biased…

    Submitted 1 April, 2021; originally announced April 2021.

  27. arXiv:2103.13309  [pdf, other]

    cs.CL cs.LG

    Are Multilingual Models Effective in Code-Switching?

    Authors: Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Pascale Fung

    Abstract: Multilingual language models have shown decent performance in multilingual and cross-lingual natural language understanding tasks. However, the power of these multilingual models in code-switching tasks has not been fully explored. In this paper, we study the effectiveness of multilingual language models to understand their capability and adaptability to the mixed-language setting by considering t…

    Submitted 24 March, 2021; originally announced March 2021.

  28. arXiv:2103.09535  [pdf, other]

    cs.CL cs.LG

    Towards Few-Shot Fact-Checking via Perplexity

    Authors: Nayeon Lee, Yejin Bang, Andrea Madotto, Madian Khabsa, Pascale Fung

    Abstract: Few-shot learning has drawn researchers' attention to overcome the problem of data scarcity. Recently, large pre-trained language models have shown great performance in few-shot learning for various downstream tasks, such as question answering and machine translation. Nevertheless, little exploration has been made to achieve few-shot learning for the fact-checking task. However, fact-checking is a…

    Submitted 17 March, 2021; originally announced March 2021.

    Comments: Accepted to NAACL'21

  29. arXiv:2012.15504  [pdf, other]

    cs.CL cs.AI

    Continual Learning in Task-Oriented Dialogue Systems

    Authors: Andrea Madotto, Zhaojiang Lin, Zhenpeng Zhou, Seungwhan Moon, Paul Crook, Bing Liu, Zhou Yu, Eunjoon Cho, Zhiguang Wang

    Abstract: Continual learning in task-oriented dialogue systems can allow us to add new domains and functionalities through time without incurring the high cost of a whole system retraining. In this paper, we propose a continual learning benchmark for task-oriented dialogue systems with 37 domains to be learned continuously in four settings, such as intent recognition, state tracking, natural language genera…

    Submitted 31 December, 2020; originally announced December 2020.

    Comments: 9 pages

  30. arXiv:2012.04373  [pdf, other]

    cs.CL cs.AI

    CrossNER: Evaluating Cross-Domain Named Entity Recognition

    Authors: Zihan Liu, Yan Xu, Tiezheng Yu, Wenliang Dai, Ziwei Ji, Samuel Cahyawijaya, Andrea Madotto, Pascale Fung

    Abstract: Cross-domain named entity recognition (NER) models are able to cope with the scarcity issue of NER samples in target domains. However, most of the existing NER benchmarks lack domain-specialized entity types or do not focus on a certain domain, leading to a less effective cross-domain evaluation. To address these obstacles, we introduce a cross-domain NER dataset (CrossNER), a fully-labeled collec…

    Submitted 13 December, 2020; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: Accepted in AAAI-2021

  31. arXiv:2010.04344  [pdf, other]

    cs.CL cs.AI

    Plug-and-Play Conversational Models

    Authors: Andrea Madotto, Etsuko Ishii, Zhaojiang Lin, Sumanth Dathathri, Pascale Fung

    Abstract: There has been considerable progress made towards conversational models that generate coherent and fluent responses; however, this often involves training large language models on large dialogue datasets, such as Reddit. These large conversational models provide little control over the generated responses, and this control is further limited in the absence of annotated conversational datasets for…

    Submitted 8 October, 2020; originally announced October 2020.

    Comments: Accepted in EMNLP findings, and code available at https://github.com/andreamad8/PPCM

  32. arXiv:2009.13656  [pdf, other]

    cs.CL cs.AI

    Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems

    Authors: Andrea Madotto, Samuel Cahyawijaya, Genta Indra Winata, Yan Xu, Zihan Liu, Zhaojiang Lin, Pascale Fung

    Abstract: Task-oriented dialogue systems are either modularized with separate dialogue state tracking (DST) and management steps or end-to-end trainable. In either case, the knowledge base (KB) plays an essential role in fulfilling user requests. Modularized systems rely on DST to interact with the KB, which is expensive in terms of annotation and inference time. End-to-end systems use the KB directly as in…

    Submitted 28 September, 2020; originally announced September 2020.

    Comments: Accepted EMNLP findings

  33. arXiv:2009.12005  [pdf, other]

    cs.CL cs.AI

    MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems

    Authors: Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Pascale Fung

    Abstract: In this paper, we propose Minimalist Transfer Learning (MinTL) to simplify the system design process of task-oriented dialogue systems and alleviate the over-dependency on annotated data. MinTL is a simple yet effective transfer learning framework, which allows us to plug-and-play pre-trained seq2seq models, and jointly learn dialogue state tracking and dialogue response generation. Unlike previou…

    Submitted 28 September, 2020; v1 submitted 24 September, 2020; originally announced September 2020.

    Comments: EMNLP 2020 camera ready

  34. arXiv:2008.12579  [pdf, other]

    cs.CL cs.AI

    The Adapter-Bot: All-In-One Controllable Conversational Model

    Authors: Andrea Madotto, Zhaojiang Lin, Yejin Bang, Pascale Fung

    Abstract: Considerable progress has been made towards conversational models that generate coherent and fluent responses by training large language models on large dialogue datasets. These models have little or no control of the generated responses and miss two important features: continuous dialogue skills integration and seamlessly leveraging diverse knowledge sources. In this paper, we propose the Adapter…

    Submitted 20 October, 2020; v1 submitted 28 August, 2020; originally announced August 2020.

    Comments: Andrea Madotto and Zhaojiang Lin contributed equally to this work. Video demo: https://www.youtube.com/watch?v=Jz8KWE_gKH0&feature=youtu.be

  35. arXiv:2008.06239  [pdf, other]

    cs.CL cs.LG

    Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems

    Authors: Andrea Madotto, Zihan Liu, Zhaojiang Lin, Pascale Fung

    Abstract: Task-oriented dialogue systems use four connected modules, namely, Natural Language Understanding (NLU), Dialogue State Tracking (DST), Dialogue Policy (DP) and Natural Language Generation (NLG). A research challenge is to learn each module with the least amount of samples (i.e., few-shots) given the high cost related to the data collection. The most common and effective technique to solve this…

    Submitted 20 August, 2020; v1 submitted 14 August, 2020; originally announced August 2020.

    Comments: Blog (https://andreamad8.github.io/few-shot-gpt/), Medium (https://medium.com/@madottoandrea/language-model-as-few-shot-learner-for-task-oriented-dialogue-systems-db4765796744) and Code (https://github.com/andreamad8/TASK-ORIENTED-LM-FEWSHOT)

  36. arXiv:2006.04666  [pdf, other]

    cs.CL cs.AI cs.LG

    Misinformation Has High Perplexity

    Authors: Nayeon Lee, Yejin Bang, Andrea Madotto, Pascale Fung

    Abstract: Debunking misinformation is an important and time-critical task as there could be adverse consequences when misinformation is not quashed promptly. However, the usual supervised approach to debunking via misinformation classification requires human-annotated data and is not suited to the fast time-frame of newly emerging events such as the COVID-19 outbreak. In this paper, we postulate that misinf…

    Submitted 10 June, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

  37. arXiv:2004.14218  [pdf, other]

    cs.CL cs.LG

    Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning

    Authors: Zihan Liu, Genta Indra Winata, Andrea Madotto, Pascale Fung

    Abstract: Recently, fine-tuning pre-trained language models (e.g., multilingual BERT) to downstream cross-lingual tasks has shown promising results. However, the fine-tuning process inevitably changes the parameters of the pre-trained model and weakens its cross-lingual ability, which leads to sub-optimal performance. To alleviate this problem, we leverage continual learning to preserve the original cross-l…

    Submitted 4 October, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

  38. arXiv:2004.03829  [pdf, other]

    cs.CL

    Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning

    Authors: Zhaojiang Lin, Andrea Madotto, Pascale Fung

    Abstract: Fine-tuning pre-trained generative language models to down-stream language generation tasks has shown promising results. However, this comes with the cost of having a single, large model for each task, which is not ideal in low-memory/power scenarios (e.g., mobile). In this paper, we propose an effective way to fine-tune multiple down-stream generation tasks simultaneously using a single, large pr…

    Submitted 21 September, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

    Comments: Accepted as Findings of EMNLP 2020, Zhaojiang Lin and Andrea Madotto contributed equally to this work

  39. arXiv:2003.07568  [pdf, other]

    cs.CL

    XPersona: Evaluating Multilingual Personalized Chatbot

    Authors: Zhaojiang Lin, Zihan Liu, Genta Indra Winata, Samuel Cahyawijaya, Andrea Madotto, Yejin Bang, Etsuko Ishii, Pascale Fung

    Abstract: Personalized dialogue systems are an essential step toward better human-machine interaction. Existing personalized dialogue agents rely on properly designed conversational datasets, which are mostly monolingual (e.g., English), which greatly limits the usage of conversational agents in other languages. In this paper, we propose a multi-lingual extension of Persona-Chat, namely XPersona. Our datase…

    Submitted 8 April, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

    Comments: Preprint, 23 pages

  40. arXiv:2003.01901  [pdf, other]

    eess.AS cs.SD

    Learning Fast Adaptation on Cross-Accented Speech Recognition

    Authors: Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Peng Xu, Pascale Fung

    Abstract: Local dialects influence people to pronounce words of the same language differently from each other. The great variability and complex characteristics of accents create a major challenge for training a robust and accent-agnostic automatic speech recognition (ASR) system. In this paper, we introduce a cross-accented English speech recognition task as a benchmark for measuring the ability of the mo…

    Submitted 4 March, 2020; originally announced March 2020.

    Comments: The first three authors contributed equally to this work

  41. arXiv:2001.11164  [pdf, other]

    cs.CL

    On the Importance of Word Order Information in Cross-lingual Sequence Labeling

    Authors: Zihan Liu, Genta Indra Winata, Samuel Cahyawijaya, Andrea Madotto, Zhaojiang Lin, Pascale Fung

    Abstract: Word order variances generally exist in different languages. In this paper, we hypothesize that cross-lingual models that fit into the word order of the source language might fail to handle target languages. To verify this hypothesis, we investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages. To do so, we re…

    Submitted 8 December, 2020; v1 submitted 29 January, 2020; originally announced January 2020.

    Comments: Accepted in AAAI-2021

  42. arXiv:2001.08868  [pdf, other]

    cs.CL cs.AI

    Exploration Based Language Learning for Text-Based Games

    Authors: Andrea Madotto, Mahdi Namazifar, Joost Huizinga, Piero Molino, Adrien Ecoffet, Huaixiu Zheng, Alexandros Papangelis, Dian Yu, Chandra Khatri, Gokhan Tur

    Abstract: This work presents an exploration and imitation-learning-based agent capable of state-of-the-art performance in playing text-based computer games. Text-based computer games describe their world to the player through natural language and expect the player to interact with the game using text. These games are of interest as they can be seen as a testbed for language understanding, problem-solving, a…

    Submitted 7 June, 2020; v1 submitted 23 January, 2020; originally announced January 2020.

    Comments: Accepted at IJCAI 2020

  43. arXiv:2001.01871  [pdf, other]

    cs.CL cs.LG

    Attention over Parameters for Dialogue Systems

    Authors: Andrea Madotto, Zhaojiang Lin, Chien-Sheng Wu, Jamin Shin, Pascale Fung

    Abstract: Dialogue systems require a great deal of different but complementary expertise to assist, inform, and entertain humans. For example, different domains (e.g., restaurant reservation, train ticket booking) of goal-oriented dialogue systems can be viewed as different skills, and so can the ordinary chatting abilities of chit-chat dialogue systems. In this paper, we propose to learn a dialogue system tha…

    Submitted 3 March, 2020; v1 submitted 6 January, 2020; originally announced January 2020.

    Comments: NeurIPS Conversational AI Workshops (Best Paper Award)

  44. arXiv:1912.02164  [pdf, other]

    cs.CL cs.AI cs.LG

    Plug and Play Language Models: A Simple Approach to Controlled Text Generation

    Authors: Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, Rosanne Liu

    Abstract: Large transformer-based language models (LMs) trained on huge text corpora have shown unparalleled generation capabilities. However, controlling attributes of the generated language (e.g. switching topic or sentiment) is difficult without modifying the model architecture or fine-tuning on attribute-specific data and entailing the significant cost of retraining. We propose a simple alternative: the…

    Submitted 3 March, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: ICLR 2020 camera ready

  45. arXiv:1911.04081  [pdf, other]

    cs.CL cs.LG

    Zero-shot Cross-lingual Dialogue Systems with Transferable Latent Variables

    Authors: Zihan Liu, Jamin Shin, Yan Xu, Genta Indra Winata, Peng Xu, Andrea Madotto, Pascale Fung

    Abstract: Despite the surging demands for multilingual task-oriented dialog systems (e.g., Alexa, Google Home), there has been less research done in multilingual or cross-lingual scenarios. Hence, we propose a zero-shot adaptation of task-oriented dialogue systems to low-resource languages. To tackle this challenge, we first use a set of very few parallel word pairs to refine the aligned cross-lingual word-l…

    Submitted 11 November, 2019; originally announced November 2019.

    Comments: Accepted in EMNLP 2019

  46. arXiv:1909.08582  [pdf, other]

    cs.CL

    Code-Switched Language Models Using Neural Based Synthetic Data from Parallel Sentences

    Authors: Genta Indra Winata, Andrea Madotto, Chien-Sheng Wu, Pascale Fung

    Abstract: Training code-switched language models is difficult due to lack of data and complexity in the grammatical structure. Linguistic constraint theories have been used for decades to generate artificial code-switching sentences to cope with this issue. However, this requires external word alignments or constituency parsers that create erroneous results on distant languages. We propose a sequence-to-sequ…

    Submitted 18 September, 2019; originally announced September 2019.

    Comments: Accepted in CoNLL 2019

  47. arXiv:1909.03582  [pdf, other]

    cs.CL cs.HC

    Clickbait? Sensational Headline Generation with Auto-tuned Reinforcement Learning

    Authors: Peng Xu, Chien-Sheng Wu, Andrea Madotto, Pascale Fung

    Abstract: Sensational headlines are headlines that capture people's attention and generate reader interest. Conventional abstractive headline generation methods, unlike human writers, do not optimize for maximal reader attention. In this paper, we propose a model that generates sensational headlines without labeled data. We first train a sensationalism scorer by classifying online headlines with many commen…

    Submitted 8 September, 2019; originally announced September 2019.

    Comments: Accepted by EMNLP 2019

  48. arXiv:1908.09982  [pdf, other]

    cs.CL

    On the Effectiveness of Low-Rank Matrix Factorization for LSTM Model Compression

    Authors: Genta Indra Winata, Andrea Madotto, Jamin Shin, Elham J. Barezi, Pascale Fung

    Abstract: Despite their ubiquity in NLP tasks, Long Short-Term Memory (LSTM) networks suffer from computational inefficiencies caused by inherent unparallelizable recurrences, which are further aggravated as LSTMs require more parameters for larger memory capacity. In this paper, we propose to apply low-rank matrix factorization (MF) algorithms to different recurrences in LSTMs, and explore the effectiveness on…

    Submitted 26 August, 2019; originally announced August 2019.

    Comments: Accepted in PACLIC 2019

  49. arXiv:1908.07687  [pdf, other]

    cs.CL

    MoEL: Mixture of Empathetic Listeners

    Authors: Zhaojiang Lin, Andrea Madotto, Jamin Shin, Peng Xu, Pascale Fung

    Abstract: Previous research on empathetic dialogue systems has mostly focused on generating responses given certain emotions. However, being empathetic not only requires the ability of generating emotional responses, but more importantly, requires the understanding of user emotions and replying appropriately. In this paper, we propose a novel end-to-end approach for modeling empathy in dialogue systems: Mix…

    Submitted 20 August, 2019; originally announced August 2019.

    Comments: Accepted by EMNLP 2019

  50. arXiv:1908.04621  [pdf, other]

    cs.CL cs.AI

    Getting To Know You: User Attribute Extraction from Dialogues

    Authors: Chien-Sheng Wu, Andrea Madotto, Zhaojiang Lin, Peng Xu, Pascale Fung

    Abstract: User attributes provide rich and useful information for user understanding, yet structured and easy-to-use attributes are often sparsely populated. In this paper, we leverage dialogues with conversational agents, which contain strong suggestions of user information, to automatically extract user attributes. Since no existing dataset is available for this purpose, we apply distant supervision to tr…

    Submitted 13 August, 2019; originally announced August 2019.

    Comments: 1st Workshop on NLP for Conversational AI @ ACL 2019