Showing 1–50 of 114 results for author: Morency, L

  1. arXiv:2407.09801

    cs.LG cs.AI cs.CL cs.CV cs.MM

    IoT-LM: Large Multisensory Language Models for the Internet of Things

    Authors: Shentong Mo, Russ Salakhutdinov, Louis-Philippe Morency, Paul Pu Liang

    Abstract: The Internet of Things (IoT) network integrating billions of smart physical devices embedded with sensors, software, and communication technologies is a critical and rapidly expanding component of our modern world. The IoT ecosystem provides a rich source of real-world modalities such as motion, thermal, geolocation, imaging, depth, sensors, and audio to recognize the states of humans and physical…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.06217

  2. arXiv:2407.03418

    cs.LG cs.AI cs.CL cs.CV

    HEMM: Holistic Evaluation of Multimodal Foundation Models

    Authors: Paul Pu Liang, Akshay Goindani, Talha Chafekar, Leena Mathur, Haofei Yu, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Multimodal foundation models that can holistically process text alongside images, video, audio, and other sensory modalities are increasingly used in a variety of real-world applications. However, it is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains. In this paper, we introduce Holistic Evaluation o…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Code available at https://github.com/pliang279/HEMM

  3. arXiv:2404.11023

    cs.HC cs.CL cs.LG

    Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions

    Authors: Leena Mathur, Paul Pu Liang, Louis-Philippe Morency

    Abstract: Building socially-intelligent AI agents (Social-AI) is a multidisciplinary, multimodal research goal that involves creating agents that can sense, perceive, reason about, learn from, and respond to affect, behavior, and cognition of other agents (human or artificial). Progress towards Social-AI has accelerated in the past decade across several computing communities, including natural language proc…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Position Paper, Under Review, 19 pages, 2 figures

  4. arXiv:2403.11330

    cs.CL cs.AI cs.HC cs.LG

    Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback

    Authors: Dong Won Lee, Hae Won Park, Yoon Kim, Cynthia Breazeal, Louis-Philippe Morency

    Abstract: We describe an approach for aligning an LLM-based dialogue agent based on global (i.e., dialogue-level) rewards, while also taking into account naturally-occurring multimodal signals. At a high level, our approach (dubbed GELI) learns a local, turn-level reward model by decomposing the human-provided Global Explicit (GE) session-level reward, using Local Implicit (LI) multimodal reward signals to…

    Submitted 22 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: 10 pages, 3 figures, 2 tables

  5. arXiv:2402.14979

    cs.LG cs.CL stat.ME

    Optimizing Language Models for Human Preferences is a Causal Inference Problem

    Authors: Victoria Lin, Eli Ben-Michael, Louis-Philippe Morency

    Abstract: As large language models (LLMs) see greater use in academic and commercial settings, there is increasing interest in methods that allow language models to generate texts aligned with human preferences. In this paper, we present an initial exploration of language model optimization for human preferences from direct outcome datasets, where each sample consists of a text and an associated numerical o…

    Submitted 5 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: UAI 2024

  6. arXiv:2311.10227

    cs.AI cs.CL

    Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities

    Authors: Alex Wilf, Sihyun Shawn Lee, Paul Pu Liang, Louis-Philippe Morency

    Abstract: Human interactions are deeply rooted in the interplay of thoughts, beliefs, and desires made possible by Theory of Mind (ToM): our cognitive ability to understand the mental states of ourselves and others. Although ToM may come naturally to us, emulating it presents a challenge to even the most advanced Large Language Models (LLMs). Recent improvements to LLMs' reasoning capabilities from simple y…

    Submitted 16 November, 2023; originally announced November 2023.

  7. arXiv:2311.09580

    cs.CL

    MMOE: Mixture of Multimodal Interaction Experts

    Authors: Haofei Yu, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Multimodal machine learning, which studies the information and interactions across various input modalities, has made significant advancements in understanding the relationship between images and descriptive text. However, this is just a portion of the potential multimodal interactions seen in the real world and does not include new interactions between conflicting utterances and gestures in predi…

    Submitted 16 November, 2023; originally announced November 2023.

  8. arXiv:2311.06217

    cs.LG cs.AI cs.CL cs.CV cs.MM

    MultiIoT: Benchmarking Machine Learning for the Internet of Things

    Authors: Shentong Mo, Louis-Philippe Morency, Russ Salakhutdinov, Paul Pu Liang

    Abstract: The next generation of machine learning systems must be adept at perceiving and interacting with the physical world through a diverse array of sensory channels. Commonly referred to as the 'Internet of Things (IoT)' ecosystem, sensory data from motion, thermal, geolocation, depth, wireless signals, video, and audio are increasingly used to model the states of physical environments and the humans i…

    Submitted 4 July, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

  9. arXiv:2311.02253

    cs.LG cs.AI

    Comparative Knowledge Distillation

    Authors: Alex Wilf, Alex Tianyi Xu, Paul Pu Liang, Alexander Obolenskiy, Daniel Fried, Louis-Philippe Morency

    Abstract: In the era of large scale pretrained models, Knowledge Distillation (KD) serves an important role in transferring the wisdom of computationally heavy teacher models to lightweight, efficient student models while preserving performance. Traditional KD paradigms, however, assume readily available access to teacher models for frequent inference -- a notion increasingly at odds with the realities of c…

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: text overlap with arXiv:2310.13011

  10. arXiv:2310.20697

    cs.CL stat.ME

    Text-Transport: Toward Learning Causal Effects of Natural Language

    Authors: Victoria Lin, Louis-Philippe Morency, Eli Ben-Michael

    Abstract: As language technologies gain prominence in real-world settings, it is important to understand how changes to language affect reader perceptions. This can be formalized as the causal effect of varying a linguistic attribute (e.g., sentiment) on a reader's response to the text. In this paper, we introduce Text-Transport, a method for estimation of causal effects from natural language under any text…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023

  11. arXiv:2310.11667

    cs.AI cs.CL cs.LG

    SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

    Authors: Xuhui Zhou, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency, Yonatan Bisk, Daniel Fried, Graham Neubig, Maarten Sap

    Abstract: Humans are social beings; we pursue social goals in our daily interactions, which is a crucial aspect of social intelligence. Yet, AI systems' abilities in this realm remain elusive. We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and evaluate their social intelligence. In our environment, agents role-play and interact under a wide va…

    Submitted 22 March, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Preprint, 43 pages. The first two authors contribute equally

  12. arXiv:2306.16413

    cs.LG cs.AI cs.CL cs.CV cs.MM

    MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep Learning

    Authors: Paul Pu Liang, Yiwei Lyu, Xiang Fan, Arav Agarwal, Yun Cheng, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiZoo, a public toolkit consisting of standardized implementations of > 20 core multimodal algorithms and MultiBench, a large-scale benchmark spanning 15 datase…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: JMLR Open Source Software 2023, Code available at https://github.com/pliang279/MultiBench

  13. Neural Mixed Effects for Nonlinear Personalized Predictions

    Authors: Torsten Wörtwein, Nicholas Allen, Lisa B. Sheeber, Randy P. Auerbach, Jeffrey F. Cohn, Louis-Philippe Morency

    Abstract: Personalized prediction is a machine learning approach that predicts a person's future observations based on their past labeled observations and is typically used for sequential tasks, e.g., to predict daily mood ratings. When making personalized predictions, a model can combine two types of trends: (a) trends shared across people, i.e., person-generic trends, such as being happier on weekends, an…

    Submitted 31 August, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: camera-ready version

  14. arXiv:2306.05268

    cs.LG cs.AI cs.CL cs.CV cs.MM

    Factorized Contrastive Learning: Going Beyond Multi-view Redundancy

    Authors: Paul Pu Liang, Zihao Deng, Martin Ma, James Zou, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: In a wide range of multimodal tasks, contrastive learning has become a particularly appealing approach since it can successfully learn representations from abundant unlabeled data with only pairing information (e.g., image-caption or video-audio pairs). Underpinning these approaches is the assumption of multi-view redundancy - that shared information between modalities is necessary and sufficient…

    Submitted 30 October, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023. Code available at: https://github.com/pliang279/FactorCL

  15. arXiv:2306.04898

    cs.LG cs.CV

    Understanding Masked Autoencoders via Hierarchical Latent Variable Models

    Authors: Lingjing Kong, Martin Q. Ma, Guangyi Chen, Eric P. Xing, Yuejie Chi, Louis-Philippe Morency, Kun Zhang

    Abstract: Masked autoencoder (MAE), a simple and effective self-supervised learning framework based on the reconstruction of masked image regions, has recently achieved prominent success in a variety of vision tasks. Despite the emergence of intriguing empirical observations on MAE, a theoretically principled understanding is still lacking. In this work, we formally characterize and justify existing empiric…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: CVPR 2023 Highlight

  16. arXiv:2306.04597

    cs.CL cs.LG

    Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions

    Authors: Himanshu Thakur, Atishay Jain, Praneetha Vaddamanu, Paul Pu Liang, Louis-Philippe Morency

    Abstract: Societal biases present in pre-trained large language models are a critical issue as these models have been shown to propagate biases in countless downstream applications, rendering them unfair towards specific groups of people. Since large-scale retraining of these models from scratch is both time and compute-expensive, a variety of approaches have been previously proposed that de-bias a pre-trai…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: Accepted to ACL 2023 Main Conference

  17. arXiv:2306.04539

    cs.LG cs.CL cs.CV cs.IT stat.ML

    Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications

    Authors: Paul Pu Liang, Chun Kai Ling, Yun Cheng, Alex Obolenskiy, Yudong Liu, Rohan Pandey, Alex Wilf, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: In many machine learning systems that jointly learn from multiple modalities, a core research question is to understand the nature of multimodal interactions: how modalities combine to provide new task-relevant information that was not present in either alone. We study this challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data and naturally co-occurri…

    Submitted 13 June, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: ICLR 2024, Code available at: https://github.com/pliang279/PID

  18. arXiv:2306.04125

    cs.LG cs.CL cs.HC

    Multimodal Fusion Interactions: A Study of Human and Automatic Quantification

    Authors: Paul Pu Liang, Yun Cheng, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: In order to perform multimodal fusion of heterogeneous signals, we need to understand their interactions: how each modality individually provides information useful for a task and how this information changes in the presence of other modalities. In this paper, we perform a comparative study of how humans annotate two categorizations of multimodal interactions: (1) partial labels, where different a…

    Submitted 30 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: International Conference on Multimodal Interaction (ICMI '23), Code available at: https://github.com/pliang279/PID. arXiv admin note: text overlap with arXiv:2302.12247

  19. arXiv:2305.14728

    cs.CL

    SenteCon: Leveraging Lexicons to Learn Human-Interpretable Language Representations

    Authors: Victoria Lin, Louis-Philippe Morency

    Abstract: Although deep language representations have become the dominant form of language featurization in recent years, in many settings it is important to understand a model's decision-making process. This necessitates not only an interpretable model but also interpretable features. In particular, language must be featurized in a way that is interpretable while still characterizing the original text well…

    Submitted 1 June, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023

  20. arXiv:2305.14577

    cs.LG cs.CL

    Difference-Masking: Choosing What to Mask in Continued Pretraining

    Authors: Alex Wilf, Syeda Nahida Akter, Leena Mathur, Paul Pu Liang, Sheryl Mathew, Mengrou Shou, Eric Nyberg, Louis-Philippe Morency

    Abstract: The self-supervised objective of masking-and-predicting has led to promising performance gains on a variety of downstream tasks. However, while most approaches randomly mask tokens, there is strong intuition that deciding what to mask can substantially improve learning outcomes. We investigate this in the continued pretraining setting, in which pretrained models continue to pretrain on domain-specific…

    Submitted 17 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  21. arXiv:2305.14083

    cs.LG cs.CL

    Counterfactual Augmentation for Multimodal Learning Under Presentation Bias

    Authors: Victoria Lin, Louis-Philippe Morency, Dimitrios Dimitriadis, Srinagesh Sharma

    Abstract: In real-world machine learning systems, labels are often derived from user behaviors that the system wishes to encourage. Over time, new models must be trained as new training examples and features become available. However, feedback loops between users and models can bias future user behavior, inducing a presentation bias in the labels that compromises the ability to train new models. In this pap…

    Submitted 30 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of EMNLP 2023

  22. arXiv:2305.13583

    cs.CL cs.MM eess.AS eess.IV

    Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition

    Authors: Yaoting Wang, Yuanchao Li, Paul Pu Liang, Louis-Philippe Morency, Peter Bell, Catherine Lai

    Abstract: Fusing multiple modalities has proven effective for multimodal information processing. However, the incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition. In this study, we first analyze how the salient affective information in one modality can be affected by the other, and demonstrate that inter-modal incongruity exists latently in crossmodal att…

    Submitted 12 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: First two authors contributed equally

  23. arXiv:2305.10827

    cs.HC cs.AI

    Expanding the Role of Affective Phenomena in Multimodal Interaction Research

    Authors: Leena Mathur, Maja J Matarić, Louis-Philippe Morency

    Abstract: In recent decades, the field of affective computing has made substantial progress in advancing the ability of AI systems to recognize and express affective phenomena, such as affect and emotions, during human-human and human-machine interactions. This paper describes our examination of research at the intersection of multimodal interaction and affective computing, with the objective of observing t…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: 4 pages, 4 figures

  24. arXiv:2302.12247

    cs.LG cs.AI cs.CL cs.CV cs.IT

    Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework

    Authors: Paul Pu Liang, Yun Cheng, Xiang Fan, Chun Kai Ling, Suzanne Nie, Richard Chen, Zihao Deng, Nicholas Allen, Randy Auerbach, Faisal Mahmood, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: The recent explosion of interest in multimodal applications has resulted in a wide selection of datasets and methods for representing and integrating information from different modalities. Despite these empirical advances, there remain fundamental research questions: How can we quantify the interactions that are necessary to solve a multimodal task? Subsequently, what are the most suitable multimo…

    Submitted 10 December, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023. Code available at: https://github.com/pliang279/PID

  25. arXiv:2212.10549

    cs.CL cs.CV cs.LG

    Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment

    Authors: Rohan Pandey, Rulin Shao, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Despite recent progress towards scaling up multimodal vision-language models, these models are still known to struggle on compositional generalization benchmarks such as Winoground. We find that a critical component lacking from current vision-language models is relation-level alignment: the ability to match directional semantic relations in text (e.g., "mug in grass") with spatial relationships i…

    Submitted 4 July, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL 2023

  26. arXiv:2211.13196

    cs.LG cs.CL

    SeedBERT: Recovering Annotator Rating Distributions from an Aggregated Label

    Authors: Aneesha Sampath, Victoria Lin, Louis-Philippe Morency

    Abstract: Many machine learning tasks -- particularly those in affective computing -- are inherently subjective. When asked to classify facial expressions or to rate an individual's attractiveness, humans may disagree with one another, and no single answer may be objectively correct. However, machine learning datasets commonly have just one "ground truth" label for each sample, so models trained on these la…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: To be published in AAAI-23 Workshop on Uncertainty Reasoning and Quantification in Decision Making

  27. Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control

    Authors: Xiang Fan, Yiwei Lyu, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Pretrained language models have demonstrated extraordinary capabilities in language generation. However, real-world tasks often require controlling the distribution of generated text in order to mitigate bias, promote fairness, and achieve personalization. Existing techniques for controlling the distribution of generated text only work with quantified distributions, which require pre-defined categ…

    Submitted 22 September, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

    Comments: Accepted to ACL Findings 2023

  28. arXiv:2210.04714

    cs.CL cs.LG stat.ML

    Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis

    Authors: Yuxin Xiao, Paul Pu Liang, Umang Bhatt, Willie Neiswanger, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Pre-trained language models (PLMs) have gained increasing popularity due to their compelling prediction performance in diverse natural language processing (NLP) tasks. When formulating a PLM-based prediction pipeline for NLP tasks, it is also crucial for the pipeline to minimize the calibration error, especially in safety-critical applications. That is, the pipeline should reliably indicate when w…

    Submitted 14 October, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Accepted by EMNLP 2022 (Findings)

  29. arXiv:2209.12343

    cs.CV cs.LG

    Paraphrasing Is All You Need for Novel Object Captioning

    Authors: Cheng-Fu Yang, Yao-Hung Hubert Tsai, Wan-Cyuan Fan, Ruslan Salakhutdinov, Louis-Philippe Morency, Yu-Chiang Frank Wang

    Abstract: Novel object captioning (NOC) aims to describe images containing objects without observing their ground truth captions during training. Due to the absence of caption annotation, captioning models cannot be directly optimized via sequence-to-sequence training or CIDEr optimization. As a result, we present Paraphrasing-to-Captioning (P2C), a two-stage learning framework for NOC, which would heuristi…

    Submitted 25 September, 2022; originally announced September 2022.

    Comments: Accepted at NeurIPS 2022

  30. arXiv:2209.03430

    cs.LG cs.AI cs.CL cs.CV cs.MM

    Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

    Authors: Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency

    Abstract: Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, tex…

    Submitted 20 February, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

  31. arXiv:2208.08080

    cs.AI cs.CL cs.CV cs.LG cs.MM

    Multimodal Lecture Presentations Dataset: Understanding Multimodality in Educational Slides

    Authors: Dong Won Lee, Chaitanya Ahuja, Paul Pu Liang, Sanika Natu, Louis-Philippe Morency

    Abstract: Lecture slide presentations, a sequence of pages that contain text and figures accompanied by speech, are constructed and presented carefully in order to optimally transfer knowledge to students. Previous studies in multimedia and psychology attribute the effectiveness of lecture presentations to their multimodal nature. As a step toward developing AI to aid in student learning as intelligent teac…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: 9 pages, 5 figures

  32. arXiv:2208.01036

    cs.LG cs.AI cs.CV

    Face-to-Face Contrastive Learning for Social Intelligence Question-Answering

    Authors: Alex Wilf, Martin Q. Ma, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency

    Abstract: Creating artificial social intelligence - algorithms that can understand the nuances of multi-person interactions - is an exciting and emerging challenge in processing facial expressions and gestures from multimodal videos. Recent multimodal methods have set the state of the art on many tasks, but have difficulty modeling the complex face-to-face conversational dynamics across speaking turns in so…

    Submitted 27 October, 2022; v1 submitted 29 July, 2022; originally announced August 2022.

  33. arXiv:2207.00056

    cs.LG cs.AI cs.CL cs.CV cs.MM

    MultiViz: Towards Visualizing and Understanding Multimodal Models

    Authors: Paul Pu Liang, Yiwei Lyu, Gunjan Chhablani, Nihal Jain, Zihao Deng, Xingbo Wang, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: The promise of multimodal models for real-world applications has inspired research in visualizing and understanding their internal mechanics with the end goal of empowering stakeholders to visualize model behavior, perform model debugging, and promote trust in machine learning models. However, modern multimodal models are typically black-box neural networks, which makes it challenging to understan…

    Submitted 6 March, 2023; v1 submitted 30 June, 2022; originally announced July 2022.

    Comments: ICLR 2023. Code available at: https://github.com/pliang279/MultiViz

  34. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  35. arXiv:2203.11130

    cs.LG cs.AI cs.CL cs.CV cs.MM

    PACS: A Dataset for Physical Audiovisual CommonSense Reasoning

    Authors: Samuel Yu, Peter Wu, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: In order for AI to be safely deployed in real-world scenarios such as hospitals, schools, and the workplace, it must be able to robustly reason about the physical world. Fundamental to this reasoning is physical common sense: understanding the physical properties and affordances of available objects, how they can be manipulated, and how they interact with other objects. Physical commonsense reason…

    Submitted 1 August, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: ECCV 2022, 51 pages, 23 figures, 4 tables

  36. arXiv:2203.02013

    cs.LG cs.AI cs.CL cs.CV

    DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations

    Authors: Yiwei Lyu, Paul Pu Liang, Zihao Deng, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: The ability for a human to understand an Artificial Intelligence (AI) model's decision-making process is critical in enabling stakeholders to visualize model behavior, perform model debugging, promote trust in AI models, and assist in collaborative human-AI decision-making. As a result, the research fields of interpretable and explainable AI have gained traction within AI communities as well as in…

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: Code available at https://github.com/lvyiwei1/DIME

  37. arXiv:2203.01311

    cs.LG cs.AI cs.CL cs.CV cs.MM

    High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning

    Authors: Paul Pu Liang, Yiwei Lyu, Xiang Fan, Jeffrey Tsaw, Yudong Liu, Shentong Mo, Dani Yogatama, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: Many real-world problems are inherently multimodal, from spoken language, gestures, and paralinguistics humans use to communicate, to force, proprioception, and visual sensors on robots. While there has been an explosion of interest in multimodal learning, these methods are focused on a small set of modalities primarily in language, vision, and audio. In order to accelerate generalization towards…

    Submitted 28 June, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

    Comments: TMLR 2023, Code available at https://github.com/pliang279/HighMMT

  38. arXiv:2202.06670

    cs.LG cs.AI

    Learning Weakly-Supervised Contrastive Representations

    Authors: Yao-Hung Hubert Tsai, Tianqin Li, Weixin Liu, Peiyuan Liao, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: We argue that a form of the valuable information provided by the auxiliary information is its implied data clustering information. For instance, considering hashtags as auxiliary information, we can hypothesize that an Instagram image will be semantically more similar with the same hashtags. With this intuition, we present a two-stage weakly-supervised contrastive learning approach. The first stag…

    Submitted 18 February, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: Published at ICLR 2022. arXiv admin note: substantial text overlap with arXiv:2106.02869

  39. arXiv:2202.05458

    cs.LG

    Conditional Contrastive Learning with Kernel

    Authors: Yao-Hung Hubert Tsai, Tianqin Li, Martin Q. Ma, Han Zhao, Kun Zhang, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: Conditional contrastive learning frameworks consider the conditional sampling procedure that constructs positive or negative data pairs conditioned on specific variables. Fair contrastive learning constructs negative pairs, for example, from the same gender (conditioning on sensitive information), which in turn reduces undesirable information from the learned representations; weakly supervised con…

    Submitted 15 March, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

  40. arXiv:2110.13422  [pdf, other]

    cs.LG cs.AI stat.ML

    Relay Variational Inference: A Method for Accelerated Encoderless VI

    Authors: Amir Zadeh, Santiago Benoit, Louis-Philippe Morency

    Abstract: Variational Inference (VI) offers a method for approximating intractable likelihoods. In neural VI, inference of approximate posteriors is commonly done using an encoder. Alternatively, encoderless VI offers a framework for learning generative models from data without encountering suboptimalities caused by amortization via an encoder (e.g. in presence of missing or uncertain data). However, in abs…

    Submitted 13 January, 2023; v1 submitted 26 October, 2021; originally announced October 2021.

  41. arXiv:2108.01260  [pdf, other]

    cs.CL

    M2H2: A Multimodal Multiparty Hindi Dataset For Humor Recognition in Conversations

    Authors: Dushyant Singh Chauhan, Gopendra Vikram Singh, Navonil Majumder, Amir Zadeh, Asif Ekbal, Pushpak Bhattacharyya, Louis-Philippe Morency, Soujanya Poria

    Abstract: Humor recognition in conversations is a challenging task that has recently gained popularity due to its importance in dialogue understanding, including in multimodal settings (i.e., text, acoustics, and visual). The few existing datasets for humor are mostly in English. However, due to the tremendous growth in multilingual content, there is a great demand to build models and systems that support m…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: ICMI 2021

  42. arXiv:2107.13669  [pdf, other]

    cs.AI

    Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis

    Authors: Wei Han, Hui Chen, Alexander Gelbukh, Amir Zadeh, Louis-Philippe Morency, Soujanya Poria

    Abstract: Multimodal sentiment analysis aims to extract and integrate semantic information collected from multiple modalities to recognize the expressed emotions and sentiment in multimodal data. This research area's major concern lies in developing an extraordinary fusion scheme that can extract and integrate key information from various modalities. However, one issue that may restrict previous work to ach…

    Submitted 28 August, 2021; v1 submitted 28 July, 2021; originally announced July 2021.

    Comments: Accepted at ICMI 2021

  43. arXiv:2107.07502  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.MM

    MultiBench: Multiscale Benchmarks for Multimodal Representation Learning

    Authors: Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and mod…

    Submitted 10 November, 2021; v1 submitted 15 July, 2021; originally announced July 2021.

    Comments: NeurIPS 2021 Datasets and Benchmarks Track. Code: https://github.com/pliang279/MultiBench and Website: https://cmu-multicomp-lab.github.io/multibench/

  44. arXiv:2106.13219  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Towards Understanding and Mitigating Social Biases in Language Models

    Authors: Paul Pu Liang, Chiyu Wu, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: As machine learning methods are deployed in real-world settings such as healthcare, legal systems, and social science, it is crucial to recognize how they shape social biases and stereotypes in these sensitive decision-making processes. Among such real-world deployments are large-scale pretrained language models (LMs) that can be potentially dangerous in manifesting undesirable representational bi…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: ICML 2021, code available at https://github.com/pliang279/LM_bias

  45. arXiv:2106.13213  [pdf, other]

    cs.LG cs.AI cs.CL cs.HC

    Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data

    Authors: Paul Pu Liang, Terrance Liu, Anna Cai, Michal Muszynski, Ryo Ishii, Nicholas Allen, Randy Auerbach, David Brent, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care. The ability to accurately and efficiently predict mood from easily collectible data has several important implications for the early detection, intervention, and treatment of mental health disorders. One promising data source to help monitor human behavior is daily smartphone usage. Howeve…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: ACL 2021. arXiv admin note: substantial text overlap with arXiv:2012.02359

  46. arXiv:2106.02869  [pdf, other]

    cs.LG

    Integrating Auxiliary Information in Self-supervised Learning

    Authors: Yao-Hung Hubert Tsai, Tianqin Li, Weixin Liu, Peiyuan Liao, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: This paper presents to integrate the auxiliary information (e.g., additional attributes for data such as the hashtags for Instagram images) in the self-supervised learning process. We first observe that the auxiliary information may bring us useful information about data structures: for instance, the Instagram images with the same hashtags can be semantically similar. Hence, to leverage the struct…

    Submitted 5 June, 2021; originally announced June 2021.

  47. arXiv:2106.02866  [pdf, other]

    cs.LG

    Conditional Contrastive Learning for Improving Fairness in Self-Supervised Learning

    Authors: Martin Q. Ma, Yao-Hung Hubert Tsai, Paul Pu Liang, Han Zhao, Kun Zhang, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Contrastive self-supervised learning (SSL) learns an embedding space that maps similar data pairs closer and dissimilar data pairs farther apart. Despite its success, one issue has been overlooked: the fairness aspect of representations learned using contrastive SSL. Without mitigation, contrastive SSL techniques can incorporate sensitive information such as gender or race and cause potentially un…

    Submitted 27 June, 2022; v1 submitted 5 June, 2021; originally announced June 2021.

  48. arXiv:2104.13712  [pdf, other]

    cs.LG

    A Note on Connecting Barlow Twins with Negative-Sample-Free Contrastive Learning

    Authors: Yao-Hung Hubert Tsai, Shaojie Bai, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: In this report, we relate the algorithmic design of Barlow Twins' method to the Hilbert-Schmidt Independence Criterion (HSIC), thus establishing it as a contrastive learning approach that is free of negative samples. Through this perspective, we argue that Barlow Twins (and thus the class of negative-sample-free contrastive learning methods) suggests a possibility to bridge the two major families…

    Submitted 28 April, 2021; originally announced April 2021.

  49. arXiv:2104.05196  [pdf, other]

    cs.CL cs.AI cs.LG

    StylePTB: A Compositional Benchmark for Fine-grained Controllable Text Style Transfer

    Authors: Yiwei Lyu, Paul Pu Liang, Hai Pham, Eduard Hovy, Barnabás Póczos, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Text style transfer aims to controllably generate text with targeted stylistic changes while maintaining core meaning from the source sentence constant. Many of the existing style transfer benchmarks primarily focus on individual high-level semantic changes (e.g. positive to negative), which enable controllability at a high level but do not offer fine-grained control involving sentence structure,…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: NAACL 2021, code available at https://github.com/lvyiwei1/StylePTB/

  50. arXiv:2103.11275  [pdf, other]

    cs.LG cs.IT

    Self-supervised Representation Learning with Relative Predictive Coding

    Authors: Yao-Hung Hubert Tsai, Martin Q. Ma, Muqiao Yang, Han Zhao, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: This paper introduces Relative Predictive Coding (RPC), a new contrastive representation learning objective that maintains a good balance among training stability, minibatch size sensitivity, and downstream task performance. The key to the success of RPC is two-fold. First, RPC introduces the relative parameters to regularize the objective for boundedness and low variance. Second, RPC contains no…

    Submitted 12 April, 2021; v1 submitted 20 March, 2021; originally announced March 2021.