Skip to main content

Showing 1–50 of 68 results for author: Liang, P P

  1. arXiv:2407.09801  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.MM

    IoT-LM: Large Multisensory Language Models for the Internet of Things

    Authors: Shentong Mo, Russ Salakhutdinov, Louis-Philippe Morency, Paul Pu Liang

    Abstract: The Internet of Things (IoT) network integrating billions of smart physical devices embedded with sensors, software, and communication technologies is a critical and rapidly expanding component of our modern world. The IoT ecosystem provides a rich source of real-world modalities such as motion, thermal, geolocation, imaging, depth, sensors, and audio to recognize the states of humans and physical… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.06217

  2. arXiv:2407.03418  [pdf, other

    cs.LG cs.AI cs.CL cs.CV

    HEMM: Holistic Evaluation of Multimodal Foundation Models

    Authors: Paul Pu Liang, Akshay Goindani, Talha Chafekar, Leena Mathur, Haofei Yu, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Multimodal foundation models that can holistically process text alongside images, video, audio, and other sensory modalities are increasingly used in a variety of real-world applications. However, it is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains. In this paper, we introduce Holistic Evaluation o… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Code available at https://github.com/pliang279/HEMM

  3. arXiv:2404.18976  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.MM

    Foundations of Multisensory Artificial Intelligence

    Authors: Paul Pu Liang

    Abstract: Building multisensory AI systems that learn from multiple sensory inputs such as text, speech, video, real-world sensors, wearable devices, and medical data holds great promise for impact in many scientific areas with practical benefits, such as in supporting human health and well-being, enabling multimedia content processing, and enhancing real-world autonomous agents. By synthesizing a range of… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: CMU Machine Learning Department PhD Thesis

  4. arXiv:2404.13362  [pdf, other

    cs.CL cs.AI cs.LG eess.AS

    Semantically Corrected Amharic Automatic Speech Recognition

    Authors: Samuael Adnew, Paul Pu Liang

    Abstract: Automatic Speech Recognition (ASR) can play a crucial role in enhancing the accessibility of spoken languages worldwide. In this paper, we build a set of ASR tools for Amharic, a language spoken by more than 50 million people primarily in eastern Africa. Amharic is written in the Ge'ez script, a sequence of graphemes with spacings denoting word boundaries. This makes computational processing of Am… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

  5. arXiv:2404.11023  [pdf, other

    cs.HC cs.CL cs.LG

    Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions

    Authors: Leena Mathur, Paul Pu Liang, Louis-Philippe Morency

    Abstract: Building socially-intelligent AI agents (Social-AI) is a multidisciplinary, multimodal research goal that involves creating agents that can sense, perceive, reason about, learn from, and respond to affect, behavior, and cognition of other agents (human or artificial). Progress towards Social-AI has accelerated in the past decade across several computing communities, including natural language proc… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Position Paper, Under Review, 19 pages, 2 figures

  6. arXiv:2312.04837  [pdf, other

    cs.AI cs.CL cs.CV

    Localized Symbolic Knowledge Distillation for Visual Commonsense Models

    Authors: Jae Sung Park, Jack Hessel, Khyathi Raghavi Chandu, Paul Pu Liang, Ximing Lu, Peter West, Youngjae Yu, Qiuyuan Huang, Jianfeng Gao, Ali Farhadi, Yejin Choi

    Abstract: Instruction following vision-language (VL) models offer a flexible interface that supports a broad range of multimodal tasks in a zero-shot fashion. However, interfaces that operate on full images do not directly enable the user to "point to" and access specific regions within images. This capability is important not only to support reference-grounded VL benchmarks, but also, for practical applica… ▽ More

    Submitted 12 December, 2023; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: Neurips 2023

  7. arXiv:2311.10227  [pdf, other

    cs.AI cs.CL

    Think Twice: Perspective-Taking Improves Large Language Models' Theory-of-Mind Capabilities

    Authors: Alex Wilf, Sihyun Shawn Lee, Paul Pu Liang, Louis-Philippe Morency

    Abstract: Human interactions are deeply rooted in the interplay of thoughts, beliefs, and desires made possible by Theory of Mind (ToM): our cognitive ability to understand the mental states of ourselves and others. Although ToM may come naturally to us, emulating it presents a challenge to even the most advanced Large Language Models (LLMs). Recent improvements to LLMs' reasoning capabilities from simple y… ▽ More

    Submitted 16 November, 2023; originally announced November 2023.

  8. arXiv:2311.09580  [pdf, other

    cs.CL

    MMOE: Mixture of Multimodal Interaction Experts

    Authors: Haofei Yu, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Multimodal machine learning, which studies the information and interactions across various input modalities, has made significant advancements in understanding the relationship between images and descriptive text. However, this is just a portion of the potential multimodal interactions seen in the real world and does not include new interactions between conflicting utterances and gestures in predi… ▽ More

    Submitted 16 November, 2023; originally announced November 2023.

  9. arXiv:2311.06217  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.MM

    MultiIoT: Benchmarking Machine Learning for the Internet of Things

    Authors: Shentong Mo, Louis-Philippe Morency, Russ Salakhutdinov, Paul Pu Liang

    Abstract: The next generation of machine learning systems must be adept at perceiving and interacting with the physical world through a diverse array of sensory channels. Commonly referred to as the `Internet of Things (IoT)' ecosystem, sensory data from motion, thermal, geolocation, depth, wireless signals, video, and audio are increasingly used to model the states of physical environments and the humans i… ▽ More

    Submitted 4 July, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

  10. arXiv:2311.02253  [pdf, other

    cs.LG cs.AI

    Comparative Knowledge Distillation

    Authors: Alex Wilf, Alex Tianyi Xu, Paul Pu Liang, Alexander Obolenskiy, Daniel Fried, Louis-Philippe Morency

    Abstract: In the era of large scale pretrained models, Knowledge Distillation (KD) serves an important role in transferring the wisdom of computationally heavy teacher models to lightweight, efficient student models while preserving performance. Traditional KD paradigms, however, assume readily available access to teacher models for frequent inference -- a notion increasingly at odds with the realities of c… ▽ More

    Submitted 3 November, 2023; originally announced November 2023.

    Comments: arXiv admin note: text overlap with arXiv:2310.13011

  11. arXiv:2308.14179  [pdf, other

    cs.CL cs.AI cs.CV

    Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP

    Authors: Vedant Palit, Rohan Pandey, Aryaman Arora, Paul Pu Liang

    Abstract: Mechanistic interpretability seeks to understand the neural mechanisms that enable specific behaviors in Large Language Models (LLMs) by leveraging causality-based methods. While these approaches have identified neural circuits that copy spans of text, capture factual knowledge, and more, they remain unusable for multimodal models since adapting these tools to the vision-language domain requires c… ▽ More

    Submitted 27 August, 2023; originally announced August 2023.

    Comments: Final version for 5th Workshop on Closing the Loop Between Vision and Language (CLVL) @ ICCV 2023. 4 pages, 5 figures

  12. arXiv:2306.16413  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.MM

    MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep Learning

    Authors: Paul Pu Liang, Yiwei Lyu, Xiang Fan, Arav Agarwal, Yun Cheng, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiZoo, a public toolkit consisting of standardized implementations of > 20 core multimodal algorithms and MultiBench, a large-scale benchmark spanning 15 datase… ▽ More

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: JMLR Open Source Software 2023, Code available at https://github.com/pliang279/MultiBench

  13. arXiv:2306.05268  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.MM

    Factorized Contrastive Learning: Going Beyond Multi-view Redundancy

    Authors: Paul Pu Liang, Zihao Deng, Martin Ma, James Zou, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: In a wide range of multimodal tasks, contrastive learning has become a particularly appealing approach since it can successfully learn representations from abundant unlabeled data with only pairing information (e.g., image-caption or video-audio pairs). Underpinning these approaches is the assumption of multi-view redundancy - that shared information between modalities is necessary and sufficient… ▽ More

    Submitted 30 October, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023. Code available at: https://github.com/pliang279/FactorCL

  14. arXiv:2306.04597  [pdf, other

    cs.CL cs.LG

    Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions

    Authors: Himanshu Thakur, Atishay Jain, Praneetha Vaddamanu, Paul Pu Liang, Louis-Philippe Morency

    Abstract: Societal biases present in pre-trained large language models are a critical issue as these models have been shown to propagate biases in countless downstream applications, rendering them unfair towards specific groups of people. Since large-scale retraining of these models from scratch is both time and compute-expensive, a variety of approaches have been previously proposed that de-bias a pre-trai… ▽ More

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: Accepted to ACL 2023 Main Conference

  15. arXiv:2306.04539  [pdf, other

    cs.LG cs.CL cs.CV cs.IT stat.ML

    Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications

    Authors: Paul Pu Liang, Chun Kai Ling, Yun Cheng, Alex Obolenskiy, Yudong Liu, Rohan Pandey, Alex Wilf, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: In many machine learning systems that jointly learn from multiple modalities, a core research question is to understand the nature of multimodal interactions: how modalities combine to provide new task-relevant information that was not present in either alone. We study this challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data and naturally co-occurri… ▽ More

    Submitted 13 June, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: ICLR 2024, Code available at: https://github.com/pliang279/PID

  16. arXiv:2306.04125  [pdf, other

    cs.LG cs.CL cs.HC

    Multimodal Fusion Interactions: A Study of Human and Automatic Quantification

    Authors: Paul Pu Liang, Yun Cheng, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: In order to perform multimodal fusion of heterogeneous signals, we need to understand their interactions: how each modality individually provides information useful for a task and how this information changes in the presence of other modalities. In this paper, we perform a comparative study of how humans annotate two categorizations of multimodal interactions: (1) partial labels, where different a… ▽ More

    Submitted 30 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: International Conference on Multimodal Interaction (ICMI '23), Code available at: https://github.com/pliang279/PID. arXiv admin note: text overlap with arXiv:2302.12247

  17. arXiv:2305.14577  [pdf, other

    cs.LG cs.CL

    Difference-Masking: Choosing What to Mask in Continued Pretraining

    Authors: Alex Wilf, Syeda Nahida Akter, Leena Mathur, Paul Pu Liang, Sheryl Mathew, Mengrou Shou, Eric Nyberg, Louis-Philippe Morency

    Abstract: The self-supervised objective of masking-and-predicting has led to promising performance gains on a variety of downstream tasks. However, while most approaches randomly mask tokens, there is strong intuition that deciding what to mask can substantially improve learning outcomes. We investigate this in continued pretraining setting in which pretrained models continue to pretrain on domain-specific… ▽ More

    Submitted 17 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  18. arXiv:2305.13583  [pdf, other

    cs.CL cs.MM eess.AS eess.IV

    Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition

    Authors: Yaoting Wang, Yuanchao Li, Paul Pu Liang, Louis-Philippe Morency, Peter Bell, Catherine Lai

    Abstract: Fusing multiple modalities has proven effective for multimodal information processing. However, the incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition. In this study, we first analyze how the salient affective information in one modality can be affected by the other, and demonstrate that inter-modal incongruity exists latently in crossmodal att… ▽ More

    Submitted 12 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: *First two authors contributed equally

  19. arXiv:2305.12369  [pdf, other

    cs.CV cs.AI cs.LG

    HIINT: Historical, Intra- and Inter- personal Dynamics Modeling with Cross-person Memory Transformer

    Authors: Yubin Kim, Dong Won Lee, Paul Pu Liang, Sharifa Algohwinem, Cynthia Breazeal, Hae Won Park

    Abstract: Accurately modeling affect dynamics, which refers to the changes and fluctuations in emotions and affective displays during human conversations, is crucial for understanding human interactions. By analyzing affect dynamics, we can gain insights into how people communicate, respond to different situations, and form relationships. However, modeling affect dynamics is challenging due to contextual fa… ▽ More

    Submitted 21 May, 2023; originally announced May 2023.

  20. arXiv:2302.12247  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.IT

    Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework

    Authors: Paul Pu Liang, Yun Cheng, Xiang Fan, Chun Kai Ling, Suzanne Nie, Richard Chen, Zihao Deng, Nicholas Allen, Randy Auerbach, Faisal Mahmood, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: The recent explosion of interest in multimodal applications has resulted in a wide selection of datasets and methods for representing and integrating information from different modalities. Despite these empirical advances, there remain fundamental research questions: How can we quantify the interactions that are necessary to solve a multimodal task? Subsequently, what are the most suitable multimo… ▽ More

    Submitted 10 December, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023. Code available at: https://github.com/pliang279/PID

  21. arXiv:2302.04449  [pdf, other

    cs.LG cs.AI cs.CL

    Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals

    Authors: Yue Wu, Yewen Fan, Paul Pu Liang, Amos Azaria, Yuanzhi Li, Tom M. Mitchell

    Abstract: High sample complexity has long been a challenge for RL. On the other hand, humans learn to perform tasks not only from interaction or demonstrations, but also by reading unstructured text documents, e.g., instruction manuals. Instruction manuals and wiki pages are among the most abundant data that could inform agents of valuable features and policies or task-specific environmental dynamics and re… ▽ More

    Submitted 26 October, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

  22. arXiv:2212.10549  [pdf, other

    cs.CL cs.CV cs.LG

    Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment

    Authors: Rohan Pandey, Rulin Shao, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Despite recent progress towards scaling up multimodal vision-language models, these models are still known to struggle on compositional generalization benchmarks such as Winoground. We find that a critical component lacking from current vision-language models is relation-level alignment: the ability to match directional semantic relations in text (e.g., "mug in grass") with spatial relationships i… ▽ More

    Submitted 4 July, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL 2023

  23. Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control

    Authors: Xiang Fan, Yiwei Lyu, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Pretrained language models have demonstrated extraordinary capabilities in language generation. However, real-world tasks often require controlling the distribution of generated text in order to mitigate bias, promote fairness, and achieve personalization. Existing techniques for controlling the distribution of generated text only work with quantified distributions, which require pre-defined categ… ▽ More

    Submitted 22 September, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

    Comments: Accepted to ACL Findings 2023

  24. arXiv:2210.04714  [pdf, other

    cs.CL cs.LG stat.ML

    Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis

    Authors: Yuxin Xiao, Paul Pu Liang, Umang Bhatt, Willie Neiswanger, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Pre-trained language models (PLMs) have gained increasing popularity due to their compelling prediction performance in diverse natural language processing (NLP) tasks. When formulating a PLM-based prediction pipeline for NLP tasks, it is also crucial for the pipeline to minimize the calibration error, especially in safety-critical applications. That is, the pipeline should reliably indicate when w… ▽ More

    Submitted 14 October, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Accepted by EMNLP 2022 (Findings)

  25. arXiv:2209.03430  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.MM

    Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

    Authors: Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency

    Abstract: Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, tex… ▽ More

    Submitted 20 February, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

  26. arXiv:2208.08080  [pdf, other

    cs.AI cs.CL cs.CV cs.LG cs.MM

    Multimodal Lecture Presentations Dataset: Understanding Multimodality in Educational Slides

    Authors: Dong Won Lee, Chaitanya Ahuja, Paul Pu Liang, Sanika Natu, Louis-Philippe Morency

    Abstract: Lecture slide presentations, a sequence of pages that contain text and figures accompanied by speech, are constructed and presented carefully in order to optimally transfer knowledge to students. Previous studies in multimedia and psychology attribute the effectiveness of lecture presentations to their multimodal nature. As a step toward developing AI to aid in student learning as intelligent teac… ▽ More

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: 9 pages, 5 figures

  27. arXiv:2208.01036  [pdf, other

    cs.LG cs.AI cs.CV

    Face-to-Face Contrastive Learning for Social Intelligence Question-Answering

    Authors: Alex Wilf, Martin Q. Ma, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency

    Abstract: Creating artificial social intelligence - algorithms that can understand the nuances of multi-person interactions - is an exciting and emerging challenge in processing facial expressions and gestures from multimodal videos. Recent multimodal methods have set the state of the art on many tasks, but have difficulty modeling the complex face-to-face conversational dynamics across speaking turns in so… ▽ More

    Submitted 27 October, 2022; v1 submitted 29 July, 2022; originally announced August 2022.

  28. arXiv:2207.00056  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.MM

    MultiViz: Towards Visualizing and Understanding Multimodal Models

    Authors: Paul Pu Liang, Yiwei Lyu, Gunjan Chhablani, Nihal Jain, Zihao Deng, Xingbo Wang, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: The promise of multimodal models for real-world applications has inspired research in visualizing and understanding their internal mechanics with the end goal of empowering stakeholders to visualize model behavior, perform model debugging, and promote trust in machine learning models. However, modern multimodal models are typically black-box neural networks, which makes it challenging to understan… ▽ More

    Submitted 6 March, 2023; v1 submitted 30 June, 2022; originally announced July 2022.

    Comments: ICLR 2023. Code available at: https://github.com/pliang279/MultiViz

  29. arXiv:2206.11249  [pdf, other

    cs.CL cs.AI cs.LG

    GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

    Authors: Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter , et al. (52 additional authors not shown)

    Abstract: Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, an… ▽ More

    Submitted 24 June, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

  30. arXiv:2206.04615  [pdf, other

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  31. arXiv:2205.00001  [pdf, other

    cs.AI cs.CL cs.LG

    Brainish: Formalizing A Multimodal Language for Intelligence and Consciousness

    Authors: Paul Pu Liang

    Abstract: Having a rich multimodal inner language is an important component of human intelligence that enables several necessary core cognitive functions such as multimodal prediction, translation, and generation. Building upon the Conscious Turing Machine (CTM), a machine model for consciousness proposed by Blum and Blum (2021), we describe the desiderata of a multimodal language called Brainish, comprisin… ▽ More

    Submitted 6 July, 2022; v1 submitted 13 April, 2022; originally announced May 2022.

    Comments: Annual Meeting of the Association for the Scientific Study of Consciousness 2022

  32. arXiv:2203.11130  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.MM

    PACS: A Dataset for Physical Audiovisual CommonSense Reasoning

    Authors: Samuel Yu, Peter Wu, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: In order for AI to be safely deployed in real-world scenarios such as hospitals, schools, and the workplace, it must be able to robustly reason about the physical world. Fundamental to this reasoning is physical common sense: understanding the physical properties and affordances of available objects, how they can be manipulated, and how they interact with other objects. Physical commonsense reason… ▽ More

    Submitted 1 August, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: ECCV 2022, 51 pages, 23 figures, 4 tables

  33. arXiv:2203.02013  [pdf, other

    cs.LG cs.AI cs.CL cs.CV

    DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations

    Authors: Yiwei Lyu, Paul Pu Liang, Zihao Deng, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: The ability for a human to understand an Artificial Intelligence (AI) model's decision-making process is critical in enabling stakeholders to visualize model behavior, perform model debugging, promote trust in AI models, and assist in collaborative human-AI decision-making. As a result, the research fields of interpretable and explainable AI have gained traction within AI communities as well as in… ▽ More

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: Code available at https://github.com/lvyiwei1/DIME

  34. arXiv:2203.01311  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.MM

    High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning

    Authors: Paul Pu Liang, Yiwei Lyu, Xiang Fan, Jeffrey Tsaw, Yudong Liu, Shentong Mo, Dani Yogatama, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: Many real-world problems are inherently multimodal, from spoken language, gestures, and paralinguistics humans use to communicate, to force, proprioception, and visual sensors on robots. While there has been an explosion of interest in multimodal learning, these methods are focused on a small set of modalities primarily in language, vision, and audio. In order to accelerate generalization towards… ▽ More

    Submitted 28 June, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

    Comments: TMLR 2023, Code available at https://github.com/pliang279/HighMMT

  35. arXiv:2107.07502  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.MM

    MultiBench: Multiscale Benchmarks for Multimodal Representation Learning

    Authors: Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A. Lee, Yuke Zhu, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and mod… ▽ More

    Submitted 10 November, 2021; v1 submitted 15 July, 2021; originally announced July 2021.

    Comments: NeurIPS 2021 Datasets and Benchmarks Track. Code: https://github.com/pliang279/MultiBench and Website: https://cmu-multicomp-lab.github.io/multibench/

  36. arXiv:2106.13219  [pdf, other

    cs.CL cs.AI cs.CY cs.LG

    Towards Understanding and Mitigating Social Biases in Language Models

    Authors: Paul Pu Liang, Chiyu Wu, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: As machine learning methods are deployed in real-world settings such as healthcare, legal systems, and social science, it is crucial to recognize how they shape social biases and stereotypes in these sensitive decision-making processes. Among such real-world deployments are large-scale pretrained language models (LMs) that can be potentially dangerous in manifesting undesirable representational bi… ▽ More

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: ICML 2021, code available at https://github.com/pliang279/LM_bias

  37. arXiv:2106.13213  [pdf, other

    cs.LG cs.AI cs.CL cs.HC

    Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data

    Authors: Paul Pu Liang, Terrance Liu, Anna Cai, Michal Muszynski, Ryo Ishii, Nicholas Allen, Randy Auerbach, David Brent, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care. The ability to accurately and efficiently predict mood from easily collectible data has several important implications for the early detection, intervention, and treatment of mental health disorders. One promising data source to help monitor human behavior is daily smartphone usage. Howeve… ▽ More

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: ACL 2021. arXiv admin note: substantial text overlap with arXiv:2012.02359

  38. arXiv:2106.06047  [pdf, other

    cs.LG cs.CV

    Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning

    Authors: Liangqiong Qu, Yuyin Zhou, Paul Pu Liang, Yingda Xia, Feifei Wang, Ehsan Adeli, Li Fei-Fei, Daniel Rubin

    Abstract: Federated learning is an emerging research paradigm enabling collaborative training of machine learning models among different organizations while keeping data private at each institution. Despite recent progress, there remain fundamental challenges such as the lack of convergence and the potential for catastrophic forgetting across real-world heterogeneous devices. In this paper, we demonstrate t… ▽ More

    Submitted 13 April, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Published as a conference paper at CVPR 2022

  39. arXiv:2106.02866  [pdf, other

    cs.LG

    Conditional Contrastive Learning for Improving Fairness in Self-Supervised Learning

    Authors: Martin Q. Ma, Yao-Hung Hubert Tsai, Paul Pu Liang, Han Zhao, Kun Zhang, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Contrastive self-supervised learning (SSL) learns an embedding space that maps similar data pairs closer and dissimilar data pairs farther apart. Despite its success, one issue has been overlooked: the fairness aspect of representations learned using contrastive SSL. Without mitigation, contrastive SSL techniques can incorporate sensitive information such as gender or race and cause potentially un… ▽ More

    Submitted 27 June, 2022; v1 submitted 5 June, 2021; originally announced June 2021.

  40. arXiv:2104.11902  [pdf, other

    cs.LG cs.AI cs.CL

    Ask & Explore: Grounded Question Answering for Curiosity-Driven Exploration

    Authors: Jivat Neet Kaur, Yiding Jiang, Paul Pu Liang

    Abstract: In many real-world scenarios where extrinsic rewards to the agent are extremely sparse, curiosity has emerged as a useful concept providing intrinsic rewards that enable the agent to explore its environment and acquire information to achieve its goals. Despite their strong performance on many sparse-reward tasks, existing curiosity approaches rely on an overly holistic view of state transitions, a… ▽ More

    Submitted 24 April, 2021; originally announced April 2021.

    Comments: Accepted at ICLR 2021 Workshop on Embodied Multimodal Learning

  41. arXiv:2104.05196  [pdf, other

    cs.CL cs.AI cs.LG

    StylePTB: A Compositional Benchmark for Fine-grained Controllable Text Style Transfer

    Authors: Yiwei Lyu, Paul Pu Liang, Hai Pham, Eduard Hovy, Barnabás Póczos, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Text style transfer aims to controllably generate text with targeted stylistic changes while maintaining core meaning from the source sentence constant. Many of the existing style transfer benchmarks primarily focus on individual high-level semantic changes (e.g. positive to negative), which enable controllability at a high level but do not offer fine-grained control involving sentence structure,… ▽ More

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: NAACL 2021, code available at https://github.com/lvyiwei1/StylePTB/

  42. arXiv:2101.08919  [pdf, other

    eess.AS cs.CR cs.SD

    Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks

    Authors: Peter Wu, Paul Pu Liang, Jiatong Shi, Ruslan Salakhutdinov, Shinji Watanabe, Louis-Philippe Morency

    Abstract: As users increasingly rely on cloud-based computing services, it is important to ensure that uploaded speech data remains private. Existing solutions rely either on server-side methods or focus on hiding speaker identity. While these approaches reduce certain security concerns, they do not give users client-side control over whether their biometric information is sent to the server. In this paper,… ▽ More

    Submitted 22 October, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

  43. arXiv:2012.02813  [pdf, other

    cs.LG cs.AI cs.CL cs.CV

    Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment

    Authors: Paul Pu Liang, Peter Wu, Liu Ziyin, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: The natural world is abundant with concepts expressed via visual, acoustic, tactile, and linguistic modalities. Much of the existing progress in multimodal learning, however, focuses primarily on problems where the same set of modalities are present at train and test time, which makes learning in low-resource modalities particularly difficult. In this work, we propose algorithms for cross-modal ge… ▽ More

    Submitted 4 December, 2020; originally announced December 2020.

  44. arXiv:2012.02359  [pdf, other

    cs.LG cs.CY stat.AP

    Multimodal Privacy-preserving Mood Prediction from Mobile Data: A Preliminary Study

    Authors: Terrance Liu, Paul Pu Liang, Michal Muszynski, Ryo Ishii, David Brent, Randy Auerbach, Nicholas Allen, Louis-Philippe Morency

    Abstract: Mental health conditions remain under-diagnosed even in countries with common access to advanced medical care. The ability to accurately and efficiently predict mood from easily collectible data has several important implications towards the early detection and intervention of mental health disorders. One promising data source to help monitor human behavior is from daily smartphone usage. However,… ▽ More

    Submitted 3 December, 2020; originally announced December 2020.

  45. arXiv:2010.12648  [pdf, other

    cs.LG stat.ML

    An Investigation of how Label Smoothing Affects Generalization

    Authors: Blair Chen, Liu Ziyin, Zihao Wang, Paul Pu Liang

    Abstract: It has been hypothesized that label smoothing can reduce overfitting and improve generalization, and current empirical evidence seems to corroborate these effects. However, there is a lack of mathematical understanding of when and why such empirical improvements occur. In this paper, as a step towards understanding why label smoothing is effective, we propose a theoretical framework to show how la… ▽ More

    Submitted 23 October, 2020; originally announced October 2020.

  46. arXiv:2007.08100  [pdf, other

    cs.CL cs.LG

    Towards Debiasing Sentence Representations

    Authors: Paul Pu Liang, Irene Mengze Li, Emily Zheng, Yao Chong Lim, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: As natural language processing methods are increasingly deployed in real-world scenarios such as healthcare, legal systems, and social science, it becomes necessary to recognize the role they potentially play in shaping social biases and stereotypes. Previous work has revealed the presence of social biases in widely used word embeddings involving gender, race, religion, and other social constructs… ▽ More

    Submitted 16 July, 2020; originally announced July 2020.

    Comments: ACL 2020, code available at https://github.com/pliang279/sent_debias

  47. arXiv:2003.08197  [pdf, other

    cs.LG cs.CL stat.ML

    Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies

    Authors: Paul Pu Liang, Manzil Zaheer, Yuan Wang, Amr Ahmed

    Abstract: Learning continuous representations of discrete objects such as text, users, movies, and URLs lies at the heart of many applications including language and user modeling. When using discrete objects as input to neural networks, we often ignore the underlying structures (e.g., natural groupings and similarities) and embed the objects independently into individual vectors. As a result, existing meth… ▽ More

    Submitted 11 March, 2021; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: ICLR 2021, code can be found at http://github.com/pliang279/sparse_discrete

  48. arXiv:2003.03212  [pdf, other

    cs.CV

    Diverse and Admissible Trajectory Forecasting through Multimodal Context Understanding

    Authors: Seong Hyeon Park, Gyubok Lee, Manoj Bhat, Jimin Seo, Minseok Kang, Jonathan Francis, Ashwin R. Jadhav, Paul Pu Liang, Louis-Philippe Morency

    Abstract: Multi-agent trajectory forecasting in autonomous driving requires an agent to accurately anticipate the behaviors of the surrounding vehicles and pedestrians, for safe and reliable decision-making. Due to partial observability in these dynamical scenes, directly obtaining the posterior distribution over future agent trajectories remains a challenging problem. In realistic embodied environments, ea… ▽ More

    Submitted 31 August, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

    Comments: ECCV 2020

  49. arXiv:2003.01848  [pdf, other

    cs.AI cs.CL cs.LG cs.MA

    On Emergent Communication in Competitive Multi-Agent Teams

    Authors: Paul Pu Liang, Jeffrey Chen, Ruslan Salakhutdinov, Louis-Philippe Morency, Satwik Kottur

    Abstract: Several recent works have found the emergence of grounded compositional language in the communication protocols developed by mostly cooperative multi-agent systems when learned end-to-end to maximize performance on a downstream task. However, human populations learn to solve complex tasks involving communicative behaviors not only in fully cooperative settings but also in scenarios where competiti… ▽ More

    Submitted 16 July, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: AAMAS 2020, code: https://github.com/pliang279/Competitive-Emergent-Communication

  50. arXiv:2002.06541  [pdf, other

    cs.LG cs.IT stat.ML

    Learning Not to Learn in the Presence of Noisy Labels

    Authors: Liu Ziyin, Blair Chen, Ru Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda

    Abstract: Learning in the presence of label noise is a challenging yet important task: it is crucial to design models that are robust in the presence of mislabeled datasets. In this paper, we discover that a new class of loss functions called the gambler's loss provides strong robustness to label noise across various levels of corruption. We show that training with this loss function encourages the model to… ▽ More

    Submitted 16 February, 2020; originally announced February 2020.