Showing 1–12 of 12 results for author: Tam, D

  1. arXiv:2312.04339  [pdf, other]

    cs.LG cs.CL

    Merging by Matching Models in Task Parameter Subspaces

    Authors: Derek Tam, Mohit Bansal, Colin Raffel

    Abstract: Model merging aims to cheaply combine individual task-specific models into a single multitask model. In this work, we view past merging methods as leveraging different notions of a "task parameter subspace" in which models are matched before being merged. We connect the task parameter subspace of a given model to its loss landscape and formalize how this approach to model merging can be seen as…

    Submitted 13 April, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: TMLR

  2. arXiv:2306.01708  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    TIES-Merging: Resolving Interference When Merging Models

    Authors: Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Mohit Bansal

    Abstract: Transfer learning - i.e., further fine-tuning a pre-trained model on a downstream task - can confer significant advantages, including improved downstream performance, faster convergence, and better sample efficiency. These advantages have led to a proliferation of task-specific fine-tuned models, which typically can only perform a single task and do not benefit from one another. Recently, model me…

    Submitted 26 October, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

    Comments: Published at NeurIPS 2023, 23 Pages, 13 Figures, 14 Tables

  3. arXiv:2211.08412  [pdf, other]

    cs.CL

    Evaluating the Factual Consistency of Large Language Models Through News Summarization

    Authors: Derek Tam, Anisha Mascarenhas, Shiyue Zhang, Sarah Kwan, Mohit Bansal, Colin Raffel

    Abstract: While large language models (LLMs) have proven to be effective on a large variety of tasks, they are also known to hallucinate information. To measure whether an LLM prefers factually consistent continuations of its input, we propose a new benchmark called FIB (Factual Inconsistency Benchmark) that focuses on the task of summarization. Specifically, our benchmark involves comparing the scores an LL…

    Submitted 2 December, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

  4. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  5. arXiv:2205.05638  [pdf, other]

    cs.LG cs.AI cs.CL

    Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

    Authors: Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, Colin Raffel

    Abstract: Few-shot in-context learning (ICL) enables pre-trained language models to perform a previously-unseen task without any gradient-based training by feeding a small number of training examples as part of the input. ICL incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. Parameter-efficient fine-tuning…

    Submitted 26 August, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

  6. arXiv:2112.08548  [pdf, other]

    cs.CL

    Isochrony-Aware Neural Machine Translation for Automatic Dubbing

    Authors: Derek Tam, Surafel M. Lakew, Yogesh Virkar, Prashant Mathur, Marcello Federico

    Abstract: We introduce the task of isochrony-aware machine translation which aims at generating translations suitable for dubbing. Dubbing of a spoken sentence requires transferring the content as well as the speech-pause structure of the source into the target language to achieve audiovisual coherence. Practically, this implies correctly projecting pauses from the source to the target and ensuring that tar…

    Submitted 8 July, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    Comments: Published at InterSpeech 2022 (https://interspeech2022.org), scheduled for September 18-22, 2022, Incheon, Korea

  7. arXiv:2106.07499  [pdf, other]

    cs.CL cs.AI

    An Empirical Survey of Data Augmentation for Limited Data Learning in NLP

    Authors: Jiaao Chen, Derek Tam, Colin Raffel, Mohit Bansal, Diyi Yang

    Abstract: NLP has achieved great progress in the past decade through the use of neural models and large labeled datasets. The dependence on abundant data prevents NLP models from being applied to low-resource settings or novel tasks where significant time, money, or expertise is required to label massive amounts of textual data. Recently, data augmentation methods have been explored as a means of improving…

    Submitted 14 June, 2021; originally announced June 2021.

  8. arXiv:2103.11955  [pdf, other]

    cs.CL cs.AI cs.LG

    Improving and Simplifying Pattern Exploiting Training

    Authors: Derek Tam, Rakesh R Menon, Mohit Bansal, Shashank Srivastava, Colin Raffel

    Abstract: Recently, pre-trained language models (LMs) have achieved strong performance when fine-tuned on difficult benchmarks like SuperGLUE. However, performance can suffer when there are very few labeled examples available for fine-tuning. Pattern Exploiting Training (PET) is a recent approach that leverages patterns for few-shot learning. However, PET uses task-specific unlabeled data. In this paper, we…

    Submitted 28 September, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: EMNLP 2021 (12 pages, 2 figures)

  9. arXiv:2009.05266  [pdf, other]

    cs.LG stat.ML

    GTEA: Inductive Representation Learning on Temporal Interaction Graphs via Temporal Edge Aggregation

    Authors: Siyue Xie, Yiming Li, Da Sun Handason Tam, Xiaxin Liu, Qiu Fang Ying, Wing Cheong Lau, Dah Ming Chiu, Shou Zhi Chen

    Abstract: In this paper, we propose the Graph Temporal Edge Aggregation (GTEA) framework for inductive learning on Temporal Interaction Graphs (TIGs). Different from previous works, GTEA models the temporal dynamics of interaction sequences in the continuous-time space and simultaneously takes advantage of both rich node and edge/interaction attributes in the graph. Concretely, we integrate a sequence mode…

    Submitted 3 May, 2023; v1 submitted 11 September, 2020; originally announced September 2020.

    Comments: accepted by PAKDD2023

  10. arXiv:1907.10165  [pdf, other]

    cs.LG cs.CL stat.ML

    Optimal Transport-based Alignment of Learned Character Representations for String Similarity

    Authors: Derek Tam, Nicholas Monath, Ari Kobren, Aaron Traylor, Rajarshi Das, Andrew McCallum

    Abstract: String similarity models are vital for record linkage, entity resolution, and search. In this work, we present STANCE, a learned model for computing the similarity of two strings. Our approach encodes the characters of each string, aligns the encodings using Sinkhorn Iteration (alignment is posed as an instance of optimal transport) and scores the alignment with a convolutional neural network. W…

    Submitted 23 July, 2019; originally announced July 2019.

    Comments: ACL Long Paper

  11. arXiv:1906.05546  [pdf, ps, other]

    cs.SI cs.LG

    Identifying Illicit Accounts in Large Scale E-payment Networks -- A Graph Representation Learning Approach

    Authors: Da Sun Handason Tam, Wing Cheong Lau, Bin Hu, Qiu Fang Ying, Dah Ming Chiu, Hong Liu

    Abstract: Rapid and massive adoption of mobile/online payment services has brought new challenges to the service providers as well as regulators in safeguarding the proper use of such services/systems. In this paper, we leverage recent advances in deep-neural-network-based graph representation learning to detect abnormal/suspicious financial transactions in real-world e-payment networks. In particular, we…

    Submitted 13 June, 2019; originally announced June 2019.

  12. arXiv:1905.12957  [pdf, other]

    cs.IT cs.LG

    Neural Entropic Estimation: A faster path to mutual information estimation

    Authors: Chung Chan, Ali Al-Bashabsheh, Hing Pang Huang, Michael Lim, Da Sun Handason Tam, Chao Zhao

    Abstract: We point out a limitation of the mutual information neural estimation (MINE) where the network fails to learn at the initial training phase, leading to slow convergence in the number of training iterations. To solve this problem, we propose a faster method called the mutual information neural entropic estimation (MI-NEE). Our solution first generalizes MINE to estimate the entropy using a custom r…

    Submitted 30 May, 2019; v1 submitted 30 May, 2019; originally announced May 2019.