Skip to main content

Showing 1–21 of 21 results for author: Garrette, D

  1. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805  [pdf, other

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2311.08273  [pdf, other

    cs.CL

    Examining Modularity in Multilingual LMs via Language-Specialized Subnetworks

    Authors: Rochelle Choenni, Ekaterina Shutova, Dan Garrette

    Abstract: Recent work has proposed explicitly inducing language-wise modularity in multilingual LMs via sparse fine-tuning (SFT) on per-language subnetworks as a means of better guiding cross-lingual sharing. In this work, we investigate (1) the degree to which language-wise modularity naturally arises within models with no special modularity interventions, and (2) how cross-lingual sharing and interference… ▽ More

    Submitted 14 November, 2023; originally announced November 2023.

  4. arXiv:2310.19956  [pdf, other

    cs.CL

    The Impact of Depth on Compositional Generalization in Transformer Language Models

    Authors: Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Dan Garrette, Tal Linzen

    Abstract: To process novel sentences, language models (LMs) must generalize compositionally -- combine familiar elements in new ways. What aspects of a model's structure promote compositional generalization? Focusing on transformers, we test the hypothesis, motivated by theoretical and empirical work, that deeper transformers generalize more compositionally. Simply adding layers increases the total number o… ▽ More

    Submitted 10 April, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted to NAACL 2024

  5. arXiv:2305.13286  [pdf, other

    cs.CL

    How do languages influence each other? Studying cross-lingual data sharing during LM fine-tuning

    Authors: Rochelle Choenni, Dan Garrette, Ekaterina Shutova

    Abstract: Multilingual large language models (MLLMs) are jointly trained on data from many different languages such that representation of individual languages can benefit from other languages' data. Impressive performance on zero-shot cross-lingual transfer shows that these models are capable of exploiting data from other languages. Yet, it remains unclear to what extent, and under which conditions, langua… ▽ More

    Submitted 21 May, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

  6. XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

    Authors: Sebastian Ruder, Jonathan H. Clark, Alexander Gutkin, Mihir Kale, Min Ma, Massimo Nicosia, Shruti Rijhwani, Parker Riley, Jean-Michel A. Sarr, Xinyi Wang, John Wieting, Nitish Gupta, Anna Katanova, Christo Kirov, Dana L. Dickinson, Brian Roark, Bidisha Samanta, Connie Tao, David I. Adelani, Vera Axelrod, Isaac Caswell, Colin Cherry, Dan Garrette, Reeve Ingle, Melvin Johnson , et al. (2 additional authors not shown)

    Abstract: Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP re-search is particularly far behind in meeting user needs -- it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot;… ▽ More

    Submitted 24 May, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

  7. arXiv:2212.10562  [pdf, other

    cs.CL cs.CV

    Character-Aware Models Improve Visual Text Rendering

    Authors: Rosanne Liu, Dan Garrette, Chitwan Saharia, William Chan, Adam Roberts, Sharan Narang, Irina Blok, RJ Mical, Mohammad Norouzi, Noah Constant

    Abstract: Current image generation models struggle to reliably produce well-formed visual text. In this paper, we investigate a key contributing factor: popular text-to-image models lack character-level input features, making it much harder to predict a word's visual makeup as a series of glyphs. To quantify this effect, we conduct a series of experiments comparing character-aware vs. character-blind text e… ▽ More

    Submitted 3 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  8. arXiv:2211.05100  [pdf, other

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop, :, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major , et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access… ▽ More

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  9. arXiv:2211.00922  [pdf, other

    cs.CL

    Dialect-robust Evaluation of Generated Text

    Authors: Jiao Sun, Thibault Sellam, Elizabeth Clark, Tu Vu, Timothy Dozat, Dan Garrette, Aditya Siddhant, Jacob Eisenstein, Sebastian Gehrmann

    Abstract: Evaluation metrics that are not robust to dialect variation make it impossible to tell how well systems perform for many groups of users, and can even penalize systems for producing text in lower-resource dialects. However, currently, there exists no way to quantify how metrics respond to change in the dialect of a generated utterance. We thus formalize dialect robustness and dialect awareness as… ▽ More

    Submitted 2 November, 2022; originally announced November 2022.

  10. arXiv:2211.00106  [pdf, other

    cs.CL

    Data-Efficient Cross-Lingual Transfer with Language-Specific Subnetworks

    Authors: Rochelle Choenni, Dan Garrette, Ekaterina Shutova

    Abstract: Large multilingual language models typically share their parameters across all languages, which enables cross-lingual task transfer, but learning can also be hindered when training updates from different languages are in conflict. In this paper, we propose novel methods for using language-specific subnetworks, which control cross-lingual parameter sharing, to reduce conflicts and increase positive… ▽ More

    Submitted 31 October, 2022; originally announced November 2022.

  11. arXiv:2210.00193  [pdf, other

    cs.CL

    FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation

    Authors: Parker Riley, Timothy Dozat, Jan A. Botha, Xavier Garcia, Dan Garrette, Jason Riesa, Orhan Firat, Noah Constant

    Abstract: We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation, a type of style-targeted translation. The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese. Source documents are selected to enable detailed analysis of phenomena of interest, including lexically distinct terms and distr… ▽ More

    Submitted 3 October, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

    Comments: Published in TACL Vol. 11 (2023)

  12. arXiv:2206.04615  [pdf, other

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  13. arXiv:2203.17189  [pdf, other

    cs.LG cs.CL

    Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$

    Authors: Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin, Curtis Hawthorne, Aitor Lewkowycz, Alex Salcianu, Marc van Zee, Jacob Austin, Sebastian Goodman, Livio Baldini Soares, Haitang Hu, Sasha Tsvyashchenko, Aakanksha Chowdhery, Jasmijn Bastings, Jannis Bulian, Xavier Garcia, Jianmo Ni, Andrew Chen , et al. (18 additional authors not shown)

    Abstract: Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we presen… ▽ More

    Submitted 31 March, 2022; originally announced March 2022.

  14. arXiv:2109.07020  [pdf, other

    cs.CL

    Frequency Effects on Syntactic Rule Learning in Transformers

    Authors: Jason Wei, Dan Garrette, Tal Linzen, Ellie Pavlick

    Abstract: Pre-trained language models perform well on a variety of linguistic tasks that require symbolic reasoning, raising the question of whether such models implicitly represent abstract symbols and rules. We investigate this question using the case study of BERT's performance on English subject-verb agreement. Unlike prior work, we train multiple instances of BERT from scratch, allowing us to perform a… ▽ More

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: Camera ready for EMNLP 2021

  15. arXiv:2104.07412  [pdf, other

    cs.CL cs.AI

    XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation

    Authors: Sebastian Ruder, Noah Constant, Jan Botha, Aditya Siddhant, Orhan Firat, Jinlan Fu, Pengfei Liu, Junjie Hu, Dan Garrette, Graham Neubig, Melvin Johnson

    Abstract: Machine learning has brought striking advances in multilingual natural language processing capabilities over the past year. For example, the latest techniques have improved the state-of-the-art performance on the XTREME multilingual benchmark by more than 13 points. While a sizeable gap to human-level performance remains, improvements have been easier to achieve in some tasks than in others. This… ▽ More

    Submitted 7 October, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021 camera-ready

  16. CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation

    Authors: Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting

    Abstract: Pipelined NLP systems have largely been superseded by end-to-end neural modeling, yet nearly all commonly-used models still require an explicit tokenization step. While recent tokenization approaches based on data-derived subword lexicons are less brittle than manually engineered tokenizers, these techniques are not equally suited to all languages, and the use of any fixed vocabulary may limit a m… ▽ More

    Submitted 18 May, 2022; v1 submitted 11 March, 2021; originally announced March 2021.

    Comments: TACL Final Version

    Journal ref: Transactions of the Association for Computational Linguistics (2022) 10: 73--91

  17. arXiv:2010.12777  [pdf, other

    cs.CL cs.LG

    Improving Multilingual Models with Language-Clustered Vocabularies

    Authors: Hyung Won Chung, Dan Garrette, Kiat Chuan Tan, Jason Riesa

    Abstract: State-of-the-art multilingual models depend on vocabularies that cover all of the languages the model will expect to see at inference time, but the standard methods for generating those vocabularies are not ideal for massively multilingual applications. In this work, we introduce a novel procedure for multilingual vocabulary generation that combines the separately trained vocabularies of several a… ▽ More

    Submitted 24 October, 2020; originally announced October 2020.

    Comments: Published in the main conference of EMNLP 2020

  18. arXiv:2003.05002  [pdf

    cs.CL cs.LG

    TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages

    Authors: Jonathan H. Clark, Eunsol Choi, Michael Collins, Dan Garrette, Tom Kwiatkowski, Vitaly Nikolaev, Jennimaria Palomaki

    Abstract: Confidently making progress on multilingual modeling requires challenging, trustworthy evaluations. We present TyDi QA---a question answering dataset covering 11 typologically diverse languages with 204K question-answer pairs. The languages of TyDi QA are diverse with regard to their typology---the set of linguistic features each language expresses---such that we expect models performing well on t… ▽ More

    Submitted 10 March, 2020; originally announced March 2020.

    Comments: To appear in Transactions of the Association for Computational Linguistics (TACL) 2020. Please use this as the citation

  19. arXiv:1906.01502  [pdf, other

    cs.CL cs.AI cs.LG

    How multilingual is Multilingual BERT?

    Authors: Telmo Pires, Eva Schlinger, Dan Garrette

    Abstract: In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number o… ▽ More

    Submitted 4 June, 2019; originally announced June 2019.

  20. arXiv:1704.07875  [pdf, other

    cs.CL

    Automatic Compositor Attribution in the First Folio of Shakespeare

    Authors: Maria Ryskina, Hannah Alpert-Abrams, Dan Garrette, Taylor Berg-Kirkpatrick

    Abstract: Compositor attribution, the clustering of pages in a historical printed document by the individual who set the type, is a bibliographic task that relies on analysis of orthographic variation and inspection of visual details of the printed page. In this paper, we introduce a novel unsupervised model that jointly describes the textual and visual features needed to distinguish compositors. Applied to… ▽ More

    Submitted 25 April, 2017; originally announced April 2017.

    Comments: Short paper (6 pages) accepted at ACL 2017

  21. arXiv:1701.03980  [pdf, other

    stat.ML cs.CL cs.MS

    DyNet: The Dynamic Neural Network Toolkit

    Authors: Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin

    Abstract: We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its deriva… ▽ More

    Submitted 14 January, 2017; originally announced January 2017.

    Comments: 33 pages