Showing 1–22 of 22 results for author: Kruszewski, G

  1. arXiv:2310.13011

    cs.CL cs.LG

    Compositional preference models for aligning LMs

    Authors: Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Marc Dymetman

    Abstract: As language models (LMs) become more capable, it is increasingly important to align them with human preferences. However, the dominant paradigm for training Preference Models (PMs) for that purpose suffers from fundamental limitations, such as lack of transparency and scalability, along with susceptibility to overfitting the preference dataset. We propose Compositional Preference Models (CPMs), a…

    Submitted 14 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  2. arXiv:2306.17757

    cs.CL

    Should you marginalize over possible tokenizations?

    Authors: Nadezhda Chirkova, Germán Kruszewski, Jos Rozen, Marc Dymetman

    Abstract: Autoregressive language models (LMs) map token sequences to probabilities. The usual practice for computing the probability of any character string (e.g. English sentences) is to first transform it into a sequence of tokens that is scored by the model. However, there are exponentially many token sequences that represent any given string. To truly compute the probability of a string one should marg…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: Accepted to ACL 2023

  3. arXiv:2303.05431

    cs.CL cs.AI cs.LG

    disco: a toolkit for Distributional Control of Generative Models

    Authors: Germán Kruszewski, Jos Rozen, Marc Dymetman

    Abstract: Pre-trained language models and other generative models have revolutionized NLP and beyond. However, these models tend to reproduce undesirable biases present in their training data. Also, they may overlook patterns that are important but challenging to capture. To address these limitations, researchers have introduced distributional control techniques. These techniques, not limited to language, a…

    Submitted 8 March, 2023; originally announced March 2023.

  4. arXiv:2302.08215

    cs.CL cs.LG stat.ML

    Aligning Language Models with Preferences through f-divergence Minimization

    Authors: Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Nahyeon Ryu, Marc Dymetman

    Abstract: Aligning language models with preferences can be posed as approximating a target distribution representing some desired behavior. Existing approaches differ both in the functional form of the target distribution and the algorithm used to approximate it. For instance, Reinforcement Learning from Human Feedback (RLHF) corresponds to minimizing a reverse KL from an implicit target distribution arisin…

    Submitted 6 June, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

  5. arXiv:2211.05100

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  6. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  7. arXiv:2206.00761

    cs.LG cs.CL stat.ML

    On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting

    Authors: Tomasz Korbak, Hady Elsahar, Germán Kruszewski, Marc Dymetman

    Abstract: The availability of large pre-trained models is changing the landscape of Machine Learning research and practice, moving from a training-from-scratch to a fine-tuning paradigm. While in some applications the goal is to "nudge" the pre-trained distribution towards preferred outputs, in others it is to steer it towards a different distribution over the sample space. Two main paradigms have emerged t…

    Submitted 14 November, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

  8. arXiv:2112.05702

    cs.LG cs.CL cs.NE

    Sampling from Discrete Energy-Based Models with Quality/Efficiency Trade-offs

    Authors: Bryan Eikema, Germán Kruszewski, Hady Elsahar, Marc Dymetman

    Abstract: Energy-Based Models (EBMs) allow for extremely flexible specifications of probability distributions. However, they do not provide a mechanism for obtaining exact samples from these distributions. Monte Carlo techniques can aid us in obtaining samples if some proposal distribution that we can easily sample from is available. For instance, rejection sampling can provide exact samples but is often di…

    Submitted 10 December, 2021; originally announced December 2021.

  9. arXiv:2112.00791

    cs.LG cs.CL

    Controlling Conditional Language Models without Catastrophic Forgetting

    Authors: Tomasz Korbak, Hady Elsahar, German Kruszewski, Marc Dymetman

    Abstract: Machine learning is shifting towards general-purpose pretrained generative models, trained in a self-supervised manner on large amounts of data, which can then be applied to solve a large number of tasks. However, due to their generic training methodology, these models often fail to meet some of the downstream requirements (e.g., hallucinations in abstractive summarization or style violations in c…

    Submitted 20 June, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: ICML 2022

  10. arXiv:2111.02878

    cs.CL cs.IR

    Unsupervised and Distributional Detection of Machine-Generated Text

    Authors: Matthias Gallé, Jos Rozen, Germán Kruszewski, Hady Elsahar

    Abstract: The power of natural language generation models has provoked a flurry of interest in automatic methods to detect if a piece of text is human or machine-authored. The problem so far has been framed in a standard supervised way and consists in training a classifier on annotated data to predict the origin of one given new document. In this paper, we frame the problem in an unsupervised and distributi…

    Submitted 4 November, 2021; originally announced November 2021.

    Comments: 10 pages

  11. arXiv:2106.04985

    cs.LG cs.CL cs.NE cs.SE

    Energy-Based Models for Code Generation under Compilability Constraints

    Authors: Tomasz Korbak, Hady Elsahar, Marc Dymetman, Germán Kruszewski

    Abstract: Neural language models can be successfully trained on source code, leading to applications such as code completion. However, their versatile autoregressive self-supervision objective overlooks important global sequence-level features that are present in the data such as syntactic correctness or compilability. In this work, we pose the problem of learning to generate compilable code as constraint s…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: Accepted for the First Workshop on Natural Language Processing for Programming, ACL 2021

    ACM Class: I.2.2; I.2.7; I.2.6; I.5.1

  12. arXiv:2103.08245

    nlin.AO cs.NE

    Emergence of Self-Reproducing Metabolisms as Recursive Algorithms in an Artificial Chemistry

    Authors: Germán Kruszewski, Tomas Mikolov

    Abstract: One of the main goals of Artificial Life is to research the conditions for the emergence of life, not necessarily as it is, but as it could be. Artificial Chemistries are one of the most important tools for this purpose because they provide us with a basic framework to investigate under which conditions metabolisms capable of reproducing themselves, and ultimately, of evolving, can emerge. While t…

    Submitted 7 December, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

    Comments: arXiv admin note: text overlap with arXiv:2003.07916

  13. arXiv:2004.03340

    cs.CL cs.AI cs.LG

    Evaluating Online Continual Learning with CALM

    Authors: Germán Kruszewski, Ionut-Teodor Sorodoc, Tomas Mikolov

    Abstract: Online Continual Learning (OCL) studies learning over a continuous data stream without observing any single example more than once, a setting that is closer to the experience of humans and systems that must learn "on-the-wild". Yet, commonly available benchmarks are far from these real-world conditions, because they explicitly signal different tasks, lack latent similarity structure or assume temp…

    Submitted 1 February, 2021; v1 submitted 7 April, 2020; originally announced April 2020.

  14. arXiv:2003.07916

    nlin.AO cs.NE q-bio.MN

    Combinatory Chemistry: Towards a Simple Model of Emergent Evolution

    Authors: Germán Kruszewski, Tomas Mikolov

    Abstract: An explanatory model for the emergence of evolvable units must display emerging structures that (1) preserve themselves in time (2) self-reproduce and (3) tolerate a certain amount of variation when reproducing. To tackle this challenge, here we introduce Combinatory Chemistry, an Algorithmic Artificial Chemistry based on a minimalistic computational paradigm named Combinatory Logic. The dynamics…

    Submitted 19 June, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

  15. arXiv:1903.07435

    cs.CL

    The emergence of number and syntax units in LSTM language models

    Authors: Yair Lakretz, German Kruszewski, Theo Desbordes, Dieuwke Hupkes, Stanislas Dehaene, Marco Baroni

    Abstract: Recent work has shown that LSTMs trained on a generic language modeling objective capture syntax-sensitive generalizations such as long-distance number agreement. We have however no mechanistic understanding of how they accomplish this remarkable feat. Some have conjectured it depends on heuristics that do not truly take hierarchical structure into account. We present here a detailed study of the…

    Submitted 2 April, 2019; v1 submitted 18 March, 2019; originally announced March 2019.

    Comments: To appear in Proceedings of NAACL, Minneapolis, MN, 2019

  16. arXiv:1902.09393

    cs.CL cs.AI cs.LG

    Cooperative Learning of Disjoint Syntax and Semantics

    Authors: Serhii Havrylov, Germán Kruszewski, Armand Joulin

    Abstract: There has been considerable attention devoted to models that learn to jointly infer an expression's syntactic structure and its semantics. Yet, Nangia and Bowman (2018) have recently shown that the current best systems fail to learn the correct parsing strategy on mathematical expressions generated from a simple context-free grammar. In this work, we present a recursive model inspired by Choi et al. (2018)…

    Submitted 29 May, 2019; v1 submitted 25 February, 2019; originally announced February 2019.

    Comments: The paper was accepted at NAACL-HLT 2019

  17. arXiv:1809.06194

    cs.CL

    The Fast and the Flexible: training neural networks to learn to follow instructions from small data

    Authors: Rezka Leonandya, Elia Bruni, Dieuwke Hupkes, Germán Kruszewski

    Abstract: Learning to follow human instructions is a long-pursued goal in artificial intelligence. The task becomes particularly challenging if no prior knowledge of the employed language is assumed while relying only on a handful of examples to learn from. Work in the past has relied on hand-coded components or manually engineered features to provide strong inductive biases that make learning in such situa…

    Submitted 2 April, 2019; v1 submitted 17 September, 2018; originally announced September 2018.

  18. arXiv:1805.09657

    cs.CL cs.AI cs.LG

    Learning compositionally through attentive guidance

    Authors: Dieuwke Hupkes, Anand Singh, Kris Korrel, German Kruszewski, Elia Bruni

    Abstract: While neural network models have been successfully applied to domains that require substantial generalisation skills, recent studies have implied that they struggle when solving the task they are trained on requires inferring its underlying compositional structure. In this paper, we introduce Attentive Guidance, a mechanism to direct a sequence to sequence model equipped with attention to find mor…

    Submitted 5 July, 2019; v1 submitted 20 May, 2018; originally announced May 2018.

  19. arXiv:1805.01070

    cs.CL

    What you can cram into a single vector: Probing sentence embeddings for linguistic properties

    Authors: Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni

    Abstract: Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing. "Downstream" tasks, often based on sentence classification, are commonly used to evaluate the quality of sentence representations. The complexity of the tasks makes it however difficult to infer what kind of information is present in the repres…

    Submitted 8 July, 2018; v1 submitted 2 May, 2018; originally announced May 2018.

    Comments: ACL 2018

  20. arXiv:1802.06467

    cs.AI cs.LG cs.NE

    Memorize or generalize? Searching for a compositional RNN in a haystack

    Authors: Adam Liška, Germán Kruszewski, Marco Baroni

    Abstract: Neural networks are very powerful learning systems, but they do not readily generalize from one task to the other. This is partly due to the fact that they do not learn in a compositional way, that is, by discovering skills that are shared by different tasks, and recombining them to solve new problems. In this paper, we explore the compositional generalization capabilities of recurrent neural netw…

    Submitted 25 July, 2018; v1 submitted 18 February, 2018; originally announced February 2018.

    Comments: AEGAP Workshop (ICML 2018)

  21. arXiv:1701.08954

    cs.LG cs.AI cs.CL

    CommAI: Evaluating the first steps towards a useful general AI

    Authors: Marco Baroni, Armand Joulin, Allan Jabri, Germàn Kruszewski, Angeliki Lazaridou, Klemen Simonic, Tomas Mikolov

    Abstract: With machine learning successfully applied to new daunting problems almost every day, general AI starts looking like an attainable goal. However, most current research focuses instead on important but narrow applications, such as image classification or machine translation. We believe this to be largely due to the lack of objective ways to measure progress towards broad machine intelligence. In or…

    Submitted 27 March, 2017; v1 submitted 31 January, 2017; originally announced January 2017.

    Comments: Published in ICLR 2017 Workshop Track

  22. arXiv:1606.06031

    cs.CL cs.AI cs.LG

    The LAMBADA dataset: Word prediction requiring a broad discourse context

    Authors: Denis Paperno, Germán Kruszewski, Angeliki Lazaridou, Quan Ngoc Pham, Raffaella Bernardi, Sandro Pezzelle, Marco Baroni, Gemma Boleda, Raquel Fernández

    Abstract: We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the whole passage, but not if they only see the last sentence preceding the target word. To succeed on LAM…

    Submitted 20 June, 2016; originally announced June 2016.

    Comments: 10 pages, Accepted as a long paper for ACL 2016