Showing 1–50 of 76 results for author: Levy, O

  1. arXiv:2404.08801  [pdf, other]

    cs.LG cs.CL

    Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

    Authors: Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou

    Abstract: The quadratic complexity and weak length extrapolation of Transformers limit their ability to scale to long sequences, and while sub-quadratic solutions like linear attention and state space models exist, they empirically underperform Transformers in pretraining efficiency and downstream task accuracy. We introduce Megalodon, a neural architecture for efficient sequence modeling with unlimited co…

    Submitted 16 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: 9 pages, 6 figures and 8 tables

  2. arXiv:2402.08451  [pdf, other]

    cs.HC

    Moonwalk: Advancing Gait-Based User Recognition on Wearable Devices with Metric Learning

    Authors: Asaf Liberman, Oron Levy, Soroush Shahi, Cori Tymoszek Park, Mike Ralph, Richard Kang, Abdelkareem Bedri, Gierad Laput

    Abstract: Personal devices have adopted diverse authentication methods, including biometric recognition and passcodes. In contrast, headphones have limited input mechanisms, depending solely on the authentication of connected devices. We present Moonwalk, a novel method for passive user recognition utilizing the built-in headphone accelerometer. Our approach centers on gait recognition, enabling users to es…

    Submitted 13 February, 2024; originally announced February 2024.

    ACM Class: H.5.2

  3. arXiv:2402.08420  [pdf, other]

    cs.HC

    Vision-Based Hand Gesture Customization from a Single Demonstration

    Authors: Soroush Shahi, Cori Tymoszek Park, Richard Kang, Asaf Liberman, Oron Levy, Jun Gong, Abdelkareem Bedri, Gierad Laput

    Abstract: Hand gesture recognition is becoming a more prevalent mode of human-computer interaction, especially as cameras proliferate across everyday devices. Despite continued progress in this field, gesture customization is often underexplored. Customization is crucial since it enables users to define and demonstrate gestures that are more natural, memorable, and accessible. However, customization require…

    Submitted 13 February, 2024; originally announced February 2024.

    ACM Class: H.5.2; I.4

  4. arXiv:2310.15123  [pdf, other]

    cs.CL cs.AI cs.LG

    Branch-Solve-Merge Improves Large Language Model Evaluation and Generation

    Authors: Swarnadeep Saha, Omer Levy, Asli Celikyilmaz, Mohit Bansal, Jason Weston, Xian Li

    Abstract: Large Language Models (LLMs) are frequently used for multi-faceted language generation and evaluation tasks that involve satisfying intricate user constraints or taking into account multiple aspects and criteria. However, their performance can fall short, due to the model's lack of coherence and inability to plan and decompose the problem. We propose Branch-Solve-Merge (BSM), a Large Language Mode…

    Submitted 7 June, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: NAACL 2024 (19 pages, 7 figures, 11 tables)

  5. arXiv:2310.07106  [pdf, other]

    cs.CL cs.AI cs.LG q-bio.NC

    The Temporal Structure of Language Processing in the Human Brain Corresponds to The Layered Hierarchy of Deep Language Models

    Authors: Ariel Goldstein, Eric Ham, Mariano Schain, Samuel Nastase, Zaid Zada, Avigail Dabush, Bobbi Aubrey, Harshvardhan Gazula, Amir Feder, Werner K Doyle, Sasha Devore, Patricia Dugan, Daniel Friedman, Roi Reichart, Michael Brenner, Avinatan Hassidim, Orrin Devinsky, Adeen Flinker, Omer Levy, Uri Hasson

    Abstract: Deep Language Models (DLMs) provide a novel computational paradigm for understanding the mechanisms of natural language processing in the human brain. Unlike traditional psycholinguistic models, DLMs use layered sequences of continuous numerical vectors to represent words and context, allowing a plethora of emerging applications such as human-like text generation. In this paper we show evidence th…

    Submitted 10 October, 2023; originally announced October 2023.

  6. arXiv:2308.06259  [pdf, other]

    cs.CL

    Self-Alignment with Instruction Backtranslation

    Authors: Xian Li, Ping Yu, Chunting Zhou, Timo Schick, Omer Levy, Luke Zettlemoyer, Jason Weston, Mike Lewis

    Abstract: We present a scalable method to build a high quality instruction following language model by automatically labelling human-written text with corresponding instructions. Our approach, named instruction backtranslation, starts with a language model finetuned on a small amount of seed data, and a given web corpus. The seed model is used to construct training examples by generating instruction prompts…

    Submitted 12 March, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: ICLR 2024 camera ready

  7. arXiv:2305.14196  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding

    Authors: Uri Shaham, Maor Ivgi, Avia Efrat, Jonathan Berant, Omer Levy

    Abstract: We introduce ZeroSCROLLS, a zero-shot benchmark for natural language understanding over long texts, which contains only test and small validation sets, without training data. We adapt six tasks from the SCROLLS benchmark, and add four new datasets, including two novel information fusing tasks, such as aggregating the percentage of positive reviews. Using ZeroSCROLLS, we conduct a comprehensive eva…

    Submitted 17 December, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: Findings of EMNLP 2023

  8. arXiv:2305.11206  [pdf, other]

    cs.CL cs.AI cs.LG

    LIMA: Less Is More for Alignment

    Authors: Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, Susan Zhang, Gargi Ghosh, Mike Lewis, Luke Zettlemoyer, Omer Levy

    Abstract: Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervis…

    Submitted 18 May, 2023; originally announced May 2023.

  9. arXiv:2305.01569  [pdf, other]

    cs.CV cs.AI

    Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

    Authors: Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, Omer Levy

    Abstract: The ability to collect a large dataset of human preferences from text-to-image users is usually limited to companies, making such datasets inaccessible to the public. To address this issue, we create a web app that enables text-to-image users to generate images and specify their preferences. Using this web app we build Pick-a-Pic, a large, open dataset of text-to-image prompts and real users' pref…

    Submitted 23 November, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

  10. arXiv:2304.00287  [pdf, other]

    cs.CV

    Vision Transformers with Mixed-Resolution Tokenization

    Authors: Tomer Ronen, Omer Levy, Avram Golbert

    Abstract: Vision Transformer models process input images by dividing them into a spatially regular grid of equal-size patches. Conversely, Transformers were originally introduced over natural language sequences, where each token represents a subword - a chunk of raw data of arbitrary size. In this work, we apply this approach to Vision Transformers by introducing a novel image tokenization scheme, replacing…

    Submitted 27 April, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

  11. arXiv:2303.01464  [pdf, ps, other]

    cs.LG

    Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation

    Authors: Orin Levy, Alon Cohen, Asaf Cassel, Yishay Mansour

    Abstract: We present the OMG-CMDP! algorithm for regret minimization in adversarial Contextual MDPs. The algorithm operates under the minimal assumptions of realizable function class and access to online least squares and log loss regression oracles. Our algorithm is efficient (assuming efficient online regression oracles), simple and robust to approximation errors. It enjoys an…

    Submitted 14 August, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

  12. arXiv:2303.01000  [pdf, other]

    cs.CV cs.AI

    X&Fuse: Fusing Visual Information in Text-to-Image Generation

    Authors: Yuval Kirstain, Omer Levy, Adam Polyak

    Abstract: We introduce X&Fuse, a general approach for conditioning on visual information when generating images from text. We demonstrate the potential of X&Fuse in three different text-to-image generation scenarios. (i) When a bank of images is available, we retrieve and condition on a related image (Retrieve&Fuse), resulting in significant improvements on the MS-COCO benchmark, gaining a state-of-the-art…

    Submitted 2 March, 2023; originally announced March 2023.

  13. arXiv:2301.03728  [pdf, other]

    cs.CL cs.AI cs.LG

    Scaling Laws for Generative Mixed-Modal Language Models

    Authors: Armen Aghajanyan, Lili Yu, Alexis Conneau, Wei-Ning Hsu, Karen Hambardzumyan, Susan Zhang, Stephen Roller, Naman Goyal, Omer Levy, Luke Zettlemoyer

    Abstract: Generative language models define distributions over sequences of tokens that can represent essentially any combination of data modalities (e.g., any permutation of image tokens from VQ-VAEs, speech tokens from HuBERT, BPE tokens for language or code, and so on). To better understand the scaling properties of such mixed-modal models, we conducted over 250 experiments using seven different modaliti…

    Submitted 9 January, 2023; originally announced January 2023.

  14. arXiv:2212.09689  [pdf, other]

    cs.CL cs.AI cs.LG

    Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor

    Authors: Or Honovich, Thomas Scialom, Omer Levy, Timo Schick

    Abstract: Instruction tuning enables pretrained language models to perform new tasks from inference-time natural language descriptions. These approaches rely on vast amounts of human supervision in the form of crowdsourced datasets or user interactions. In this work, we introduce Unnatural Instructions: a large dataset of creative and diverse instructions, collected with virtually no human labor. We collect…

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: 18 pages, 7 figures

  15. arXiv:2212.08926  [pdf, other]

    cs.CL cs.AI cs.LG

    A Simple Baseline for Beam Search Reranking

    Authors: Lior Vassertail, Omer Levy

    Abstract: Reranking methods in machine translation aim to close the gap between common evaluation metrics (e.g. BLEU) and maximum likelihood learning and decoding algorithms. Prior works address this challenge by training models to rerank beam search candidates according to their predicted BLEU scores, building upon large models pretrained on massive monolingual corpora -- a privilege that was never made av…

    Submitted 17 December, 2022; originally announced December 2022.

  16. arXiv:2212.07530  [pdf, other]

    cs.CL cs.AI cs.LG

    Causes and Cures for Interference in Multilingual Translation

    Authors: Uri Shaham, Maha Elbayad, Vedanuj Goswami, Omer Levy, Shruti Bhosale

    Abstract: Multilingual machine translation models can benefit from synergy between different language pairs, but also suffer from interference. While there is a growing number of sophisticated methods that aim to eliminate interference, our understanding of interference as a phenomenon is still limited. This work identifies the main factors that contribute to interference in multilingual machine translation…

    Submitted 19 May, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

    Comments: ACL 2023

  17. arXiv:2211.14932  [pdf, ps, other]

    cs.LG

    Eluder-based Regret for Stochastic Contextual MDPs

    Authors: Orin Levy, Asaf Cassel, Alon Cohen, Yishay Mansour

    Abstract: We present the E-UC$^3$RL algorithm for regret minimization in Stochastic Contextual Markov Decision Processes (CMDPs). The algorithm operates under the minimal assumptions of realizable function class and access to offline least squares and log loss regression oracles. Our algorithm is efficient (assuming efficient offline regression oracles) and enjoys a regret guarantee of…

    Submitted 29 May, 2024; v1 submitted 27 November, 2022; originally announced November 2022.

  18. arXiv:2211.02069  [pdf, other]

    cs.CL cs.AI cs.LG

    LMentry: A Language Model Benchmark of Elementary Language Tasks

    Authors: Avia Efrat, Or Honovich, Omer Levy

    Abstract: As the performance of large language models rapidly improves, benchmarks are getting larger and more complex as well. We present LMentry, a benchmark that avoids this "arms race" by focusing on a compact set of tasks that are trivial to humans, e.g. writing a sentence containing a specific word, identifying which words in a list belong to a specific category, or choosing which of two words is long…

    Submitted 19 December, 2022; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: minor results updates

  19. arXiv:2207.11126  [pdf, other]

    cs.LG

    Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP

    Authors: Orin Levy, Yishay Mansour

    Abstract: We present regret minimization algorithms for stochastic contextual MDPs under a minimum reachability assumption, using access to an offline least squares regression oracle. We analyze three different settings: where the dynamics is known, where the dynamics is unknown but independent of the context and the most challenging setting where the dynamics is unknown and context-dependent. For the latte…

    Submitted 22 January, 2023; v1 submitted 22 July, 2022; originally announced July 2022.

  20. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  21. arXiv:2205.10782  [pdf, other]

    cs.CL

    Instruction Induction: From Few Examples to Natural Language Task Descriptions

    Authors: Or Honovich, Uri Shaham, Samuel R. Bowman, Omer Levy

    Abstract: Large language models are able to perform a task by conditioning on a few input-output demonstrations - a paradigm known as in-context learning. We show that language models can explicitly infer an underlying task from a few demonstrations by prompting them to generate a natural language instruction that fits the examples. To explore this ability, we introduce the instruction induction challenge,…

    Submitted 22 May, 2022; originally announced May 2022.

  22. arXiv:2204.04748  [pdf, other]

    cs.CL

    Breaking Character: Are Subwords Good Enough for MRLs After All?

    Authors: Omri Keren, Tal Avinari, Reut Tsarfaty, Omer Levy

    Abstract: Large pretrained language models (PLMs) typically tokenize the input string into contiguous subwords before any pretraining or inference. However, previous studies have claimed that this form of subword tokenization is inadequate for processing morphologically-rich languages (MRLs). We revisit this hypothesis by pretraining a BERT-style masked language model over character sequences instead of wor…

    Submitted 10 April, 2022; originally announced April 2022.

  23. arXiv:2203.16634  [pdf, other]

    cs.CL cs.AI cs.LG

    Transformer Language Models without Positional Encodings Still Learn Positional Information

    Authors: Adi Haviv, Ori Ram, Ofir Press, Peter Izsak, Omer Levy

    Abstract: Causal transformer language models (LMs), such as GPT-3, typically require some form of positional encoding, such as positional embeddings. However, we show that LMs without any explicit positional encoding are still competitive with standard models, and that this phenomenon is robust across different datasets, model sizes, and sequence lengths. Probing experiments reveal that such models acquire…

    Submitted 5 December, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Findings of EMNLP 2022

  24. arXiv:2203.00995  [pdf, ps, other]

    cs.LG

    Learning Efficiently Function Approximation for Contextual MDP

    Authors: Orin Levy, Yishay Mansour

    Abstract: We study learning contextual MDPs using a function approximation for both the rewards and the dynamics. We consider both the case where the dynamics are dependent on the context and the case where they are independent of it. For both models we derive polynomial sample and time complexity (assuming an efficient ERM oracle). Our methodology gives a general reduction from learning contextual MDP to supervised learning.

    Submitted 30 November, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

  25. arXiv:2202.00206  [pdf]

    cs.HC eess.SP q-bio.QM stat.AP

    A pilot study of the Earable device to measure facial muscle and eye movement tasks among healthy volunteers

    Authors: Matthew F. Wipperman, Galen Pogoncheff, Katrina F. Mateo, Xuefang Wu, Yiziying Chen, Oren Levy, Andreja Avbersek, Robin R. Deterding, Sara C. Hamon, Tam Vu, Rinol Alaj, Olivier Harari

    Abstract: Many neuromuscular disorders impair function of cranial nerve enervated muscles. Clinical assessment of cranial muscle function has several limitations. Clinician rating of symptoms suffers from inter-rater variation, qualitative or semi-quantitative scoring, and limited ability to capture infrequent or fluctuating symptoms. Patient-reported outcomes are limited by recall bias and poor precision.…

    Submitted 31 January, 2022; originally announced February 2022.

  26. arXiv:2201.13072  [pdf, other]

    cs.CL cs.LG

    Are Mutually Intelligible Languages Easier to Translate?

    Authors: Avital Friedland, Jonathan Zeltser, Omer Levy

    Abstract: Two languages are considered mutually intelligible if their native speakers can communicate with each other, while using their own mother tongue. How does the fact that humans perceive a language pair as mutually intelligible affect the ability to learn a translation model between them? We hypothesize that the amount of data needed to train a neural machine translation model is anti-proportional…

    Submitted 31 January, 2022; originally announced January 2022.

  27. arXiv:2201.03533  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    SCROLLS: Standardized CompaRison Over Long Language Sequences

    Authors: Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, Adi Haviv, Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, Omer Levy

    Abstract: NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild. We introduce SCROLLS, a suite of tasks that require reasoning over long texts. We examine existing long-text datasets, and handpick ones where the text is naturally long, while prioritizing tasks that involve synthesizing infor…

    Submitted 11 October, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: EMNLP 2022

  28. arXiv:2112.07708  [pdf, other]

    cs.CL cs.IR

    Learning to Retrieve Passages without Supervision

    Authors: Ori Ram, Gal Shachaf, Omer Levy, Jonathan Berant, Amir Globerson

    Abstract: Dense retrievers for open-domain question answering (ODQA) have been shown to achieve impressive performance by training on large datasets of question-passage pairs. In this work we ask whether this dependence on labeled data can be reduced via unsupervised pretraining that is geared towards ODQA. We show this is in fact possible, via a novel pretraining scheme designed for retrieval. Our "recurri…

    Submitted 17 May, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: NAACL 2022

  29. arXiv:2112.07210  [pdf, other]

    cs.CL

    Simple Local Attentions Remain Competitive for Long-Context Tasks

    Authors: Wenhan Xiong, Barlas Oğuz, Anchit Gupta, Xilun Chen, Diana Liskovich, Omer Levy, Wen-tau Yih, Yashar Mehdad

    Abstract: Many NLP tasks require processing long contexts beyond the length limit of pretrained models. In order to scale these models to longer text sequences, many efficient long-range attention variants have been proposed. Despite the abundance of research along this direction, it is still difficult to gauge the relative effectiveness of these models in practical use cases, e.g., if we apply these models…

    Submitted 3 May, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: NAACL 2022 Main Conference

  30. arXiv:2110.04374  [pdf, other]

    cs.CL

    A Few More Examples May Be Worth Billions of Parameters

    Authors: Yuval Kirstain, Patrick Lewis, Sebastian Riedel, Omer Levy

    Abstract: We investigate the dynamics of increasing the number of model parameters versus the number of labeled examples across a wide variety of tasks. Our exploration reveals that while scaling parameters consistently yields performance improvements, the contribution of additional examples highly depends on the task's format. Specifically, in open question answering tasks, enlarging the training set does…

    Submitted 8 October, 2021; originally announced October 2021.

  31. arXiv:2109.11314  [pdf, other]

    cs.CL

    ParaShoot: A Hebrew Question Answering Dataset

    Authors: Omri Keren, Omer Levy

    Abstract: NLP research in Hebrew has largely focused on morphology and syntax, where rich annotated datasets in the spirit of Universal Dependencies are available. Semantic datasets, however, are in short supply, hindering crucial advances in the development of NLP technology in Hebrew. In this work, we present ParaShoot, the first question answering dataset in modern Hebrew. The dataset follows the format…

    Submitted 23 September, 2021; originally announced September 2021.

  32. arXiv:2108.11193  [pdf, other]

    cs.CL cs.AI cs.LG

    Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens

    Authors: Itay Itzhak, Omer Levy

    Abstract: Standard pretrained language models operate on sequences of subword tokens without direct access to the characters that compose each token's string representation. We probe the embedding layer of pretrained language models and show that models learn the internal character composition of whole word and subword tokens to a surprising extent, without ever seeing the characters coupled with the tokens…

    Submitted 8 June, 2022; v1 submitted 25 August, 2021; originally announced August 2021.

    Comments: NAACL 2022

  33. arXiv:2108.05857  [pdf, other]

    cs.CL

    How Optimal is Greedy Decoding for Extractive Question Answering?

    Authors: Or Castel, Ori Ram, Avia Efrat, Omer Levy

    Abstract: Fine-tuned language models use greedy decoding to answer reading comprehension questions with relative success. However, this approach does not ensure that the answer is a span in the given passage, nor does it guarantee that it is the most probable one. Does greedy decoding actually perform worse than an algorithm that does adhere to these properties? To study the performance and optimality of gr…

    Submitted 8 November, 2022; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: AKBC 2022, 12 pages, 3 figures

  34. arXiv:2107.09729  [pdf, other]

    cs.CL cs.AI cs.LG

    What Do You Get When You Cross Beam Search with Nucleus Sampling?

    Authors: Uri Shaham, Omer Levy

    Abstract: We combine beam search with the probabilistic pruning technique of nucleus sampling to create two deterministic nucleus search algorithms for natural language generation. The first algorithm, p-exact search, locally prunes the next-token distribution and performs an exact search over the remaining space. The second algorithm, dynamic beam search, shrinks and expands the beam size according to the…

    Submitted 2 May, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

    Comments: The Third Workshop on Insights from Negative Results in NLP

  35. arXiv:2104.09554  [pdf, other]

    cs.CL cs.AI cs.LG

    Can Latent Alignments Improve Autoregressive Machine Translation?

    Authors: Adi Haviv, Lior Vassertail, Omer Levy

    Abstract: Latent alignment objectives such as CTC and AXE significantly improve non-autoregressive machine translation models. Can they improve autoregressive models as well? We explore the possibility of training autoregressive machine translation models with latent alignment objectives, and observe that, in practice, this approach results in degenerate models. We provide a theoretical explanation for thes…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: Accepted to NAACL 2021

  36. arXiv:2104.07705  [pdf, other]

    cs.CL cs.AI cs.LG

    How to Train BERT with an Academic Budget

    Authors: Peter Izsak, Moshe Berchansky, Omer Levy

    Abstract: While large language models a la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford. How can one train such models with a more modest budget? We present a recipe for pretraining a masked language model in 24 hours using a single low-end deep learning server. We demonstrate that through a combination of software optimizati…

    Submitted 9 September, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

  37. arXiv:2103.01242  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Cryptonite: A Cryptic Crossword Benchmark for Extreme Ambiguity in Language

    Authors: Avia Efrat, Uri Shaham, Dan Kilman, Omer Levy

    Abstract: Current NLP datasets targeting ambiguity can be solved by a native speaker with relative ease. We present Cryptonite, a large-scale dataset based on cryptic crosswords, which is both linguistically complex and naturally sourced. Each example in Cryptonite is a cryptic clue, a short phrase or sentence with a misleading surface reading, whose solving requires disambiguating semantic, syntactic, and…

    Submitted 1 November, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: EMNLP 2021

  38. arXiv:2101.00438  [pdf, other]

    cs.CL

    Few-Shot Question Answering by Pretraining Span Selection

    Authors: Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy

    Abstract: In several question answering benchmarks, pretrained models have reached human parity through fine-tuning on an order of 100,000 annotated questions and answers. We explore the more realistic few-shot setting, where only a few hundred training examples are available, and observe that standard models perform poorly, highlighting the discrepancy between current pretraining objectives and question an…

    Submitted 2 June, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: Accepted to ACL 2021

  39. arXiv:2101.00434  [pdf, other]

    cs.CL

    Coreference Resolution without Span Representations

    Authors: Yuval Kirstain, Ori Ram, Omer Levy

    Abstract: The introduction of pretrained language models has reduced many complex task-specific NLP models to simple lightweight layers. An exception to this trend is coreference resolution, where a sophisticated task-specific model is appended to a pretrained transformer encoder. While highly effective, the model has a very large memory footprint -- primarily due to dynamically-constructed span and span-pa…

    Submitted 31 May, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: Accepted to ACL 2021

  40. arXiv:2012.14913  [pdf, other]

    cs.CL

    Transformer Feed-Forward Layers Are Key-Value Memories

    Authors: Mor Geva, Roei Schuster, Jonathan Berant, Omer Levy

    Abstract: Feed-forward layers constitute two-thirds of a transformer model's parameters, yet their role in the network remains under-explored. We show that feed-forward layers in transformer-based language models operate as key-value memories, where each key correlates with textual patterns in the training examples, and each value induces a distribution over the output vocabulary. Our experiments show that…

    Submitted 5 September, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: EMNLP 2021

  41. arXiv:2010.11982  [pdf, other]

    cs.CL cs.AI cs.LG

    The Turking Test: Can Language Models Understand Instructions?

    Authors: Avia Efrat, Omer Levy

    Abstract: Supervised machine learning provides the learner with a set of input-output examples of the target task. Humans, however, can also learn to perform new tasks from instructions in natural language. Can machines learn to understand instructions as well? We present the Turking Test, which examines a model's ability to follow natural language instructions of varying complexity. These range from simple…

    Submitted 22 October, 2020; originally announced October 2020.

  42. arXiv:2008.09396  [pdf, other]

    cs.CL cs.LG stat.ML

    Neural Machine Translation without Embeddings

    Authors: Uri Shaham, Omer Levy

    Abstract: Many NLP models operate over sequences of subword tokens produced by hand-crafted tokenization rules and heuristic subword induction algorithms. A simple universal alternative is to represent every computerized text as a sequence of bytes via UTF-8, obviating the need for an embedding layer since there are fewer token types (256) than dimensions. Surprisingly, replacing the ubiquitous embedding la…

    Submitted 12 April, 2021; v1 submitted 21 August, 2020; originally announced August 2020.

    Comments: NAACL 2021

  43. arXiv:2004.01655  [pdf, other]

    cs.CL cs.LG stat.ML

    Aligned Cross Entropy for Non-Autoregressive Machine Translation

    Authors: Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy

    Abstract: Non-autoregressive machine translation models significantly speed up decoding by allowing for parallel prediction of the entire target sequence. However, modeling word order is more challenging due to the lack of autoregressive factors in the model. This difficulty is compounded during training with cross entropy loss, which can highly penalize small shifts in word order. In this paper, we propos…

    Submitted 3 April, 2020; originally announced April 2020.

  44. arXiv:2001.08785  [pdf, other]

    cs.CL cs.LG stat.ML

    Semi-Autoregressive Training Improves Mask-Predict Decoding

    Authors: Marjan Ghazvininejad, Omer Levy, Luke Zettlemoyer

    Abstract: The recently proposed mask-predict decoding algorithm has narrowed the performance gap between semi-autoregressive machine translation models and the traditional left-to-right approach. We introduce a new training method for conditional masked language models, SMART, which mimics the semi-autoregressive behavior of mask-predict, producing training examples that contain model predictions as part of…

    Submitted 23 January, 2020; originally announced January 2020.

  45. arXiv:1911.03864  [pdf, other]

    cs.CL cs.LG

    Improving Transformer Models by Reordering their Sublayers

    Authors: Ofir Press, Noah A. Smith, Omer Levy

    Abstract: Multilayer transformer networks consist of interleaved self-attention and feedforward sublayers. Could ordering the sublayers in a different pattern lead to better performance? We generate randomly ordered transformers and train them with the language modeling objective. We observe that some of these models are able to achieve better performance than the interleaved baseline, and that those succes…

    Submitted 23 April, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

    Comments: To appear at ACL 2020

  46. arXiv:1911.02972  [pdf, other]

    cs.CL cs.LG

    Blockwise Self-Attention for Long Document Understanding

    Authors: Jiezhong Qiu, Hao Ma, Omer Levy, Scott Wen-tau Yih, Sinong Wang, Jie Tang

    Abstract: We present BlockBERT, a lightweight and efficient BERT model for better modeling long-distance dependencies. Our model extends BERT by introducing sparse block structures into the attention matrix to reduce both memory consumption and training/inference time, which also enables attention heads to capture either short- or long-range contextual information. We conduct experiments on language model p…

    Submitted 1 November, 2020; v1 submitted 7 November, 2019; originally announced November 2019.

    Comments: Accepted at Findings of EMNLP'20 and SustaiNLP 2020 at EMNLP'20, 12 pages

  47. arXiv:1911.00172  [pdf, other]

    cs.CL

    Generalization through Memorization: Nearest Neighbor Language Models

    Authors: Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, Mike Lewis

    Abstract: We introduce $k$NN-LMs, which extend a pre-trained neural language model (LM) by linearly interpolating it with a $k$-nearest neighbors ($k$NN) model. The nearest neighbors are computed according to distance in the pre-trained LM embedding space, and can be drawn from any text collection, including the original LM training data. Applying this augmentation to a strong Wikitext-103 LM, with neighbor…

    Submitted 14 February, 2020; v1 submitted 31 October, 2019; originally announced November 2019.

    Comments: ICLR 2020

  48. arXiv:1910.13461  [pdf, other]

    cs.CL cs.LG stat.ML

    BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

    Authors: Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer

    Abstract: We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT…

    Submitted 29 October, 2019; originally announced October 2019.

  49. arXiv:1910.00577  [pdf, other]

    cs.LG cs.PL stat.ML

    Structural Language Models of Code

    Authors: Uri Alon, Roy Sadaka, Omer Levy, Eran Yahav

    Abstract: We address the problem of any-code completion - generating a missing piece of source code in a given program without any restriction on the vocabulary or structure. We introduce a new approach to any-code completion that leverages the strict syntax of programming languages to model a code snippet as a tree - structural language modeling (SLM). SLM estimates the probability of the program's abstrac…

    Submitted 29 July, 2020; v1 submitted 30 September, 2019; originally announced October 2019.

    Comments: Appeared in ICML 2020

  50. arXiv:1908.09091  [pdf, ps, other]

    cs.CL

    BERT for Coreference Resolution: Baselines and Analysis

    Authors: Mandar Joshi, Omer Levy, Daniel S. Weld, Luke Zettlemoyer

    Abstract: We apply BERT to coreference resolution, achieving strong improvements on the OntoNotes (+3.9 F1) and GAP (+11.5 F1) benchmarks. A qualitative analysis of model predictions indicates that, compared to ELMo and BERT-base, BERT-large is particularly better at distinguishing between related but distinct entities (e.g., President and CEO). However, there is still room for improvement in modeling docum…

    Submitted 22 December, 2019; v1 submitted 24 August, 2019; originally announced August 2019.

    Comments: Fix test set numbers for e2e-coref on GAP