
Showing 1–6 of 6 results for author: Caballero, E

  1. arXiv:2210.14891

    cs.LG cs.AI

    Broken Neural Scaling Laws

    Authors: Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger

    Abstract: We present a smoothly broken power law functional form (which we refer to as a Broken Neural Scaling Law (BNSL)) that accurately models and extrapolates the scaling behaviors of deep neural networks (i.e., how the evaluation metric of interest varies as the amount of compute used for training (or inference), number of model parameters, training dataset size, model input size, number of training steps, or…

    Submitted 23 July, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Published as a conference paper at International Conference on Learning Representations (ICLR) 2023

    Journal ref: International Conference on Learning Representations (ICLR), 2023

  2. arXiv:2110.06990

    cs.LG cs.AI cs.CV

    Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers

    Authors: Gabriele Prato, Simon Guiroy, Ethan Caballero, Irina Rish, Sarath Chandar

    Abstract: The empirical science of neural scaling laws is a rapidly growing area of significant importance to the future of machine learning, particularly in light of recent breakthroughs achieved by large-scale pre-trained models such as GPT-3, CLIP and DALL-E. Accurately predicting neural network performance with increasing resources such as data, compute and model size provides a more comprehensive e…

    Submitted 18 October, 2021; v1 submitted 13 October, 2021; originally announced October 2021.

  3. arXiv:2106.06607

    cs.LG stat.ML

    Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization

    Authors: Kartik Ahuja, Ethan Caballero, Dinghuai Zhang, Jean-Christophe Gagnon-Audet, Yoshua Bengio, Ioannis Mitliagkas, Irina Rish

    Abstract: The invariance principle from causality is at the heart of notable approaches such as invariant risk minimization (IRM) that seek to address out-of-distribution (OOD) generalization failures. Despite the promising theory, invariance principle-based approaches fail in common classification tasks, where invariant (causal) features capture all the information about the label. Are these failures due t…

    Submitted 20 November, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

  4. arXiv:2010.11924

    cs.LG stat.ML

    In Search of Robust Measures of Generalization

    Authors: Gintare Karolina Dziugaite, Alexandre Drouin, Brady Neal, Nitarshan Rajkumar, Ethan Caballero, Linbo Wang, Ioannis Mitliagkas, Daniel M. Roy

    Abstract: One of the principal scientific challenges in deep learning is explaining generalization, i.e., why the particular way the community now trains networks to achieve small training error also leads to small error on held-out data from the same population. It is widely appreciated that some worst-case theories -- such as those based on the VC dimension of the class of predictors induced by modern neu…

    Submitted 20 January, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: 27 pages, 11 figures, 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

  5. arXiv:2003.00688

    cs.LG cs.AI cs.NE stat.ML

    Out-of-Distribution Generalization via Risk Extrapolation (REx)

    Authors: David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, Aaron Courville

    Abstract: Distributional shift is one of the major obstacles when transferring machine learning prediction systems from the lab to the real world. To tackle this problem, we assume that variation across training domains is representative of the variation we might encounter at test time, but also that shifts at test time may be more extreme in magnitude. In particular, we show that reducing differences in ri…

    Submitted 25 February, 2021; v1 submitted 2 March, 2020; originally announced March 2020.

  6. arXiv:1511.06420   

    cs.NE cs.CL cs.LG

    Skip-Thought Memory Networks

    Authors: Ethan Caballero

    Abstract: Question Answering (QA) is fundamental to natural language processing in that most NLP problems can be phrased as QA (Kumar et al., 2015). Current weakly supervised memory network models struggle to answer questions that involve relations among multiple entities (such as the three-arg-relations task QA5 in Facebook's bAbI (Weston et al., 2015)). To address this problem of…

    Submitted 23 November, 2015; v1 submitted 19 November, 2015; originally announced November 2015.

    Comments: Removed by arXiv administrators because submission violated the terms of arXiv's license agreement