Showing 1–10 of 10 results for author: Garriga-Alonso, A

  1. arXiv:2407.12404

    cs.LG

    Analyzing the Generalization and Reliability of Steering Vectors -- ICML 2024

    Authors: Daniel Tan, David Chanin, Aengus Lynch, Dimitrios Kanoulas, Brooks Paige, Adrià Garriga-Alonso, Robert Kirk

    Abstract: Steering vectors (SVs) are a new approach to efficiently adjust language model behaviour at inference time by intervening on intermediate model activations. They have shown promise in terms of improving both capabilities and model alignment. However, the reliability and generalisation properties of this approach are unknown. In this work, we rigorously investigate these properties, and show that s…

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2304.14997

    cs.LG

    Towards Automated Circuit Discovery for Mechanistic Interpretability

    Authors: Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso

    Abstract: Through considerable effort and intuition, several recent works have reverse-engineered nontrivial behaviors of transformer models. This paper systematizes the mechanistic interpretability process they followed. First, researchers choose a metric and dataset that elicit the desired model behavior. Then, they apply activation patching to find which abstract neural network units are involved in the…

    Submitted 28 October, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: NeurIPS 2023 Spotlight

  3. arXiv:2206.04615

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  4. arXiv:2106.05586

    stat.ML cs.LG

    Data augmentation in Bayesian neural networks and the cold posterior effect

    Authors: Seth Nabarro, Stoil Ganev, Adrià Garriga-Alonso, Vincent Fortuin, Mark van der Wilk, Laurence Aitchison

    Abstract: Bayesian neural networks that incorporate data augmentation implicitly use a "randomly perturbed log-likelihood [which] does not have a clean interpretation as a valid likelihood function" (Izmailov et al. 2021). Here, we provide several approaches to developing principled Bayesian neural networks incorporating data augmentation. We introduce a "finite orbit" setting which allows likelihoods t…

    Submitted 9 December, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

  5. BNNpriors: A library for Bayesian neural network inference with different prior distributions

    Authors: Vincent Fortuin, Adrià Garriga-Alonso, Mark van der Wilk, Laurence Aitchison

    Abstract: Bayesian neural networks have shown great promise in many applications where calibrated uncertainty estimates are crucial and can often also lead to a higher predictive performance. However, it remains challenging to choose a good prior distribution over their weights. While isotropic Gaussian priors are often chosen in practice due to their simplicity, they do not reflect our true prior beliefs w…

    Submitted 14 May, 2021; originally announced May 2021.

    Comments: Accepted for publication at Software Impacts

  6. arXiv:2102.06571

    stat.ML cs.LG

    Bayesian Neural Network Priors Revisited

    Authors: Vincent Fortuin, Adrià Garriga-Alonso, Sebastian W. Ober, Florian Wenzel, Gunnar Rätsch, Richard E. Turner, Mark van der Wilk, Laurence Aitchison

    Abstract: Isotropic Gaussian priors are the de facto standard for modern Bayesian neural network inference. However, it is unclear whether these priors accurately reflect our true beliefs about the weight distributions or give optimal performance. To find better priors, we study summary statistics of neural network weights in networks trained using stochastic gradient descent (SGD). We find that convolution…

    Submitted 16 March, 2022; v1 submitted 12 February, 2021; originally announced February 2021.

    Comments: Accepted at ICLR 2022

  7. arXiv:2102.01691

    stat.ML cs.LG

    Exact Langevin Dynamics with Stochastic Gradients

    Authors: Adrià Garriga-Alonso, Vincent Fortuin

    Abstract: Stochastic gradient Markov Chain Monte Carlo algorithms are popular samplers for approximate inference, but they are generally biased. We show that many recent versions of these methods (e.g. Chen et al. (2014)) cannot be corrected using Metropolis-Hastings rejection sampling, because their acceptance probability is always zero. We can fix this by employing a sampler with realizable backwards traj…

    Submitted 2 February, 2021; originally announced February 2021.

    Comments: 13 pages, 2 figures. Accepted to the 3rd Symposium on Advances in Approximate Bayesian Inference (AABI 2021)

  8. arXiv:2101.04097

    stat.ML cs.LG

    Correlated Weights in Infinite Limits of Deep Convolutional Neural Networks

    Authors: Adrià Garriga-Alonso, Mark van der Wilk

    Abstract: Infinite width limits of deep neural networks often have tractable forms. They have been used to analyse the behaviour of finite networks, as well as being useful methods in their own right. When investigating infinitely wide convolutional neural networks (CNNs), it was observed that the correlations arising from spatial weight sharing disappear in the infinite limit. This is undesirable, as spati…

    Submitted 13 June, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

    Comments: Accepted for the 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021)

  9. arXiv:2011.09421

    stat.ML cs.LG

    Understanding Variational Inference in Function-Space

    Authors: David R. Burt, Sebastian W. Ober, Adrià Garriga-Alonso, Mark van der Wilk

    Abstract: Recent work has attempted to directly approximate the 'function-space' or predictive posterior distribution of Bayesian models, without approximating the posterior distribution over the parameters. This is appealing in e.g. Bayesian neural networks, where we only need the former, and the latter is hard to represent. In this work, we highlight some advantages and limitations of employing the Kullba…

    Submitted 18 November, 2020; originally announced November 2020.

    Comments: 19 pages

  10. arXiv:1808.05587

    stat.ML cs.LG

    Deep Convolutional Networks as shallow Gaussian Processes

    Authors: Adrià Garriga-Alonso, Carl Edward Rasmussen, Laurence Aitchison

    Abstract: We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the o…

    Submitted 4 May, 2019; v1 submitted 16 August, 2018; originally announced August 2018.