Showing 1–50 of 243 results for author: Ermon, S

  1. arXiv:2407.09739  [pdf, other]

    cs.LG cs.AI stat.ML

    Active Learning for Derivative-Based Global Sensitivity Analysis with Gaussian Processes

    Authors: Syrine Belakaria, Benjamin Letham, Janardhan Rao Doppa, Barbara Engelhardt, Stefano Ermon, Eytan Bakshy

    Abstract: We consider the problem of active learning for global sensitivity analysis of expensive black-box functions. Our aim is to efficiently learn the importance of different input variables, e.g., in vehicle safety experimentation, we study the impact of the thickness of various components on safety objectives. Since function evaluations are expensive, we use active learning to prioritize experimental…

    Submitted 12 July, 2024; originally announced July 2024.

  2. arXiv:2407.09578  [pdf, other]

    cs.CV cs.LG

    Unsupervised Anomaly Detection Using Diffusion Trend Analysis

    Authors: Eunwoo Kim, Un Yang, Cheol Lae Roh, Stefano Ermon

    Abstract: Conventional anomaly detection techniques based on reconstruction via denoising diffusion model are widely used due to their ability to identify anomaly locations and shapes with high performance. However, there is a limitation in determining appropriate noise parameters that can degrade anomalies while preserving normal characteristics. Also, due to the volatility of the diffusion model, normal r…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 4 pages, 4 figures, 1 table

    MSC Class: 68T45 (Primary) 68T27 (Secondary) ACM Class: I.2.10

  3. arXiv:2407.02398  [pdf, other]

    cs.CV

    Consistency Flow Matching: Defining Straight Flows with Velocity Consistency

    Authors: Ling Yang, Zixiang Zhang, Zhilong Zhang, Xingchao Liu, Minkai Xu, Wentao Zhang, Chenlin Meng, Stefano Ermon, Bin Cui

    Abstract: Flow matching (FM) is a general framework for defining probability paths via Ordinary Differential Equations (ODEs) to transform between noise and data samples. Recent approaches attempt to straighten these flow trajectories to generate high-quality samples with fewer function evaluations, typically through iterative rectification methods or optimal transport solutions. In this paper, we introduce…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/YangLing0818/consistency_flow_matching

  4. arXiv:2407.01648  [pdf, other]

    q-bio.BM cs.LG q-bio.QM

    Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization

    Authors: Siyi Gu, Minkai Xu, Alexander Powers, Weili Nie, Tomas Geffner, Karsten Kreis, Jure Leskovec, Arash Vahdat, Stefano Ermon

    Abstract: Generating ligand molecules for specific protein targets, known as structure-based drug design, is a fundamental problem in therapeutics development and biological discovery. Recently, target-aware generative models, especially diffusion models, have shown great promise in modeling protein-ligand interactions and generating candidate drugs. However, existing models primarily focus on learning the…

    Submitted 1 July, 2024; originally announced July 2024.

  5. arXiv:2406.17998  [pdf, other]

    cs.CV

    Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model

    Authors: Zhuo Zheng, Stefano Ermon, Dongjun Kim, Liangpei Zhang, Yanfei Zhong

    Abstract: Our understanding of the temporal dynamics of the Earth's surface has been advanced by deep vision models, which often require lots of labeled multi-temporal images for training. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial since it is expensive and knowledge-intensive. In this paper, we present change data generators based on gene…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: The enhanced extension of our ICCV 2023 (Changen)

  6. arXiv:2406.15658  [pdf, other]

    cs.CV cs.AI

    TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning

    Authors: Nemin Wu, Qian Cao, Zhangyu Wang, Zeping Liu, Yanlin Qi, Jielu Zhang, Joshua Ni, Xiaobai Yao, Hongxu Ma, Lan Mu, Stefano Ermon, Tanuja Ganu, Akshay Nambi, Ni Lao, Gengchen Mai

    Abstract: Spatial representation learning (SRL) aims at learning general-purpose neural network representations from various types of spatial data (e.g., points, polylines, polygons, networks, images, etc.) in their native formats. Learning good spatial representations is a fundamental problem for various downstream applications such as species distribution modeling, weather forecasting, trajectory generati…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures. Submitted to NeurIPS 2024 Datasets and Benchmarks Track. Under review

  7. arXiv:2406.10973  [pdf, other]

    cs.CV cs.AI

    ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts

    Authors: Samar Khanna, Medhanie Irgau, David B. Lobell, Stefano Ermon

    Abstract: Parameter-efficient fine-tuning (PEFT) techniques such as low-rank adaptation (LoRA) can effectively adapt large pre-trained foundation models to downstream tasks using only a small fraction (0.1%-10%) of the original trainable weights. An under-explored question of PEFT is in extending the pre-training phase without supervised labels; that is, can we adapt a pre-trained foundation model to a new…

    Submitted 16 June, 2024; originally announced June 2024.

  8. arXiv:2405.14822  [pdf, other]

    cs.CV cs.AI cs.LG stat.ML

    PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher

    Authors: Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon

    Abstract: To accelerate sampling, diffusion models (DMs) are often distilled into generators that directly map noise to data in a single step. In this approach, the resolution of the generator is fundamentally limited by that of the teacher DM. To overcome this limitation, we propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a technique to progressively grow the resolution of the generator beyo…

    Submitted 23 May, 2024; originally announced May 2024.

  9. arXiv:2405.06147  [pdf, other]

    cs.LG eess.SY

    State-Free Inference of State-Space Models: The Transfer Function Approach

    Authors: Rom N. Parnichkun, Stefano Massaroli, Alessandro Moro, Jimmy T. H. Smith, Ramin Hasani, Mathias Lechner, Qi An, Christopher Ré, Hajime Asama, Stefano Ermon, Taiji Suzuki, Atsushi Yamashita, Michael Poli

    Abstract: We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence parallel inference algorithm that is state-free: unlike other proposed algorithms, state-free inference does not incur any significant memory or computational cost with an increase in state size. We achieve this using properties of…

    Submitted 1 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Resubmission 02/06/2024: Fixed minor typo of recurrent form RTF

  10. arXiv:2404.14367  [pdf, other]

    cs.LG

    Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

    Authors: Fahim Tajwar, Anikait Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar

    Abstract: Learning from preference labels plays a crucial role in fine-tuning large language models. There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning. Different methods come with different implementation tradeoffs and performance differences, and existing empirical findings present different concl…

    Submitted 2 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: International Conference on Machine Learning (ICML), 2024

  11. arXiv:2404.02883  [pdf, other]

    cs.CV cs.AI cs.LG

    On the Scalability of Diffusion-based Text-to-Image Generation

    Authors: Hao Li, Yang Zou, Ying Wang, Orchid Majumder, Yusheng Xie, R. Manmatha, Ashwin Swaminathan, Zhuowen Tu, Stefano Ermon, Stefano Soatto

    Abstract: Scaling up model and data size has been quite successful for the evolution of LLMs. However, the scaling law for the diffusion based text-to-image (T2I) models is not fully explored. It is also unclear how to efficiently scale the model for better performance at reduced cost. The different training settings and expensive training cost make a fair model comparison extremely difficult. In this work,…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: CVPR2024

  12. arXiv:2403.19159  [pdf, other]

    cs.CL cs.LG

    Disentangling Length from Quality in Direct Preference Optimization

    Authors: Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has been a crucial component in the recent success of Large Language Models. However, RLHF is known to exploit biases in human preferences, such as verbosity. A well-formatted and eloquent answer is often more highly rated by users, even when it is less helpful and objective. A number of approaches have been developed to control those biases in the…

    Submitted 28 March, 2024; originally announced March 2024.

  13. arXiv:2403.17844  [pdf, other]

    cs.LG

    Mechanistic Design and Scaling of Hybrid Architectures

    Authors: Michael Poli, Armin W Thomas, Eric Nguyen, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli

    Abstract: The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation. We set out to simplify this process by grounding it in an end-to-end mechanistic architecture design (MAD) pipeline, encompassing small-scale capability unit tests predictive of scaling law…

    Submitted 26 March, 2024; originally announced March 2024.

  14. arXiv:2402.16627  [pdf, other]

    cs.CV cs.AI cs.LG

    Contextualized Diffusion Models for Text-Guided Image and Video Generation

    Authors: Ling Yang, Zhilong Zhang, Zhaochen Yu, Jingwei Liu, Minkai Xu, Stefano Ermon, Bin Cui

    Abstract: Conditional diffusion models have exhibited superior performance in high-fidelity text-guided visual generation and editing. Nevertheless, prevailing text-guided visual diffusion models primarily focus on incorporating text-visual relationships exclusively into the reverse process, often disregarding their relevance in the forward process. This inconsistency between forward and reverse processes m…

    Submitted 3 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: ICLR 2024. Project: https://github.com/YangLing0818/ContextDiff

  15. arXiv:2402.08383  [pdf, other]

    cs.LG cs.AI

    Uncertainty Quantification for Forward and Inverse Problems of PDEs via Latent Global Evolution

    Authors: Tailin Wu, Willie Neiswanger, Hongtao Zheng, Stefano Ermon, Jure Leskovec

    Abstract: Deep learning-based surrogate models have demonstrated remarkable advantages over classical solvers in terms of speed, often achieving speedups of 10 to 1000 times over traditional partial differential equation (PDE) solvers. However, a significant challenge hindering their widespread adoption in both scientific and industrial domains is the lack of understanding about their prediction uncertainti…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted by AAAI 2024 (Oral)

  16. arXiv:2402.02680  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Large Language Models are Geographically Biased

    Authors: Rohin Manvi, Samar Khanna, Marshall Burke, David Lobell, Stefano Ermon

    Abstract: Large Language Models (LLMs) inherently carry the biases contained in their training corpora, which can lead to the perpetuation of societal harm. As the impact of these foundation models grows, understanding and evaluating their biases becomes crucial to achieving fairness and accuracy. We propose to study what LLMs know about the world we live in through the lens of geography. This approach is p…

    Submitted 4 February, 2024; originally announced February 2024.

  17. arXiv:2402.01188  [pdf, other]

    cs.CV

    Segment Any Change

    Authors: Zhuo Zheng, Yanfei Zhong, Liangpei Zhang, Stefano Ermon

    Abstract: Visual foundation models have achieved remarkable results in zero-shot image classification and segmentation, but zero-shot change detection remains an open problem. In this paper, we propose the segment any change models (AnyChange), a new type of change detection model that supports zero-shot prediction and generalization on unseen change types and data distributions. AnyChange is built on the s…

    Submitted 14 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: technical report, 12 pages

  18. arXiv:2401.11708  [pdf, other]

    cs.CV cs.AI cs.LG

    Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

    Authors: Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui

    Abstract: Diffusion models have exhibited exceptional performance in text-to-image generation and editing. However, existing methods often face challenges when handling complex text prompts that involve multiple objects with multiple attributes and relationships. In this paper, we propose a brand new training-free text-to-image generation/editing framework, namely Recaption, Plan and Generate (RPG), harnessin…

    Submitted 5 May, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: ICML 2024. Project: https://github.com/YangLing0818/RPG-DiffusionMaster

  19. arXiv:2401.11037  [pdf, other]

    cs.LG math.NA q-bio.QM

    Equivariant Graph Neural Operator for Modeling 3D Dynamics

    Authors: Minkai Xu, Jiaqi Han, Aaron Lou, Jean Kossaifi, Arvind Ramanathan, Kamyar Azizzadenesheli, Jure Leskovec, Stefano Ermon, Anima Anandkumar

    Abstract: Modeling the complex three-dimensional (3D) dynamics of relational systems is an important problem in the natural sciences, with applications ranging from molecular simulations to particle mechanics. Machine learning methods have achieved good success by learning graph neural networks to model spatial interactions. However, these approaches do not faithfully capture temporal correlations since the…

    Submitted 2 June, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s)

  20. arXiv:2312.07168  [pdf, other]

    cs.LG cs.AI

    Equivariant Flow Matching with Hybrid Probability Transport

    Authors: Yuxuan Song, Jingjing Gong, Minkai Xu, Ziyao Cao, Yanyan Lan, Stefano Ermon, Hao Zhou, Wei-Ying Ma

    Abstract: The generation of 3D molecules requires simultaneously deciding the categorical features (atom types) and continuous features (atom coordinates). Deep generative models, especially Diffusion Models (DMs), have demonstrated effectiveness in generating feature-rich geometries. However, existing DMs typically suffer from unstable probability dynamics with inefficient sampling speed. In this paper, we…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023

  21. arXiv:2312.03606  [pdf, other]

    cs.CV cs.AI cs.LG

    DiffusionSat: A Generative Foundation Model for Satellite Imagery

    Authors: Samar Khanna, Patrick Liu, Linqi Zhou, Chenlin Meng, Robin Rombach, Marshall Burke, David Lobell, Stefano Ermon

    Abstract: Diffusion models have achieved state-of-the-art results on many modalities including images, speech, and video. However, existing models are not tailored to support remote sensing data, which is widely used in important applications including environmental monitoring and crop-yield prediction. Satellite images are significantly different from natural images -- they can be multi-spectral, irregular…

    Submitted 25 May, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Published at ICLR 2024

  22. arXiv:2311.17082  [pdf, other]

    cs.CV stat.ML

    DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling

    Authors: Linqi Zhou, Andy Shih, Chenlin Meng, Stefano Ermon

    Abstract: Recent methods such as Score Distillation Sampling (SDS) and Variational Score Distillation (VSD) using 2D diffusion models for text-to-3D generation have demonstrated impressive generation quality. However, the long generation time of such algorithms significantly degrades the user experience. To tackle this problem, we propose DreamPropeller, a drop-in acceleration algorithm that can be wrapped…

    Submitted 20 May, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Github repo: https://github.com/alexzhou907/DreamPropeller; Project page: https://alexzhou907.github.io/dreampropeller_page/

  23. arXiv:2311.16424  [pdf, other]

    cs.LG cs.AI cs.CV

    Manifold Preserving Guided Diffusion

    Authors: Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J. Zico Kolter, Ruslan Salakhutdinov, Stefano Ermon

    Abstract: Despite the recent advancements, conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training. In this paper, we propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework that leverages pretrained diffusion models and off-the-shelf neural networks with minimal additional inference cost for a broad…

    Submitted 27 November, 2023; originally announced November 2023.

  24. arXiv:2311.12908  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Diffusion Model Alignment Using Direct Preference Optimization

    Authors: Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik

    Abstract: Large language models (LLMs) are fine-tuned using human comparison data with Reinforcement Learning from Human Feedback (RLHF) methods to make them better aligned with users' preferences. In contrast to LLMs, human preference learning has not been widely explored in text-to-image diffusion models; the best existing approach is to fine-tune a pretrained model using carefully curated high quality im…

    Submitted 21 November, 2023; originally announced November 2023.

  25. arXiv:2311.04287  [pdf, other]

    cs.CV cs.LG

    Holistic Evaluation of Text-To-Image Models

    Authors: Tony Lee, Michihiro Yasunaga, Chenlin Meng, Yifan Mai, Joon Sung Park, Agrim Gupta, Yunzhi Zhang, Deepak Narayanan, Hannah Benita Teufel, Marco Bellagente, Minguk Kang, Taesung Park, Jure Leskovec, Jun-Yan Zhu, Li Fei-Fei, Jiajun Wu, Stefano Ermon, Percy Liang

    Abstract: The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023. First three authors contributed equally

  26. arXiv:2310.20211  [pdf, other]

    cs.LG stat.ML

    Calibration by Distribution Matching: Trainable Kernel Calibration Metrics

    Authors: Charles Marx, Sofian Zalouk, Stefano Ermon

    Abstract: Calibration ensures that probabilistic forecasts meaningfully capture uncertainty by requiring that predicted probabilities align with empirical frequencies. However, many existing calibration methods are specialized for post-hoc recalibration, which can worsen the sharpness of forecasts. Drawing on the insight that calibration can be viewed as a distribution matching task, we introduce kernel-bas…

    Submitted 31 October, 2023; originally announced October 2023.

  27. arXiv:2310.20030  [pdf, other]

    cs.LG math.DG stat.ML

    Scaling Riemannian Diffusion Models

    Authors: Aaron Lou, Minkai Xu, Stefano Ermon

    Abstract: Riemannian diffusion models draw inspiration from standard Euclidean space diffusion models to learn distributions on general manifolds. Unfortunately, the additional geometric complexity renders the diffusion transition term inexpressible in closed form, so prior methods resort to imprecise approximations of the score matching training objective that degrade performance and preclude applications…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  28. arXiv:2310.18780  [pdf, other]

    cs.LG cs.AI eess.SP

    Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions

    Authors: Stefano Massaroli, Michael Poli, Daniel Y. Fu, Hermann Kumbong, Rom N. Parnichkun, Aman Timalsina, David W. Romero, Quinn McIntyre, Beidi Chen, Atri Rudra, Ce Zhang, Christopher Re, Stefano Ermon, Yoshua Bengio

    Abstract: Recent advances in attention-free sequence models rely on convolutions as alternatives to the attention operator at the core of Transformers. In particular, long convolution sequence models have achieved state-of-the-art performance in many domains, but incur a significant cost during auto-regressive inference workloads -- naively requiring a full pass (or caching of activations) over the input se…

    Submitted 28 October, 2023; originally announced October 2023.

  29. arXiv:2310.17638  [pdf, other]

    cs.LG stat.ML

    Generative Fractional Diffusion Models

    Authors: Gabriel Nobis, Maximilian Springenberg, Marco Aversa, Michael Detzel, Rembert Daems, Roderick Murray-Smith, Shinichi Nakajima, Sebastian Lapuschkin, Stefano Ermon, Tolga Birdal, Manfred Opper, Christoph Knochenhauer, Luis Oala, Wojciech Samek

    Abstract: We introduce the first continuous-time score-based generative model that leverages fractional diffusion processes for its underlying dynamics. Although diffusion models have excelled at capturing data distributions, they still suffer from various limitations such as slow convergence, mode-collapse on imbalanced data, and lack of diversity. These issues are partially linked to the use of light-tail…

    Submitted 24 June, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    ACM Class: I.2.4; F.4.1; G.3

  30. arXiv:2310.16834  [pdf, other]

    stat.ML cs.CL cs.LG

    Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

    Authors: Aaron Lou, Chenlin Meng, Stefano Ermon

    Abstract: Despite their groundbreaking performance for many generative modeling tasks, diffusion models have fallen short on discrete data domains such as natural language. Crucially, standard diffusion models rely on the well-established theory of score matching, but efforts to generalize this to discrete structures have not yielded the same empirical gains. In this work, we bridge this gap by proposing sc…

    Submitted 6 June, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: ICML 2024 Oral. Code at https://github.com/louaaron/Score-Entropy-Discrete-Diffusion

  31. arXiv:2310.06213  [pdf, other]

    cs.CL cs.LG

    GeoLLM: Extracting Geospatial Knowledge from Large Language Models

    Authors: Rohin Manvi, Samar Khanna, Gengchen Mai, Marshall Burke, David Lobell, Stefano Ermon

    Abstract: The application of machine learning (ML) in a range of geospatial tasks is increasingly common but often relies on globally available covariates such as satellite imagery that can either be expensive or lack predictive power. Here we explore the question of whether the vast amounts of knowledge found in Internet language corpora, now compressed within large language models (LLMs), can be leveraged…

    Submitted 24 February, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024

  32. arXiv:2310.02777  [pdf, other]

    cs.CL

    The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models

    Authors: Chenwei Wu, Li Erran Li, Stefano Ermon, Patrick Haffner, Rong Ge, Zaiwei Zhang

    Abstract: Compositionality is a common property in many modalities including natural languages and images, but the compositional generalization of multi-modal models is not well-understood. In this paper, we identify two sources of visual-linguistic compositionality: linguistic priors and the interplay between images and texts. We show that current attempts to improve compositional generalization rely on li…

    Submitted 4 October, 2023; originally announced October 2023.

  33. arXiv:2310.02279  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion

    Authors: Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon

    Abstract: Consistency Models (CM) (Song et al., 2023) accelerate score-based diffusion model sampling at the cost of sample quality but lack a natural way to trade-off quality for speed. To address this limitation, we propose Consistency Trajectory Model (CTM), a generalization encompassing CM and score-based models as special cases. CTM trains a single neural network that can -- in a single forward pass --…

    Submitted 30 March, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: International Conference on Learning Representations

  34. arXiv:2310.00413  [pdf, other]

    cs.CV cs.LG eess.IV

    SSIF: Learning Continuous Image Representation for Spatial-Spectral Super-Resolution

    Authors: Gengchen Mai, Ni Lao, Weiwei Sun, Yuchi Ma, Jiaming Song, Chenlin Meng, Hongxu Ma, Jinmeng Rao, Ziyuan Li, Stefano Ermon

    Abstract: Existing digital sensors capture images at fixed spatial and spectral resolutions (e.g., RGB, multispectral, and hyperspectral images), and each combination requires bespoke machine learning models. Neural Implicit Functions partially overcome the spatial resolution challenge by representing an image in a resolution-independent way. However, they still operate at fixed, pre-defined spectral resolu…

    Submitted 30 September, 2023; originally announced October 2023.

    MSC Class: 68T07; 68T45 ACM Class: I.4.10; I.2.10; I.4.6

  35. arXiv:2309.16948  [pdf, other]

    cs.CV cs.AI

    Denoising Diffusion Bridge Models

    Authors: Linqi Zhou, Aaron Lou, Samar Khanna, Stefano Ermon

    Abstract: Diffusion models are powerful generative models that map noise to data using stochastic processes. However, for many applications such as image editing, the model input comes from a distribution that is not random noise. As such, diffusion models must rely on cumbersome methods like guidance or projected sampling to incorporate this information in the generative process. In our work, we propose De…

    Submitted 5 December, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: Github: https://github.com/alexzhou907/DDBM/

  36. arXiv:2308.12061  [pdf, other]

    cs.CV cs.LG

    HarvestNet: A Dataset for Detecting Smallholder Farming Activity Using Harvest Piles and Remote Sensing

    Authors: Jonathan Xu, Amna Elmustafa, Liya Weldegebriel, Emnet Negash, Richard Lee, Chenlin Meng, Stefano Ermon, David Lobell

    Abstract: Small farms contribute to a large share of the productive land in developing countries. In regions such as sub-Saharan Africa, where 80% of farms are small (under 2 ha in size), the task of mapping smallholder cropland is an important part of tracking sustainability measures such as crop productivity. However, the visually diverse and nuanced appearance of small farms has limited the effectivenes…

    Submitted 5 March, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: submitted to AAAI24

  37. arXiv:2307.08423  [pdf, other]

    cs.LG physics.comp-ph

    Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

    Authors: Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, Keir Adams, Maurice Weiler, Xiner Li, Tianfan Fu, Yucheng Wang, Haiyang Yu, YuQing Xie, Xiang Fu, Alex Strasser, Shenglong Xu, Yi Liu, Yuanqi Du, Alexandra Saxton, Hongyi Ling, Hannah Lawrence, et al. (38 additional authors not shown)

    Abstract: Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Sc…

    Submitted 15 November, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

  38. arXiv:2306.17624  [pdf, other]

    cs.CV cs.AI cs.LG

    Sphere2Vec: A General-Purpose Location Representation Learning over a Spherical Surface for Large-Scale Geospatial Predictions

    Authors: Gengchen Mai, Yao Xuan, Wenyun Zuo, Yutong He, Jiaming Song, Stefano Ermon, Krzysztof Janowicz, Ni Lao

    Abstract: Generating learning-friendly representations for points in space is a fundamental and long-standing problem in ML. Recently, multi-scale encoding schemes (such as Space2Vec and NeRF) were proposed to directly encode any point in 2D/3D Euclidean space as a high-dimensional vector, and have been successfully applied to various geospatial prediction and generative tasks. However, all current 2D and 3D…

    Submitted 2 July, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: 30 Pages, 16 figures. Accepted to ISPRS Journal of Photogrammetry and Remote Sensing

    MSC Class: 68T07; 68T45 ACM Class: I.2.0; I.2.6; I.2.10; I.5.1; J.2

    Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing, 2023

  39. arXiv:2306.15794  [pdf, other]

    cs.LG q-bio.GN

    HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution

    Authors: Eric Nguyen, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, Clayton Rabideau, Stefano Massaroli, Yoshua Bengio, Stefano Ermon, Stephen A. Baccus, Chris Ré

    Abstract: Genomic (DNA) sequences encode an enormous amount of information for gene regulation and protein synthesis. Similar to natural language models, researchers have proposed foundation models in genomics to learn generalizable features from unlabeled genome data that can then be fine-tuned for downstream tasks such as identifying regulatory elements. Due to the quadratic scaling of attention, previous…

    Submitted 14 November, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 (Spotlight)

  40. arXiv:2306.05426  [pdf, other]

    cs.LG cs.AI

    SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking

    Authors: Chris Cundy, Stefano Ermon

    Abstract: In many domains, autoregressive models can attain high likelihood on the task of predicting the next observation. However, this maximum-likelihood (MLE) objective does not necessarily match a downstream use-case of autoregressively generating high-quality sequences. The MLE objective weights sequences proportionally to their frequency under the data distribution, with no guidance for the model's b…

    Submitted 6 May, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: Poster, ICLR 2024

  41. arXiv:2306.03831  [pdf, other]

    cs.LG cs.CV

    GEO-Bench: Toward Foundation Models for Earth Monitoring

    Authors: Alexandre Lacoste, Nils Lehmann, Pau Rodriguez, Evan David Sherwin, Hannah Kerner, Björn Lütjens, Jeremy Andrew Irvin, David Dao, Hamed Alemohammad, Alexandre Drouin, Mehmet Gunturkun, Gabriel Huang, David Vazquez, Dava Newman, Yoshua Bengio, Stefano Ermon, Xiao Xiang Zhu

    Abstract: Recent progress in self-supervision has shown that pre-training large neural networks on vast amounts of unsupervised data can lead to substantial increases in generalization to downstream tasks. Such models, recently coined foundation models, have been transformational to the field of natural language processing. Variants have also been proposed for image data, but their applicability to remote s…

    Submitted 23 December, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

    Comments: arXiv admin note: text overlap with arXiv:2112.00570

  42. arXiv:2306.00367  [pdf, other]

    cs.LG cs.AI math.ST

    On the Equivalence of Consistency-Type Models: Consistency Models, Consistent Diffusion Models, and Fokker-Planck Regularization

    Authors: Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Naoki Murata, Yuki Mitsufuji, Stefano Ermon

    Abstract: The emergence of various notions of ``consistency'' in diffusion models has garnered considerable attention and helped achieve improved sample quality, likelihood estimation, and accelerated sampling. Although similar concepts have been proposed in the literature, the precise relationships among them remain unclear. In this study, we establish theoretical connections between three recent ``consist…

    Submitted 1 June, 2023; originally announced June 2023.

  43. arXiv:2305.18290  [pdf, other]

    cs.LG cs.AI cs.CL

    Direct Preference Optimization: Your Language Model is Secretly a Reward Model

    Authors: Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn

    Abstract: While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these prefere…

    Submitted 13 December, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

  44. arXiv:2305.17330  [pdf, other]

    cs.AI cs.LG

    MADiff: Offline Multi-agent Learning with Diffusion Models

    Authors: Zhengbang Zhu, Minghuan Liu, Liyuan Mao, Bingyi Kang, Minkai Xu, Yong Yu, Stefano Ermon, Weinan Zhang

    Abstract: Diffusion models (DMs) have recently achieved huge success in various scenarios including offline reinforcement learning, where the diffusion planner learns to generate desired trajectories during online evaluations. However, despite the effectiveness in single-agent learning, it remains unclear how DMs can operate in multi-agent problems, where agents can hardly complete teamwork without good coordinatio…

    Submitted 25 May, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: 19 pages, 10 figures, 7 tables. The first two authors contributed equally to the work

  45. arXiv:2305.16317  [pdf, other]

    cs.LG cs.AI

    Parallel Sampling of Diffusion Models

    Authors: Andy Shih, Suneel Belkhale, Stefano Ermon, Dorsa Sadigh, Nima Anari

    Abstract: Diffusion models are powerful generative models but suffer from slow sampling, often taking 1000 sequential denoising steps for one sample. As a result, considerable efforts have been directed toward reducing the number of denoising steps, but these methods hurt sample quality. Instead of reducing the number of denoising steps (trading quality for speed), in this paper we explore an orthogonal app…

    Submitted 15 October, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: 37th Conference on Neural Information Processing Systems

  46. arXiv:2305.11147  [pdf, other]

    cs.CV cs.AI

    UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild

    Authors: Can Qin, Shu Zhang, Ning Yu, Yihao Feng, Xinyi Yang, Yingbo Zhou, Huan Wang, Juan Carlos Niebles, Caiming Xiong, Silvio Savarese, Stefano Ermon, Yun Fu, Ran Xu

    Abstract: Achieving machine autonomy and human control often represent divergent objectives in the design of interactive AI systems. Visual generative foundation models such as Stable Diffusion show promise in navigating these goals, especially when prompted with arbitrary languages. However, they often fall short in generating images with spatial, structural, or geometric controls. The integration of such…

    Submitted 2 November, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  47. arXiv:2305.01140  [pdf, other]

    cs.LG q-bio.QM

    Geometric Latent Diffusion Models for 3D Molecule Generation

    Authors: Minkai Xu, Alexander Powers, Ron Dror, Stefano Ermon, Jure Leskovec

    Abstract: Generative models, especially diffusion models (DMs), have achieved promising results for generating feature-rich geometries and advancing foundational science problems such as molecule design. Inspired by the recent huge success of Stable (latent) Diffusion models, we propose a novel and principled method for 3D molecule generation named Geometric Latent Diffusion Models (GeoLDM). GeoLDM is the f…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: Published at ICML 2023

  48. arXiv:2305.01118  [pdf, other]

    cs.CV cs.AI cs.LG

    CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations

    Authors: Gengchen Mai, Ni Lao, Yutong He, Jiaming Song, Stefano Ermon

    Abstract: Geo-tagged images are publicly available in large quantities, whereas labels such as object classes are rather scarce and expensive to collect. Meanwhile, contrastive learning has achieved tremendous success in various natural image and language tasks with limited labeled data. However, existing methods fail to fully leverage geospatial information, which can be paramount to distinguishing objects…

    Submitted 8 May, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: In: ICML 2023, Jul 23 - 29, 2023, Honolulu, Hawaii, USA

    MSC Class: 68T07; 68T45 ACM Class: I.2.10; I.5.4; I.5.1; J.2

  49. arXiv:2304.14621  [pdf, other]

    cs.LG q-bio.BM

    MUDiff: Unified Diffusion for Complete Molecule Generation

    Authors: Chenqing Hua, Sitao Luan, Minkai Xu, Rex Ying, Jie Fu, Stefano Ermon, Doina Precup

    Abstract: Molecule generation is a very important practical problem, with uses in drug discovery and material design, and AI methods promise to provide useful solutions. However, existing methods for molecule generation focus either on 2D graph structure or on 3D geometric structure, which is not sufficient to represent a complete molecule as 2D graph captures mainly topology while 3D geometry captures main…

    Submitted 5 February, 2024; v1 submitted 28 April, 2023; originally announced April 2023.

  50. arXiv:2304.04740  [pdf, other]

    stat.ML cs.LG

    Reflected Diffusion Models

    Authors: Aaron Lou, Stefano Ermon

    Abstract: Score-based diffusion models learn to reverse a stochastic differential equation that maps data to noise. However, for complex tasks, numerical error can compound and result in highly unnatural samples. Previous work mitigates this drift with thresholding, which projects to the natural data domain (such as pixel space for images) after each diffusion step, but this leads to a mismatch between the…

    Submitted 8 June, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: ICML 2023 Camera Ready. Code available at https://github.com/louaaron/Reflected-Diffusion