Showing 1–50 of 129 results for author: Dai, A

  1. arXiv:2407.09502 [pdf, other]

    cs.NE cs.AI

    From Text to Life: On the Reciprocal Relationship between Artificial Life and Large Language Models

    Authors: Eleni Nisioti, Claire Glanois, Elias Najarro, Andrew Dai, Elliot Meyerson, Joachim Winther Pedersen, Laetitia Teodorescu, Conor F. Hayes, Shyam Sudhakaran, Sebastian Risi

    Abstract: Large Language Models (LLMs) have taken the field of AI by storm, but their adoption in the field of Artificial Life (ALife) has been, so far, relatively reserved. In this work we investigate the potential synergies between LLMs and ALife, drawing on a large body of research in the two fields. We explore the potential of LLMs as tools for ALife research, for example, as operators for evolutionary…

    Submitted 14 June, 2024; originally announced July 2024.

  2. arXiv:2406.16679 [pdf, other]

    cs.RO

    Multi-Robot Collaborative Localization and Planning with Inter-Ranging

    Authors: Derek Knowles, Adam Dai, Grace Gao

    Abstract: Robots often use feature-based image tracking to identify their position in their surrounding environment; however, feature-based image tracking is prone to errors in low-textured and poorly lit environments. Specifically, we investigate a scenario where robots are tasked with exploring the surface of the Moon and are required to have an accurate estimate of their position to be able to correctly…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  3. arXiv:2406.09624 [pdf, other]

    cs.LG cs.AI cs.CE physics.flu-dyn

    DrivAerNet++: A Large-Scale Multimodal Car Dataset with Computational Fluid Dynamics Simulations and Deep Learning Benchmarks

    Authors: Mohamed Elrefaie, Florin Morar, Angela Dai, Faez Ahmed

    Abstract: We present DrivAerNet++, the largest and most comprehensive multimodal dataset for aerodynamic car design. DrivAerNet++ comprises 8,000 diverse car designs modeled with high-fidelity computational fluid dynamics (CFD) simulations. The dataset includes diverse car configurations such as fastback, notchback, and estateback, with different underbody and wheel designs to represent both internal combus…

    Submitted 13 June, 2024; originally announced June 2024.

  4. arXiv:2406.02548 [pdf, other]

    cs.CV

    Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation

    Authors: Mohamed El Amine Boudjoghra, Angela Dai, Jean Lahoud, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Fahad Shahbaz Khan

    Abstract: Recent works on open-vocabulary 3D instance segmentation show strong promise, but at the cost of slow inference speed and high computation requirements. This high computation cost is typically due to their heavy reliance on 3D clip features, which require computationally expensive 2D foundation models like Segment Anything (SAM) and CLIP for multi-view aggregation into 3D. As a consequence, this h…

    Submitted 20 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2405.15227 [pdf, other]

    cs.RO

    Neural Elevation Models for Terrain Mapping and Path Planning

    Authors: Adam Dai, Shubh Gupta, Grace Gao

    Abstract: This work introduces Neural Elevation Models (NEMos), which adapt Neural Radiance Fields to a 2.5D continuous and differentiable terrain model. In contrast to traditional terrain representations such as digital elevation models, NEMos can be readily generated from imagery, a low-cost data source, and provide a lightweight representation of terrain through an implicit continuous and differentiable…

    Submitted 24 May, 2024; originally announced May 2024.

  6. arXiv:2405.11914 [pdf, other]

    cs.CV

    PT43D: A Probabilistic Transformer for Generating 3D Shapes from Single Highly-Ambiguous RGB Images

    Authors: Yiheng Xiong, Angela Dai

    Abstract: Generating 3D shapes from single RGB images is essential in various applications such as robotics. Current approaches typically target images containing clear and complete visual descriptions of the object, without considering common realistic cases where observations of objects that are largely occluded or truncated. We thus propose a transformer-based autoregressive model to generate the probabi…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures

  7. Course Recommender Systems Need to Consider the Job Market

    Authors: Jibril Frej, Anna Dai, Syrielle Montariol, Antoine Bosselut, Tanja Käser

    Abstract: Current course recommender systems primarily leverage learner-course interactions, course content, learner preferences, and supplementary course details like instructor, institution, ratings, and reviews, to make their recommendation. However, these systems often overlook a critical aspect: the evolving skill demand of the job market. This paper focuses on the perspective of academic researchers,…

    Submitted 1 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted at SIGIR 2024 as a perspective paper. Camera-ready version to follow

    ACM Class: H.3.3

  8. arXiv:2404.07503 [pdf, other]

    cs.CL

    Best Practices and Lessons Learned on Synthetic Data for Language Models

    Authors: Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai

    Abstract: The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challeng…

    Submitted 11 April, 2024; originally announced April 2024.

  9. arXiv:2403.08055 [pdf, other]

    cs.LG physics.flu-dyn

    DrivAerNet: A Parametric Car Dataset for Data-Driven Aerodynamic Design and Graph-Based Drag Prediction

    Authors: Mohamed Elrefaie, Angela Dai, Faez Ahmed

    Abstract: This study introduces DrivAerNet, a large-scale high-fidelity CFD dataset of 3D industry-standard car shapes, and RegDGCNN, a dynamic graph convolutional neural network model, both aimed at aerodynamic car design through machine learning. DrivAerNet, with its 4000 detailed 3D car meshes using 0.5 million surface mesh faces and comprehensive aerodynamic performance data comprising of full 3D pressu…

    Submitted 12 March, 2024; originally announced March 2024.

  10. arXiv:2403.05530 [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  11. arXiv:2402.03242 [pdf, other]

    cs.CL

    JOBSKAPE: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching

    Authors: Antoine Magron, Anna Dai, Mike Zhang, Syrielle Montariol, Antoine Bosselut

    Abstract: Recent approaches in skill matching, employing synthetic training data for classification or similarity model training, have shown promising results, reducing the need for time-consuming and expensive annotations. However, previous synthetic datasets have limitations, such as featuring only one skill per sentence and generally comprising short sentences. In this paper, we introduce JobSkape, a fra…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Published at NLP4HR 2024 (EACL Workshop)

  12. arXiv:2312.11805 [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  13. arXiv:2312.08459 [pdf, other]

    cs.CV cs.AI cs.GR cs.SD eess.AS

    FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models

    Authors: Shivangi Aneja, Justus Thies, Angela Dai, Matthias Nießner

    Abstract: We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from input audio signal. To capture the expressive, detailed nature of human heads, including hair, ears, and finer-scale eye movements, we propose to couple speech signal with the latent space of neural parametric head models to create high-fidelity, temporally coh…

    Submitted 17 March, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: Paper Video: https://youtu.be/7Jf0kawrA3Q Project Page: https://shivangi-aneja.github.io/projects/facetalk/

    Journal ref: CVPR 2024

  14. arXiv:2312.06134 [pdf, other]

    cs.CL cs.LG

    Order Matters in the Presence of Dataset Imbalance for Multilingual Learning

    Authors: Dami Choi, Derrick Xin, Hamid Dadkhahi, Justin Gilmer, Ankush Garg, Orhan Firat, Chih-Kuan Yeh, Andrew M. Dai, Behrooz Ghorbani

    Abstract: In this paper, we empirically study the optimization dynamics of multi-task learning, particularly focusing on those that govern a collection of tasks with significant data imbalance. We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high/low-resource tasks. We provide a thorough empirical study and analysis of this method's be…

    Submitted 11 December, 2023; originally announced December 2023.

  15. arXiv:2312.02158 [pdf, other]

    cs.CV cs.AI

    PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness

    Authors: Anh-Quan Cao, Angela Dai, Raoul de Charette

    Abstract: We propose the task of Panoptic Scene Completion (PSC) which extends the recently popular Semantic Scene Completion (SSC) task with instance-level information to produce a richer understanding of the 3D scene. Our PSC proposal utilizes a hybrid mask-based technique on the non-empty voxels from sparse multi-scale completions. Whereas the SSC literature overlooks uncertainty which is critical for ro…

    Submitted 25 May, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 Oral - Best paper award candidate. Project page: https://astra-vision.github.io/PaSCo

  16. arXiv:2312.01068 [pdf, other]

    cs.CV

    DPHMs: Diffusion Parametric Head Models for Depth-based Tracking

    Authors: Jiapeng Tang, Angela Dai, Yinyu Nie, Lev Markhasin, Justus Thies, Matthias Niessner

    Abstract: We introduce Diffusion Parametric Head Models (DPHMs), a generative model that enables robust volumetric head reconstruction and tracking from monocular depth sequences. While recent volumetric head models, such as NPHMs, can now excel in representing high-fidelity head geometries, tracking and reconstructing heads from real-world single-view depth sequences remains very challenging, as the fittin…

    Submitted 8 April, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: CVPR 2024; homepage: https://tangjiapeng.github.io/projects/DPHMs/

  17. arXiv:2311.18610 [pdf, other]

    cs.CV

    DiffCAD: Weakly-Supervised Probabilistic CAD Model Retrieval and Alignment from an RGB Image

    Authors: Daoyi Gao, Dávid Rozenberszki, Stefan Leutenegger, Angela Dai

    Abstract: Perceiving 3D structures from RGB images based on CAD model primitives can enable an effective, efficient 3D object-based representation of scenes. However, current approaches rely on supervision from expensive annotations of CAD models associated with real images, and encounter challenges due to the inherent ambiguities in the task -- both in depth-scale ambiguity in monocular perception, as well…

    Submitted 6 June, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: SIGGRAPH 2024, Project page: https://daoyig.github.io/DiffCAD/

  18. arXiv:2311.17737 [pdf, other]

    cs.CV cs.GR

    GenZI: Zero-Shot 3D Human-Scene Interaction Generation

    Authors: Lei Li, Angela Dai

    Abstract: Can we synthesize 3D humans interacting with scenes without learning from any 3D human-scene interaction data? We propose GenZI, the first zero-shot approach to generating 3D human-scene interactions. Key to GenZI is our distillation of interaction priors from large vision-language models (VLMs), which have learned a rich semantic space of 2D human-scene compositions. Given a natural language desc…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Project page: https://craigleili.github.io/projects/genzi/ Video: https://youtu.be/ozfs6E0JIMY

  19. arXiv:2311.16097 [pdf, other]

    cs.CV

    CG-HOI: Contact-Guided 3D Human-Object Interaction Generation

    Authors: Christian Diller, Angela Dai

    Abstract: We propose CG-HOI, the first method to address the task of generating dynamic 3D human-object interactions (HOIs) from text. We model the motion of both human and object in an interdependent fashion, as semantically rich human motion rarely happens in isolation without any interactions. Our key insight is that explicitly modeling contact between the human body surface and object geometry can be us…

    Submitted 17 May, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Project page: https://cg-hoi.christian-diller.de Video: https://www.youtube.com/watch?v=GNyQwTwZ15s

    ACM Class: I.2.10; I.4.8; I.5.1; I.5.4

  20. arXiv:2311.15475 [pdf, other]

    cs.CV cs.LG

    MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

    Authors: Yawar Siddiqui, Antonio Alliegro, Alexey Artemov, Tatiana Tommasi, Daniele Sirigatti, Vladislav Rosov, Angela Dai, Matthias Nießner

    Abstract: We introduce MeshGPT, a new approach for generating triangle meshes that reflects the compactness typical of artist-created meshes, in contrast to dense triangle meshes extracted by iso-surfacing methods from neural fields. Inspired by recent advances in powerful large language models, we adopt a sequence-based approach to autoregressively generate triangle meshes as sequences of triangles. We fir…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: Project Page: https://nihalsid.github.io/mesh-gpt/, Video: https://youtu.be/UV90O1_69_o

  21. arXiv:2310.13032 [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    Quality-Diversity through AI Feedback

    Authors: Herbie Bradley, Andrew Dai, Hannah Teufel, Jenny Zhang, Koen Oostermeijer, Marco Bellagente, Jeff Clune, Kenneth Stanley, Grégory Schott, Joel Lehman

    Abstract: In many text-generation problems, users may prefer not only a single response, but a diverse range of high-quality outputs from which to choose. Quality-diversity (QD) search algorithms aim at such outcomes, by continually improving and diversifying a population of candidates. However, the applicability of QD to qualitative domains, like creative writing, has been limited by the difficulty of algo…

    Submitted 7 December, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: minor additions to supplementary results

  22. arXiv:2309.00862 [pdf, other]

    cs.CV

    Big-model Driven Few-shot Continual Learning

    Authors: Ziqi Gu, Chunyan Xu, Zihan Lu, Xin Liu, Anbo Dai, Zhen Cui

    Abstract: Few-shot continual learning (FSCL) has attracted intensive attention and achieved some advances in recent years, but now it is difficult to again make a big stride in accuracy due to the limitation of only few-shot incremental samples. Inspired by distinctive human cognition ability in life learning, in this work, we propose a novel Big-model driven Few-shot Continual Learning (B-FSCL) framework t…

    Submitted 2 September, 2023; originally announced September 2023.

    Comments: 9 pages, 6 figures

  23. arXiv:2308.11417 [pdf, other]

    cs.CV

    ScanNet++: A High-Fidelity Dataset of 3D Indoor Scenes

    Authors: Chandan Yeshwanth, Yueh-Cheng Liu, Matthias Nießner, Angela Dai

    Abstract: We present ScanNet++, a large-scale dataset that couples together capture of high-quality and commodity-level geometry and color of indoor scenes. Each scene is captured with a high-end laser scanner at sub-millimeter resolution, along with registered 33-megapixel images from a DSLR camera, and RGB-D streams from an iPhone. Scene reconstructions are further annotated with an open vocabulary of sem…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: ICCV 2023. Video: https://youtu.be/E6P9e2r6M8I , Project page: https://cy94.github.io/scannetpp/

  24. arXiv:2308.09091 [pdf, other]

    cs.CV

    Edit Temporal-Consistent Videos with Image Diffusion Model

    Authors: Yuanzhi Wang, Yong Li, Xiaoya Zhang, Xin Liu, Anbo Dai, Antoni B. Chan, Zhen Cui

    Abstract: Large-scale text-to-image (T2I) diffusion models have been extended for text-guided video editing, yielding impressive zero-shot video editing performance. Nonetheless, the generated videos usually show spatial irregularities and temporal inconsistencies as the temporal characteristics of videos have not been faithfully modeled. In this paper, we propose an elegant yet effective Temporal-Consisten…

    Submitted 29 December, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: 10 pages, 7 figures

  25. arXiv:2308.08316 [pdf, other]

    cs.CV

    Dual-Stream Diffusion Net for Text-to-Video Generation

    Authors: Binhui Liu, Xin Liu, Anbo Dai, Zhiyong Zeng, Dan Wang, Zhen Cui, Jian Yang

    Abstract: With the emerging diffusion models, recently, text-to-video generation has aroused increasing attention. But an important bottleneck therein is that generative videos often tend to carry some flickers and artifacts. In this work, we propose a dual-stream diffusion net (DSDN) to improve the consistency of content variations in generating videos. In particular, the designed two diffusion streams, vi…

    Submitted 29 December, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: 8 pages, 7 figures

  26. arXiv:2307.04692 [pdf, other]

    eess.SP cs.RO eess.SY

    Spoofing-Resilient LiDAR-GPS Factor Graph Localization with Chimera Authentication

    Authors: Adam Dai, Tara Minda, Ashwin Kanhere, Grace Gao

    Abstract: Many vehicle platforms typically use sensors such as LiDAR or camera for locally-referenced navigation with GPS for globally-referenced navigation. However, due to the unencrypted nature of GPS signals, all civilian users are vulnerable to spoofing attacks, where a malicious spoofer broadcasts fabricated signals and causes the user to track a false position fix. To protect against such GPS spoofi…

    Submitted 10 July, 2023; originally announced July 2023.

  27. arXiv:2306.00008 [pdf, other]

    cs.LG cs.CL

    Brainformers: Trading Simplicity for Efficiency

    Authors: Yanqi Zhou, Nan Du, Yanping Huang, Daiyi Peng, Chang Lan, Da Huang, Siamak Shakeri, David So, Andrew Dai, Yifeng Lu, Zhifeng Chen, Quoc Le, Claire Cui, James Laudon, Jeff Dean

    Abstract: Transformers are central to recent successes in natural language processing and computer vision. Transformers have a mostly uniform backbone where layers alternate between feed-forward and self-attention in order to build a deep network. Here we investigate this design choice and find that more complex blocks that have different permutations of layer primitives can be more efficient. Using this in…

    Submitted 25 April, 2024; v1 submitted 29 May, 2023; originally announced June 2023.

  28. arXiv:2305.16960 [pdf, ps, other]

    cs.CL cs.AI cs.CY cs.HC

    Training Socially Aligned Language Models on Simulated Social Interactions

    Authors: Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M. Dai, Diyi Yang, Soroush Vosoughi

    Abstract: Social alignment in AI systems aims to ensure that these models behave according to established societal values. However, unlike humans, who derive consensus on value judgments through social interaction, current language models (LMs) are trained to rigidly replicate their training corpus in isolation, leading to subpar generalization in unfamiliar scenarios and vulnerability to adversarial attack…

    Submitted 28 October, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Code, data, and models can be downloaded via https://github.com/agi-templar/Stable-Alignment

  29. arXiv:2305.15296 [pdf, other]

    cs.CV cs.AI cs.LG

    MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation

    Authors: Marco Bellagente, Manuel Brack, Hannah Teufel, Felix Friedrich, Björn Deiseroth, Constantin Eichenberg, Andrew Dai, Robert Baldock, Souradeep Nanda, Koen Oostermeijer, Andres Felipe Cruz-Salinas, Patrick Schramowski, Kristian Kersting, Samuel Weinbach

    Abstract: The recent popularity of text-to-image diffusion models (DM) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of text prompts. However, expressing complex or nuanced ideas in text alone can be difficult. To ease image generation, we propose MultiFusion that all…

    Submitted 20 December, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Proceedings of Advances in Neural Information Processing Systems: Annual Conference on Neural Information Processing Systems (NeurIPS)

  30. arXiv:2305.10403 [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  31. arXiv:2305.04719 [pdf, other]

    cs.CV

    Learning to Generate Poetic Chinese Landscape Painting with Calligraphy

    Authors: Shaozu Yuan, Aijun Dai, Zhiling Yan, Ruixue Liu, Meng Chen, Baoyang Chen, Zhijie Qiu, Xiaodong He

    Abstract: In this paper, we present a novel system (denoted as Polaca) to generate poetic Chinese landscape painting with calligraphy. Unlike previous single image-to-image painting generation, Polaca takes the classic poetry as input and outputs the artistic landscape painting image with the corresponding calligraphy. It is equipped with three different modules to complete the whole piece of landscape pain…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted by IJCAI 2022

  32. arXiv:2304.05868 [pdf, other]

    cs.CV

    Mesh2Tex: Generating Mesh Textures from Image Queries

    Authors: Alexey Bokhovkin, Shubham Tulsiani, Angela Dai

    Abstract: Remarkable advances have been achieved recently in learning neural representations that characterize object geometry, while generating textured objects suitable for downstream applications and 3D rendering remains at an early stage. In particular, reconstructing textured geometry from images of real objects is a significant challenge -- reconstructed geometry is often inexact, making realistic tex…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: https://alexeybokhovkin.github.io/mesh2tex/

  33. arXiv:2303.18178 [pdf, other]

    cs.CR cs.LG

    Robust and IP-Protecting Vertical Federated Learning against Unexpected Quitting of Parties

    Authors: Jingwei Sun, Zhixu Du, Anna Dai, Saleh Baghersalimi, Alireza Amirshahi, David Atienza, Yiran Chen

    Abstract: Vertical federated learning (VFL) enables a service provider (i.e., active party) who owns labeled features to collaborate with passive parties who possess auxiliary features to improve model performance. Existing VFL approaches, however, have two major vulnerabilities when passive parties unexpectedly quit in the deployment phase of VFL - severe performance degradation and intellectual property (…

    Submitted 28 March, 2023; originally announced March 2023.

  34. arXiv:2303.17015 [pdf, other]

    cs.CV cs.LG

    HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion

    Authors: Ziya Erkoç, Fangchang Ma, Qi Shan, Matthias Nießner, Angela Dai

    Abstract: Implicit neural fields, typically encoded by a multilayer perceptron (MLP) that maps from coordinates (e.g., xyz) to signals (e.g., signed distances), have shown remarkable promise as a high-fidelity and compact representation. However, the lack of a regular and explicit grid structure also makes it challenging to apply generative modeling directly on implicit neural fields in order to synthesize…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: Project page: https://ziyaerkoc.com/hyperdiffusion/ Video: https://www.youtube.com/watch?v=wjFpsKdo-II

  35. arXiv:2303.16839 [pdf, other]

    cs.CV cs.CL cs.LG

    MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks

    Authors: Weicheng Kuo, AJ Piergiovanni, Dahun Kim, Xiyang Luo, Ben Caine, Wei Li, Abhijit Ogale, Luowei Zhou, Andrew Dai, Zhifeng Chen, Claire Cui, Anelia Angelova

    Abstract: The development of language models have moved from encoder-decoder to decoder-only designs. In addition, we observe that the two most popular multimodal tasks, the generative and contrastive tasks, are nontrivial to accommodate in one architecture, and further need adaptations for downstream tasks. We propose a novel paradigm of training with a decoder-only model for multimodal tasks, which is sur…

    Submitted 9 August, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: Published in Transactions on Machine Learning Research (https://jmlr.org/tmlr/). 18 pages, 4 figures

  36. arXiv:2303.14541 [pdf, other]

    cs.CV

    UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes

    Authors: David Rozenberszki, Or Litany, Angela Dai

    Abstract: 3D instance segmentation is fundamental to geometric understanding of the world around us. Existing methods for instance segmentation of 3D scenes rely on supervision from expensive, manual 3D annotations. We propose UnScene3D, the first fully unsupervised 3D learning approach for class-agnostic 3D instance segmentation of indoor scans. UnScene3D first generates pseudo masks by leveraging self-sup…

    Submitted 30 April, 2024; v1 submitted 25 March, 2023; originally announced March 2023.

    Comments: Project page: https://rozdavid.github.io/unscene3d, paper updated according to CVPR24 camera ready version

  37. arXiv:2303.14207 [pdf, other]

    cs.CV

    DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis

    Authors: Jiapeng Tang, Yinyu Nie, Lev Markhasin, Angela Dai, Justus Thies, Matthias Nießner

    Abstract: We present DiffuScene for indoor 3D scene synthesis based on a novel scene configuration denoising diffusion model. It generates 3D instance properties stored in an unordered object set and retrieves the most similar geometry for each object configuration, which is characterized as a concatenation of different attributes, including location, size, orientation, semantics, and geometry features. We…

    Submitted 12 March, 2024; v1 submitted 24 March, 2023; originally announced March 2023.

    Comments: CVPR 2024

  38. arXiv:2302.14746 [pdf, other]

    cs.CV

    Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors

    Authors: Ji Hou, Xiaoliang Dai, Zijian He, Angela Dai, Matthias Nießner

    Abstract: Current popular backbones in computer vision, such as Vision Transformers (ViT) and ResNets are trained to perceive the world from 2D images. However, to more effectively understand 3D structural priors in 2D backbones, we propose Mask3D to leverage existing large-scale RGB-D data in a self-supervised pre-training to embed these 3D priors into 2D learned feature representations. In contrast to tra…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: Accepted to CVPR 2023

  39. arXiv:2302.08917 [pdf, other]

    cs.CL cs.LG

    Massively Multilingual Shallow Fusion with Large Language Models

    Authors: Ke Hu, Tara N. Sainath, Bo Li, Nan Du, Yanping Huang, Andrew M. Dai, Yu Zhang, Rodrigo Cabrera, Zhifeng Chen, Trevor Strohman

    Abstract: While large language models (LLM) have made impressive progress in natural language processing, it remains unclear how to utilize them in improving automatic speech recognition (ASR). In this work, we propose to train a single multilingual language model (LM) for shallow fusion in multiple languages. We push the limits of the multilingual LM to cover up to 84 languages by scaling up using a mixtur…

    Submitted 17 February, 2023; originally announced February 2023.

    Comments: Accepted to IEEE ICASSP 2023

  40. arXiv:2212.09802 [pdf, other]

    cs.CV cs.LG

    Panoptic Lifting for 3D Scene Understanding with Neural Fields

    Authors: Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Buló, Norman Müller, Matthias Nießner, Angela Dai, Peter Kontschieder

    Abstract: We propose Panoptic Lifting, a novel approach for learning panoptic 3D volumetric representations from images of in-the-wild scenes. Once trained, our model can render color images together with 3D-consistent panoptic segmentation from novel viewpoints. Unlike existing approaches which use 3D input directly or indirectly, our method requires only machine-generated 2D panoptic segmentation masks…

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: Project Page: https://nihalsid.github.io/panoptic-lifting/, Video: https://youtu.be/QtsiL-6rSuM

  41. arXiv:2212.02936

    cs.CV

    M-VADER: A Model for Diffusion with Multimodal Context

    Authors: Samuel Weinbach, Marco Bellagente, Constantin Eichenberg, Andrew Dai, Robert Baldock, Souradeep Nanda, Björn Deiseroth, Koen Oostermeijer, Hannah Teufel, Andres Felipe Cruz-Salinas

    Abstract: We introduce M-VADER: a diffusion model (DM) for image generation where the output can be specified using arbitrary combinations of images and text. We show how M-VADER enables the generation of images specified using combinations of image and text, and combinations of multiple images. Previously, a number of successful DM image generation algorithms have been introduced that make it possible to s…

    Submitted 7 December, 2022; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: 22 pages, 14 figures, 2 tables, fixed figure 3

  42. arXiv:2212.01985

    cs.CV cs.LG

    ObjectMatch: Robust Registration using Canonical Object Correspondences

    Authors: Can Gümeli, Angela Dai, Matthias Nießner

    Abstract: We present ObjectMatch, a semantic and object-centric camera pose estimator for RGB-D SLAM pipelines. Modern camera pose estimators rely on direct correspondences of overlapping regions between frames; however, they cannot align camera frames with little or no overlap. In this work, we propose to leverage indirect correspondences obtained via semantic object identification. For instance, when an o…

    Submitted 24 March, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

    Comments: Project Page: http://cangumeli.github.io/ObjectMatch Video: https://www.youtube.com/watch?v=kuXoKVrzURk

  43. ClipFace: Text-guided Editing of Textured 3D Morphable Models

    Authors: Shivangi Aneja, Justus Thies, Angela Dai, Matthias Nießner

    Abstract: We propose ClipFace, a novel self-supervised approach for text-guided editing of textured 3D morphable models of faces. Specifically, we employ user-friendly language prompts to enable control of the expressions as well as appearance of 3D faces. We leverage the geometric expressiveness of 3D morphable models, which inherently possess limited controllability and texture expressivity, and develop a…

    Submitted 24 April, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: Paper Video: https://youtu.be/toGOQqFuNmA Project website: https://shivangi-aneja.github.io/projects/clipface/

    Journal ref: SIGGRAPH 2023

  44. arXiv:2211.14309

    cs.CV cs.LG

    FutureHuman3D: Forecasting Complex Long-Term 3D Human Behavior from Video Observations

    Authors: Christian Diller, Thomas Funkhouser, Angela Dai

    Abstract: We present a generative approach to forecast long-term future human behavior in 3D, requiring only weak supervision from readily available 2D human action data. This is a fundamental task enabling many downstream applications. The required ground-truth data is hard to capture in 3D (mocap suits, expensive setups) but easy to acquire in 2D (simple RGB cameras). Thus, we design our method to only re…

    Submitted 17 May, 2024; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: Project Page: https://future-human-3d.christian-diller.de/ Video: https://www.youtube.com/watch?v=18du85YFXL0

    ACM Class: I.2.10; I.4.8; I.5.1; I.5.4

  45. arXiv:2211.14249

    cs.CV

    Neural Poisson: Indicator Functions for Neural Fields

    Authors: Angela Dai, Matthias Nießner

    Abstract: Implicit neural fields generating signed distance field representations (SDFs) of 3D shapes have shown remarkable progress in 3D shape reconstruction and generation. We introduce a new paradigm for neural field representations of 3D scenes; rather than characterizing surfaces as SDFs, we propose a Poisson-inspired characterization for surfaces as indicator functions optimized by neural fields. Cruc…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: Video: https://youtu.be/swVsWp1-00c

  46. arXiv:2211.14157

    cs.CV

    Learning 3D Scene Priors with 2D Supervision

    Authors: Yinyu Nie, Angela Dai, Xiaoguang Han, Matthias Nießner

    Abstract: Holistic 3D scene understanding entails estimation of both layout configuration and object geometry in a 3D environment. Recent works have shown advances in 3D scene estimation from various input modalities (e.g., images, 3D scans), by leveraging 3D supervision (e.g., 3D bounding boxes or CAD models), for which collection at scale is expensive and often intractable. To address this shortcoming, we…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: Video: https://youtu.be/YT7MEdygRoY Project: https://yinyunie.github.io/sceneprior-page/

  47. arXiv:2210.11416

    cs.LG cs.CL

    Scaling Instruction-Finetuned Language Models

    Authors: Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Yunxuan Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Alex Castro-Ros, Marie Pellat, Kevin Robinson, Dasha Valter, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, et al. (10 additional authors not shown)

    Abstract: Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects d…

    Submitted 6 December, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: Public checkpoints: https://huggingface.co/docs/transformers/model_doc/flan-t5

  48. arXiv:2210.05359

    cs.CL cs.AI

    Mind's Eye: Grounded Language Model Reasoning through Simulation

    Authors: Ruibo Liu, Jason Wei, Shixiang Shane Gu, Te-Yen Wu, Soroush Vosoughi, Claire Cui, Denny Zhou, Andrew M. Dai

    Abstract: Successful and effective communication between humans and AI relies on a shared experience of the world. By training solely on written text, current language models (LMs) miss the grounded experience of humans in the real-world -- their failure to relate language to the physical world causes knowledge to be misrepresented and obvious mistakes in their reasoning. We present Mind's Eye, a paradigm t…

    Submitted 11 October, 2022; originally announced October 2022.

  49. arXiv:2209.08248

    cs.RO

    PlaneSLAM: Plane-based LiDAR SLAM for Motion Planning in Structured 3D Environments

    Authors: Adam Dai, Greg Lund, Grace Gao

    Abstract: LiDAR sensors are a powerful tool for robot simultaneous localization and mapping (SLAM) in unknown environments, but the raw point clouds they produce are dense, computationally expensive to store, and unsuited for direct use by downstream autonomy tasks, such as motion planning. For integration with motion planning, it is desirable for SLAM pipelines to generate lightweight geometric map represe…

    Submitted 29 September, 2022; v1 submitted 17 September, 2022; originally announced September 2022.

  50. arXiv:2206.04916

    cs.CV

    PatchComplete: Learning Multi-Resolution Patch Priors for 3D Shape Completion on Unseen Categories

    Authors: Yuchen Rao, Yinyu Nie, Angela Dai

    Abstract: While 3D shape representations enable powerful reasoning in many visual and perception applications, learning 3D shape priors tends to be constrained to the specific categories trained on, leading to an inefficient learning process, particularly for general applications with unseen categories. Thus, we propose PatchComplete, which learns effective shape priors based on multi-resolution local patch…

    Submitted 12 October, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: Video link: https://www.youtube.com/watch?v=Ch1rvw2D_Kc ; Project page: https://yuchenrao.github.io/projects/patchComplete/patchComplete.html ; Accepted to NeurIPS'22