-
Language-driven Grasp Detection
Authors:
An Dinh Vuong,
Minh Nhat Vu,
Baoru Huang,
Nghia Nguyen,
Hieu Le,
Thieu Vo,
Anh Nguyen
Abstract:
Grasp detection is a persistent and intricate challenge with various industrial applications. Recently, many methods and datasets have been proposed to tackle the grasp detection problem. However, most of them do not consider using natural language as a condition to detect the grasp poses. In this paper, we introduce Grasp-Anything++, a new language-driven grasp detection dataset featuring 1M samples, over 3M objects, and upwards of 10M grasping instructions. We utilize foundation models to create a large-scale scene corpus with corresponding images and grasp prompts. We approach the language-driven grasp detection task as a conditional generation problem. Drawing on the success of diffusion models in generative tasks and given that language plays a vital role in this task, we propose a new language-driven grasp detection method based on diffusion models. Our key contribution is the contrastive training objective, which explicitly contributes to the denoising process to detect the grasp pose given the language instructions. We illustrate that our approach is theoretically supportive. The intensive experiments show that our method outperforms state-of-the-art approaches and allows real-world robotic grasping. Finally, we demonstrate our large-scale dataset enables zero-shot grasp detection and is a challenging benchmark for future work. Project website: https://airvlab.github.io/grasp-anything/
Submitted 13 June, 2024;
originally announced June 2024.
-
Evaluating the Efficacy of Large Language Models in Detecting Fake News: A Comparative Analysis
Authors:
Sahas Koka,
Anthony Vuong,
Anish Kataria
Abstract:
In an era increasingly influenced by artificial intelligence, the detection of fake news is crucial, especially in contexts like election seasons where misinformation can have significant societal impacts. This study evaluates the effectiveness of various LLMs in identifying and filtering fake news content. Utilizing a comparative analysis approach, we tested four large LLMs -- GPT-4, Claude 3 Sonnet, Gemini Pro 1.0, and Mistral Large -- and two smaller LLMs -- Gemma 7B and Mistral 7B. By using fake news dataset samples from Kaggle, this research not only sheds light on the current capabilities and limitations of LLMs in fake news detection but also discusses the implications for developers and policymakers in enhancing AI-driven informational integrity.
Submitted 4 June, 2024;
originally announced June 2024.
-
Language-driven Scene Synthesis using Multi-conditional Diffusion Model
Authors:
An Vuong,
Minh Nhat Vu,
Toan Tien Nguyen,
Baoru Huang,
Dzung Nguyen,
Thieu Vo,
Anh Nguyen
Abstract:
Scene synthesis is a challenging problem with several industrial applications. Recently, substantial efforts have been directed to synthesize the scene using human motions, room layouts, or spatial graphs as the input. However, few studies have addressed this problem from multiple modalities, especially combining text prompts. In this paper, we propose a language-driven scene synthesis task, which is a new task that integrates text prompts, human motion, and existing objects for scene synthesis. Unlike other single-condition synthesis tasks, our problem involves multiple conditions and requires a strategy for processing and encoding them into a unified space. To address the challenge, we present a multi-conditional diffusion model, which differs from the implicit unification approach of other diffusion literature by explicitly predicting the guiding points for the original data distribution. We demonstrate that our approach is theoretically supportive. The intensive experiment results illustrate that our method outperforms state-of-the-art benchmarks and enables natural scene editing applications. The source code and dataset can be accessed at https://lang-scene-synth.github.io/.
Submitted 24 October, 2023;
originally announced October 2023.
-
Grasp-Anything: Large-scale Grasp Dataset from Foundation Models
Authors:
An Dinh Vuong,
Minh Nhat Vu,
Hieu Le,
Baoru Huang,
Binh Huynh,
Thieu Vo,
Andreas Kugi,
Anh Nguyen
Abstract:
Foundation models such as ChatGPT have made significant strides in robotic tasks due to their universal representation of real-world domains. In this paper, we leverage foundation models to tackle grasp detection, a persistent challenge in robotics with broad industrial applications. Despite numerous grasp datasets, their object diversity remains limited compared to real-world figures. Fortunately, foundation models possess an extensive repository of real-world knowledge, including objects we encounter in our daily lives. As a consequence, a promising solution to the limited representation in previous grasp datasets is to harness the universal knowledge embedded in these foundation models. We present Grasp-Anything, a new large-scale grasp dataset synthesized from foundation models to implement this solution. Grasp-Anything excels in diversity and magnitude, boasting 1M samples with text descriptions and more than 3M objects, surpassing prior datasets. Empirically, we show that Grasp-Anything successfully facilitates zero-shot grasp detection on vision-based tasks and real-world robotic experiments. Our dataset and code are available at https://grasp-anything-2023.github.io.
Submitted 18 September, 2023;
originally announced September 2023.
-
HabiCrowd: A High Performance Simulator for Crowd-Aware Visual Navigation
Authors:
An Dinh Vuong,
Toan Tien Nguyen,
Minh Nhat Vu,
Baoru Huang,
Dzung Nguyen,
Huynh Thi Thanh Binh,
Thieu Vo,
Anh Nguyen
Abstract:
Visual navigation, a foundational aspect of Embodied AI (E-AI), has been significantly studied in the past few years. While many 3D simulators have been introduced to support visual navigation tasks, few works have been directed towards incorporating human dynamics, creating a gap between simulation and real-world applications. Furthermore, current 3D simulators incorporating human dynamics have several limitations, particularly in terms of computational efficiency, which is a promise of E-AI simulators. To overcome these shortcomings, we introduce HabiCrowd, the first standard benchmark for crowd-aware visual navigation that integrates a crowd dynamics model with diverse human settings into photorealistic environments. Empirical evaluations demonstrate that our proposed human dynamics model achieves state-of-the-art performance in collision avoidance, while exhibiting superior computational efficiency compared to its counterparts. We leverage HabiCrowd to conduct several comprehensive studies on crowd-aware visual navigation tasks and human-robot interactions. The source code and data can be found at https://habicrowd.github.io/.
Submitted 20 June, 2023;
originally announced June 2023.
-
Open-Vocabulary Affordance Detection in 3D Point Clouds
Authors:
Toan Nguyen,
Minh Nhat Vu,
An Vuong,
Dzung Nguyen,
Thieu Vo,
Ngan Le,
Anh Nguyen
Abstract:
Affordance detection is a challenging problem with a wide variety of robotic applications. Traditional affordance detection methods are limited to a predefined set of affordance labels, hence potentially restricting the adaptability of intelligent robots in complex and dynamic environments. In this paper, we present the Open-Vocabulary Affordance Detection (OpenAD) method, which is capable of detecting an unbounded number of affordances in 3D point clouds. By simultaneously learning the affordance text and the point feature, OpenAD successfully exploits the semantic relationships between affordances. Therefore, our proposed method enables zero-shot detection and can detect previously unseen affordances without a single annotation example. Intensive experimental results show that OpenAD works effectively on a wide range of affordance detection setups and outperforms other baselines by a large margin. Additionally, we demonstrate the practicality of the proposed OpenAD in real-world robotic applications with a fast inference speed (~100ms). Our project is available at https://openad2023.github.io.
Submitted 23 July, 2023; v1 submitted 4 March, 2023;
originally announced March 2023.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
A consistent approach for fluid-structure-contact interaction based on a porous flow model for rough surface contact
Authors:
Christoph Ager,
Benedikt Schott,
Anh-Tu Vuong,
Alexander Popp,
Wolfgang A. Wall
Abstract:
Simulation approaches for fluid-structure-contact interaction, especially if requested to be consistent even down to the real contact scenarios, belong to the most challenging and still unsolved problems in computational mechanics. The main challenges are twofold: one is to have a correct physical model for this scenario, and the other is to have a numerical method that is capable of working and being consistent down to a zero gap. When analyzing such challenging setups of fluid-structure interaction that include contact of submersed solid components, it becomes obvious that the influence of surface roughness effects is essential for a physically consistent modeling of such configurations. To capture this system behavior, we present a continuum mechanical model which is able to include the effects of the surface microstructure in a fluid-structure-contact interaction framework. An averaged representation for the mixture of fluid and solid on the rough surfaces, which is of major interest for the macroscopic response of such a system, is introduced therein. The inherent coupling of the macroscopic fluid flow and the flow inside the rough surfaces, the stress exchange of all contacting solid bodies involved, and the interaction between fluid and solid are included in the construction of the model. Although the physical model is not restricted to finite element based methods, a numerical approach with its core based on the Cut Finite Element Method (CutFEM), enabling topological changes of the fluid domain to solve the presented model numerically, is introduced. Such a CutFEM based approach is able to deal with the numerical challenges mentioned above. Different test cases give a perspective towards the potential capabilities of the presented physical model and numerical approach.
Submitted 11 September, 2018;
originally announced September 2018.
-
Software Engineering Solutions To Support Vertical Transportation
Authors:
Alber J. Christianto,
Peng Chen,
Osheen Walawedura,
Annie Vuong,
Jun Feng,
Dong Wang,
Maria Spichkova
Abstract:
In this paper we introduce the core results of the project on visualisation and analysis of data collected from the vertical transport facilities. The aim of the project was to provide better user experience as well as to help building maintenance staff to increase productivity of their work. We elaborated a web-based system for vertical transportation, to cover the needs of (1) staff working on building maintenance, (2) people who are regularly using the facilities in the corresponding buildings.
Submitted 13 December, 2017;
originally announced December 2017.