Showing 1–50 of 388 results for author: Xia, F

  1. arXiv:2407.09777  [pdf, other]

    cs.LG cs.AI

    Graph Transformers: A Survey

    Authors: Ahsan Shehzad, Feng Xia, Shagufta Abid, Ciyuan Peng, Shuo Yu, Dongyu Zhang, Karin Verspoor

    Abstract: Graph transformers are a recent advancement in machine learning, offering a new class of neural network models for graph-structured data. The synergy between transformers and graph learning demonstrates strong performance and versatility across various graph-related tasks. This survey provides an in-depth review of recent progress and challenges in graph transformer research. We begin with foundat…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 23 pages, 4 figures

    MSC Class: 68T07; 68T05; 68U01 ACM Class: I.2.6

  2. arXiv:2407.08931  [pdf, other]

    cs.CV

    Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection

    Authors: Xingyu Peng, Yan Bai, Chen Gao, Lirong Yang, Fei Xia, Beipeng Mu, Xiaofei Wang, Si Liu

    Abstract: Open-Vocabulary Detection (OVD) is the task of detecting all interesting objects in a given scene without predefined object classes. Extensive work has been done to deal with the OVD for 2D RGB images, but the exploration of 3D OVD is still limited. Intuitively, lidar point clouds provide 3D information, both object level and scene level, to generate trustful detection results. However, previous l…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: accepted by ECCV 2024

  3. arXiv:2407.07775  [pdf, other]

    cs.RO cs.AI

    Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs

    Authors: Hao-Tien Lewis Chiang, Zhuo Xu, Zipeng Fu, Mithun George Jacob, Tingnan Zhang, Tsang-Wei Edward Lee, Wenhao Yu, Connor Schenck, David Rendleman, Dhruv Shah, Fei Xia, Jasmine Hsu, Jonathan Hoech, Pete Florence, Sean Kirmani, Sumeet Singh, Vikas Sindhwani, Carolina Parada, Chelsea Finn, Peng Xu, Sergey Levine, Jie Tan

    Abstract: An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and image, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recor…

    Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  4. arXiv:2407.06649  [pdf, ps, other]

    cs.SC math.AC

    On the equivalence problem of Smith forms for multivariate polynomial matrices

    Authors: Dong Lu, Dingkang Wang, Fanghui Xiao, Xiaopeng Zheng

    Abstract: This paper delves into the equivalence problem of Smith forms for multivariate polynomial matrices. Generally speaking, multivariate ($n \geq 2$) polynomial matrices and their Smith forms may not be equivalent. However, under a certain specific condition, we derive the necessary and sufficient condition for their equivalence. Let $F\in K[x_1,\ldots,x_n]^{l\times m}$ be of rank $r$,…

    Submitted 9 July, 2024; originally announced July 2024.

  5. arXiv:2407.04936  [pdf, other]

    cs.SD eess.AS

    A Reference-free Metric for Language-Queried Audio Source Separation using Contrastive Language-Audio Pretraining

    Authors: Feiyang Xiao, Jian Guan, Qiaoxi Zhu, Xubo Liu, Wenbo Wang, Shuhan Qi, Kejia Zhang, Jianyuan Sun, Wenwu Wang

    Abstract: Language-queried audio source separation (LASS) aims to separate an audio source guided by a text query, with the signal-to-distortion ratio (SDR)-based metrics being commonly used to objectively measure the quality of the separated audio. However, the SDR-based metrics require a reference signal, which is often difficult to obtain in real-world scenarios. In addition, with the SDR-based metrics,…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Submitted to DCASE 2024 Workshop

  6. arXiv:2407.01887  [pdf, other]

    cs.LG cs.AI cs.CL

    Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents

    Authors: Fanzeng Xia, Hao Liu, Yisong Yue, Tongxin Li

    Abstract: In-context decision-making is an important capability of artificial general intelligence, which Large Language Models (LLMs) have effectively demonstrated in various scenarios. However, LLMs often face challenges when dealing with numerical contexts, and limited attention has been paid to evaluating their performance through preference feedback generated by the environment. This paper investigates…

    Submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2406.17739  [pdf, other]

    cs.CL cs.AI

    Find Parent then Label Children: A Two-stage Taxonomy Completion Method with Pre-trained Language Model

    Authors: Fei Xia, Yixuan Weng, Shizhu He, Kang Liu, Jun Zhao

    Abstract: Taxonomies, which organize domain concepts into hierarchical structures, are crucial for building knowledge systems and downstream applications. As domain knowledge evolves, taxonomies need to be continuously updated to include new concepts. Previous approaches have mainly focused on adding concepts to the leaf nodes of the existing hierarchical tree, which does not fully utilize the taxonomy's kn…

    Submitted 25 June, 2024; originally announced June 2024.

  8. arXiv:2406.14132  [pdf, other]

    cs.AI

    Enhancing Monotonic Modeling with Spatio-Temporal Adaptive Awareness in Diverse Marketing

    Authors: Bin Li, Jiayan Pei, Feiyang Xiao, Yifan Zhao, Zhixing Zhang, Diwei Liu, HengXu He, Jia Jia

    Abstract: In the mobile internet era, the Online Food Ordering Service (OFOS) emerges as an integral component of inclusive finance owing to the convenience it brings to people. OFOS platforms offer dynamic allocation incentives to users and merchants through diverse marketing campaigns to encourage payments while maintaining the platforms' budget efficiency. Despite significant progress, the marketing doma…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 7 pages

  9. arXiv:2406.13626  [pdf, other]

    cs.CL cs.AI

    Fine-Tuning Gemma-7B for Enhanced Sentiment Analysis of Financial News Headlines

    Authors: Kangtong Mo, Wenyan Liu, Xuanzhen Xu, Chang Yu, Yuelin Zou, Fangqing Xia

    Abstract: In this study, we explore the application of sentiment analysis on financial news headlines to understand investor sentiment. By leveraging Natural Language Processing (NLP) and Large Language Models (LLM), we analyze sentiment from the perspective of retail investors. The FinancialPhraseBank dataset, which contains categorized sentiments of financial news headlines, serves as the basis for our an…

    Submitted 19 June, 2024; originally announced June 2024.

  10. arXiv:2406.12501  [pdf, other]

    cs.IR

    Improving Multi-modal Recommender Systems by Denoising and Aligning Multi-modal Content and User Feedback

    Authors: Guipeng Xv, Xinyu Li, Ruobing Xie, Chen Lin, Chong Liu, Feng Xia, Zhanhui Kang, Leyu Lin

    Abstract: Multi-modal recommender systems (MRSs) are pivotal in diverse online web platforms and have garnered considerable attention in recent years. However, previous studies overlook the challenges of (1) noisy multi-modal content, (2) noisy user feedback, and (3) aligning multi-modal content with user feedback. In order to tackle these challenges, we propose Denoising and Aligning Multi-modal Recommende…

    Submitted 18 June, 2024; originally announced June 2024.

  11. arXiv:2406.11138  [pdf, other]

    cs.CV cs.AI

    Diffusion Models in Low-Level Vision: A Survey

    Authors: Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, Xiu Li

    Abstract: Deep generative models have garnered significant attention in low-level vision tasks due to their generative capabilities. Among them, diffusion model-based solutions, characterized by a forward diffusion process and a reverse denoising process, have emerged as widely acclaimed for their ability to produce samples of superior quality and diversity. This ensures the generation of visually compellin…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 20 pages, 23 figures, 4 tables

  12. arXiv:2406.07966  [pdf, other]

    cs.CV

    Real-world Image Dehazing with Coherence-based Label Generator and Cooperative Unfolding Network

    Authors: Chengyu Fang, Chunming He, Fengyang Xiao, Yulun Zhang, Longxiang Tang, Yuelin Zhang, Kai Li, Xiu Li

    Abstract: Real-world Image Dehazing (RID) aims to alleviate haze-induced degradation in real-world settings. This task remains challenging due to the complexities in accurately modeling real haze distributions and the scarcity of paired real-world data. To address these challenges, we first introduce a cooperative unfolding network that jointly models atmospheric scattering and image scenes, effectively int…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 10 pages, 7 figures, 6 tables

  13. arXiv:2406.06618  [pdf, other]

    cs.SI cs.AI cs.CY cs.LG physics.soc-ph

    PANDORA: Deep graph learning based COVID-19 infection risk level forecasting

    Authors: Shuo Yu, Feng Xia, Yueru Wang, Shihao Li, Falih Febrinanto, Madhu Chetty

    Abstract: COVID-19 as a global pandemic causes a massive disruption to social stability that threatens human life and the economy. Policymakers and all elements of society must deliver measurable actions based on the pandemic's severity to minimize the detrimental impact of COVID-19. A proper forecasting system is arguably important to provide an early signal of the risk of COVID-19 infection so that the au…

    Submitted 7 June, 2024; originally announced June 2024.

  14. arXiv:2406.06617  [pdf, other]

    cs.SI cs.LG

    Collaborative Team Recognition: A Core Plus Extension Structure

    Authors: Shuo Yu, Fayez Alqahtani, Amr Tolba, Ivan Lee, Tao Jia, Feng Xia

    Abstract: Scientific collaboration is a significant behavior in knowledge creation and idea exchange. To tackle large and complex research questions, a trend of team formation has been observed in recent decades. In this study, we focus on recognizing collaborative teams and exploring inner patterns using scholarly big graph data. We propose a collaborative team recognition (CORE) model with a "core + exten…

    Submitted 7 June, 2024; originally announced June 2024.

  15. arXiv:2406.04702  [pdf, other]

    cs.LG

    Marking the Pace: A Blockchain-Enhanced Privacy-Traceable Strategy for Federated Recommender Systems

    Authors: Zhen Cai, Tao Tang, Shuo Yu, Yunpeng Xiao, Feng Xia

    Abstract: Federated recommender systems have been crucially enhanced through data sharing and continuous model updates, attributed to the pervasive connectivity and distributed computing capabilities of Internet of Things (IoT) devices. Given the sensitivity of IoT data, transparent data processing in data sharing and model updates is paramount. However, existing methods fall short in tracing the flow of sh…

    Submitted 7 June, 2024; originally announced June 2024.

  16. arXiv:2406.04690  [pdf, other]

    cs.LG stat.ML

    Higher-order Structure Based Anomaly Detection on Attributed Networks

    Authors: Xu Yuan, Na Zhou, Shuo Yu, Huafei Huang, Zhikui Chen, Feng Xia

    Abstract: Anomaly detection (such as telecom fraud detection and medical image detection) has attracted the increasing attention of people. The complex interaction between multiple entities widely exists in the network, which can reflect specific human behavior patterns. Such patterns can be modeled by higher-order network structures, thus benefiting anomaly detection on attributed networks. However, due to…

    Submitted 7 June, 2024; originally announced June 2024.

  17. arXiv:2405.17034  [pdf, other]

    cs.LG cs.AI

    FUGNN: Harmonizing Fairness and Utility in Graph Neural Networks

    Authors: Renqiang Luo, Huafei Huang, Shuo Yu, Zhuoyang Han, Estrid He, Xiuzhen Zhang, Feng Xia

    Abstract: Fairness-aware Graph Neural Networks (GNNs) often face a challenging trade-off, where prioritizing fairness may require compromising utility. In this work, we re-examine fairness through the lens of spectral graph theory, aiming to reconcile fairness and utility within the framework of spectral graph learning. We explore the correlation between sensitive features and spectrum in GNNs, using theore…

    Submitted 27 May, 2024; originally announced May 2024.

  18. arXiv:2405.16021  [pdf, other]

    cs.RO

    VADER: Visual Affordance Detection and Error Recovery for Multi Robot Human Collaboration

    Authors: Michael Ahn, Montserrat Gonzalez Arenas, Matthew Bennice, Noah Brown, Christine Chan, Byron David, Anthony Francis, Gavin Gonzalez, Rainer Hessmer, Tomas Jackson, Nikhil J Joshi, Daniel Lam, Tsang-Wei Edward Lee, Alex Luong, Sharath Maddineni, Harsh Patel, Jodilyn Peralta, Jornell Quiambao, Diego Reyes, Rosario M Jauregui Ruano, Dorsa Sadigh, Pannag Sanketi, Leila Takayama, Pavel Vodenski, Fei Xia

    Abstract: Robots today can exploit the rich world knowledge of large language models to chain simple behavioral skills into long-horizon tasks. However, robots often get interrupted during long-horizon tasks due to primitive skill failures and dynamic environments. We propose VADER, a plan, execute, detect framework with seeking help as a new skill that enables robots to recover and complete long-horizon ta…

    Submitted 30 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 9 pages, 4 figures

  19. arXiv:2405.14156  [pdf, other]

    cs.CV

    Unveiling the Tapestry of Consistency in Large Vision-Language Models

    Authors: Yuan Zhang, Fei Xiao, Tao Huang, Chun-Kai Fan, Hongyuan Dong, Jiawen Li, Jiacong Wang, Kuan Cheng, Shanghang Zhang, Haoyuan Guo

    Abstract: Large vision-language models (LVLMs) have recently achieved rapid progress, exhibiting great perception and reasoning abilities concerning visual information. However, when faced with prompts in different sizes of solution spaces, LVLMs fail to always give consistent answers regarding the same knowledge point. This inconsistency of answers between different solution spaces is prevalent in LVLMs an…

    Submitted 7 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: This project is available at https://github.com/foundation-multimodal-models/ConBench

  20. arXiv:2405.09543  [pdf, other]

    cs.CY cs.AI cs.IR cs.LG

    Algorithmic Fairness: A Tolerance Perspective

    Authors: Renqiang Luo, Tao Tang, Feng Xia, Jiaying Liu, Chengpei Xu, Leo Yu Zhang, Wei Xiang, Chengqi Zhang

    Abstract: Recent advancements in machine learning and deep learning have brought algorithmic fairness into sharp focus, illuminating concerns over discriminatory decision making that negatively impacts certain individuals or groups. These concerns have manifested in legal, ethical, and societal challenges, including the erosion of trust in intelligent systems. In response, this survey delves into the existi…

    Submitted 26 April, 2024; originally announced May 2024.

    Comments: 33 pages, 4 figures

    MSC Class: 68T01; 68W40 ACM Class: I.2.6; K.4.2; H.1.2

  21. arXiv:2405.04101  [pdf, other]

    cs.LG cs.AI

    Continual Learning in the Presence of Repetition

    Authors: Hamed Hemati, Lorenzo Pellegrini, Xiaotian Duan, Zixuan Zhao, Fangfang Xia, Marc Masana, Benedikt Tscheschner, Eduardo Veas, Yuxiang Zheng, Shiji Zhao, Shao-Yuan Li, Sheng-Jun Huang, Vincenzo Lomonaco, Gido M. van de Ven

    Abstract: Continual learning (CL) provides a framework for training models in ever-evolving environments. Although re-occurrence of previously seen objects or tasks is common in real-world problems, the concept of repetition in the data stream is not often considered in standard benchmarks for CL. Unlike with the rehearsal mechanism in buffer-based strategies, where sample repetition is controlled by the st…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Preprint; Challenge Report of the 4th Workshop on Continual Learning in Computer Vision at CVPR

  22. arXiv:2405.04029  [pdf, other]

    cs.CR

    Enabling Privacy-Preserving and Publicly Auditable Federated Learning

    Authors: Huang Zeng, Anjia Yang, Jian Weng, Min-Rong Chen, Fengjun Xiao, Yi Liu, Ye Yao

    Abstract: Federated learning (FL) has attracted widespread attention because it supports the joint training of models by multiple participants without moving private datasets. However, there are still many security issues in FL that deserve discussion. In this paper, we consider three major issues: 1) how to ensure that the training process can be publicly audited by any third party; 2) how to avoid the infl…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: ICC 2024 - 2024 IEEE International Conference on Communications Conference Program

    ACM Class: C.2.2; C.2.4; E.3

  23. arXiv:2405.01882  [pdf, other]

    cs.RO cs.AI eess.SP

    Millimeter Wave Radar-based Human Activity Recognition for Healthcare Monitoring Robot

    Authors: Zhanzhong Gu, Xiangjian He, Gengfa Fang, Chengpei Xu, Feng Xia, Wenjing Jia

    Abstract: Healthcare monitoring is crucial, especially for the daily care of elderly individuals living alone. It can detect dangerous occurrences, such as falls, and provide timely alerts to save lives. Non-invasive millimeter wave (mmWave) radar-based healthcare monitoring systems using advanced human activity recognition (HAR) models have recently gained significant attention. However, they encounter cha…

    Submitted 3 May, 2024; originally announced May 2024.

  24. arXiv:2405.00266  [pdf, other]

    cs.NI

    Robot-As-A-Sensor: Forming a Sensing Network with Robots for Underground Mining Missions

    Authors: Xiaoyu Ai, Chengpei Xu, Binghao Li, Feng Xia

    Abstract: Nowadays, robots are deployed as mobile platforms equipped with sensing, communication and computing capabilities, especially in the mining industry, where they perform tasks in hazardous and repetitive environments. Despite their potential, individual robots face significant limitations when completing complex tasks that require the collaboration of multiple robots. This collaboration requires a…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Submitted to Special Issue on Neuro-Inspired Learning for Robotics for IEEE Transactions on Cognitive and Developmental Systems

  25. arXiv:2404.17169  [pdf, other]

    cs.LG cs.CY

    FairGT: A Fairness-aware Graph Transformer

    Authors: Renqiang Luo, Huafei Huang, Shuo Yu, Xiuzhen Zhang, Feng Xia

    Abstract: The design of Graph Transformers (GTs) generally neglects considerations for fairness, resulting in biased outcomes against certain sensitive subgroups. Since GTs encode graph information without relying on message-passing mechanisms, conventional fairness-aware graph learning methods cannot be directly applicable to address these issues. To tackle this challenge, we propose FairGT, a Fairness-awa…

    Submitted 26 April, 2024; originally announced April 2024.

    Journal ref: IJCAI2024

  26. arXiv:2404.08965  [pdf, other]

    cs.CV cs.MM

    Seeing Text in the Dark: Algorithm and Benchmark

    Authors: Chengpei Xu, Hao Fu, Long Ma, Wenjing Jia, Chengqi Zhang, Feng Xia, Xiaoyu Ai, Binghao Li, Wenjie Zhang

    Abstract: Localizing text in low-light environments is challenging due to visual degradations. Although a straightforward solution involves a two-stage pipeline with low-light image enhancement (LLE) as the initial step followed by a detector, LLE is primarily designed for human vision instead of machine and can accumulate errors. In this work, we propose an efficient and effective single-stage approach for l…

    Submitted 23 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

  27. arXiv:2404.06645  [pdf, other]

    cs.RO cs.AI

    GenCHiP: Generating Robot Policy Code for High-Precision and Contact-Rich Manipulation Tasks

    Authors: Kaylee Burns, Ajinkya Jain, Keegan Go, Fei Xia, Michael Stark, Stefan Schaal, Karol Hausman

    Abstract: Large Language Models (LLMs) have been successful at generating robot policy code, but so far these results have been limited to high-level tasks that do not require precise movement. It is an open question how well such approaches work for tasks that require reasoning over contact forces and working within tight success tolerances. We find that, with the right action space, LLMs are capable of su…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 14 pages, 12 figures

    ACM Class: I.2.9

  28. arXiv:2404.00826  [pdf, other]

    cs.CL

    Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods

    Authors: Yujuan Fu, Giridhar Kaushik Ramachandran, Nicholas J Dobbins, Namu Park, Michael Leu, Abby R. Rosenberg, Kevin Lybarger, Fei Xia, Ozlem Uzuner, Meliha Yetisgen

    Abstract: Social determinants of health (SDoH) play a critical role in shaping health outcomes, particularly in pediatric populations where interventions can have long-term implications. SDoH are frequently studied in the Electronic Health Record (EHR), which provides a rich repository for diverse patient data. In this work, we present a novel annotated corpus, the Pediatric Social History Annotation Corpus…

    Submitted 4 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: 12 pages, 2 figures and 3 tables. Accepted by LREC-COLING 2024

  29. arXiv:2403.16519  [pdf, ps, other]

    cs.SC

    Two Algorithms for Computing Rational Univariate Representations of Zero-Dimensional Ideals with Parameters

    Authors: Dingkang Wang, Jingjing Wei, Fanghui Xiao, Xiaopeng Zheng

    Abstract: Two algorithms for computing the rational univariate representation of zero-dimensional ideals with parameters are presented in the paper. Different from the rational univariate representation of zero-dimensional ideals without parameters, the number of zeros of zero-dimensional ideals with parameters under various specializations is different, which leads to choosing and checking the separating e…

    Submitted 25 March, 2024; originally announced March 2024.

  30. arXiv:2403.15637  [pdf, other]

    cs.RO

    CoNVOI: Context-aware Navigation using Vision Language Models in Outdoor and Indoor Environments

    Authors: Adarsh Jagan Sathyamoorthy, Kasun Weerakoon, Mohamed Elnoor, Anuj Zore, Brian Ichter, Fei Xia, Jie Tan, Wenhao Yu, Dinesh Manocha

    Abstract: We present CoNVOI, a novel method for autonomous robot navigation in real-world indoor and outdoor environments using Vision Language Models (VLMs). We employ VLMs in two ways: first, we leverage their zero-shot image classification capability to identify the context or scenario (e.g., indoor corridor, outdoor terrain, crosswalk, etc.) of the robot's surroundings, and formulate context-based naviga…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 9 pages, 4 figures

  31. arXiv:2403.11806  [pdf, other]

    cs.IT eess.SP

    Fluid Antenna for Mobile Edge Computing

    Authors: Yiping Zuo, Jiajia Guo, Biyun Sheng, Chen Dai, Fu Xiao, Shi Jin

    Abstract: In the evolving environment of mobile edge computing (MEC), optimizing system performance to meet the growing demand for low-latency computing services is a top priority. Integrating fluidic antenna (FA) technology into MEC networks provides a new approach to address this challenge. This letter proposes an FA-enabled MEC scheme that aims to minimize the total system delay by leveraging the mobilit…

    Submitted 18 March, 2024; originally announced March 2024.

  32. arXiv:2403.10815  [pdf, other]

    eess.IV cs.CV

    MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections

    Authors: Mude Hui, Zihao Wei, Hongru Zhu, Fei Xia, Yuyin Zhou

    Abstract: Volumetric optical microscopy using non-diffracting beams enables rapid imaging of 3D volumes by projecting them axially to 2D images but lacks crucial depth information. Addressing this, we introduce MicroDiffusion, a pioneering tool facilitating high-quality, depth-resolved 3D volume reconstruction from limited 2D projections. While existing Implicit Neural Representation (INR) models often yiel…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  33. arXiv:2403.09227  [pdf, other]

    cs.RO cs.AI

    BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation

    Authors: Chengshu Li, Ruohan Zhang, Josiah Wong, Cem Gokmen, Sanjana Srivastava, Roberto Martín-Martín, Chen Wang, Gabrael Levine, Wensi Ai, Benjamin Martinez, Hang Yin, Michael Lingelbach, Minjune Hwang, Ayano Hiranaka, Sujay Garlanka, Arman Aydin, Sharon Lee, Jiankai Sun, Mona Anvari, Manasi Sharma, Dhruva Bansal, Samuel Hunter, Kyu-Young Kim, Alan Lou, Caleb R Matthews, et al. (10 additional authors not shown)

    Abstract: We present BEHAVIOR-1K, a comprehensive simulation benchmark for human-centered robotics. BEHAVIOR-1K includes two components, guided and motivated by the results of an extensive survey on "what do you want robots to do for you?". The first is the definition of 1,000 everyday activities, grounded in 50 scenes (houses, gardens, restaurants, offices, etc.) with more than 9,000 objects annotated with…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: A preliminary version was published at 6th Conference on Robot Learning (CoRL 2022)

  34. arXiv:2403.08310  [pdf, other]

    cs.CV

    StyleDyRF: Zero-shot 4D Style Transfer for Dynamic Neural Radiance Fields

    Authors: Hongbin Xu, Weitao Chen, Feng Xiao, Baigui Sun, Wenxiong Kang

    Abstract: 4D style transfer aims at transferring arbitrary visual style to the synthesized novel views of a dynamic 4D scene with varying viewpoints and times. Existing efforts on 3D style transfer can effectively combine the visual features of style images and neural radiance fields (NeRF) but fail to handle the 4D dynamic scenes limited by the static scene assumption. Consequently, we aim to handle the no…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: In submission. The code and model are released at: https://github.com/ToughStoneX/StyleDyRF

  35. arXiv:2403.08182  [pdf, other]

    cs.CV

    SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention

    Authors: Feng Xiao, Hongbin Xu, Qiuxia Wu, Wenxiong Kang

    Abstract: 3D visual grounding aims to automatically locate the 3D region of the specified object given the corresponding textual description. Existing works fail to distinguish similar objects especially when multiple referred objects are involved in the description. Experiments show that direct matching of language and visual modal has limited capacity to comprehend complex referential relationships in utt…

    Submitted 12 March, 2024; originally announced March 2024.

  36. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  37. arXiv:2402.17489  [pdf, other]

    cs.AR

    SSRESF: Sensitivity-aware Single-particle Radiation Effects Simulation Framework in SoC Platforms based on SVM Algorithm

    Authors: Meng Liu, Shuai Li, Fei Xiao, Ruijie Wang, Chunxue Liu, Liang Wang

    Abstract: The ever-expanding scale of integrated circuits has brought about a significant rise in the design risks associated with radiation-resistant integrated circuit chips. Traditional single-particle experimental methods, with their iterative design approach, are increasingly ill-suited for the challenges posed by large-scale integrated circuits. In response, this article introduces a novel sensitivity…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted to the 61st ACM/IEEE Design Automation Conference (DAC 2024)

  38. arXiv:2402.14254  [pdf, other]

    cs.LG stat.ML

    A hierarchical decomposition for explaining ML performance discrepancies

    Authors: Jean Feng, Harvineet Singh, Fan Xia, Adarsh Subbaswamy, Alexej Gossmann

    Abstract: Machine learning (ML) algorithms can often differ in performance across domains. Understanding $\textit{why}$ their performance differs is crucial for determining what types of interventions (e.g., algorithmic or operational) are most effective at closing the performance gaps. Existing methods focus on $\textit{aggregate decompositions}$ of the total performance gap into the impact of a shift in t…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 11 pages, 5 figures in main body; 14 pages and 2 figures in appendices

  39. arXiv:2402.11450  [pdf, other]

    cs.RO

    Learning to Learn Faster from Human Feedback with Language Model Predictive Control

    Authors: Jacky Liang, Fei Xia, Wenhao Yu, Andy Zeng, Montserrat Gonzalez Arenas, Maria Attarian, Maria Bauza, Matthew Bennice, Alex Bewley, Adil Dostmohamed, Chuyuan Kelly Fu, Nimrod Gileadi, Marissa Giustina, Keerthana Gopalakrishnan, Leonard Hasenclever, Jan Humplik, Jasmine Hsu, Nikhil Joshi, Ben Jyenis, Chase Kew, Sean Kirmani, Tsang-Wei Edward Lee, Kuang-Huei Lee, Assaf Hurwitz Michaely, Joss Moore, et al. (25 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for o…

    Submitted 31 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  40. arXiv:2402.07872  [pdf, other]

    cs.RO cs.CL cs.CV cs.LG

    PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

    Authors: Soroush Nasiriany, Fei Xia, Wenhao Yu, Ted Xiao, Jacky Liang, Ishita Dasgupta, Annie Xie, Danny Driess, Ayzaan Wahid, Zhuo Xu, Quan Vuong, Tingnan Zhang, Tsang-Wei Edward Lee, Kuang-Huei Lee, Peng Xu, Sean Kirmani, Yuke Zhu, Andy Zeng, Karol Hausman, Nicolas Heess, Chelsea Finn, Sergey Levine, Brian Ichter

    Abstract: Vision language models (VLMs) have shown impressive capabilities across a variety of tasks, from logical reasoning to visual understanding. This opens the door to richer interaction with the world, for example robotic control. However, VLMs produce only textual outputs, while robotic control and other spatial tasks require outputting continuous coordinates, actions, or trajectories. How can we ena…

    Submitted 12 February, 2024; originally announced February 2024.

  41. arXiv:2402.06107  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Multiple Instance Learning for Cheating Detection and Localization in Online Examinations

    Authors: Yemeng Liu, Jing Ren, Jianshuo Xu, Xiaomei Bai, Roopdeep Kaur, Feng Xia

    Abstract: The spread of the Coronavirus disease-2019 epidemic has caused many courses and exams to be conducted online. The cheating behavior detection model in examination invigilation systems plays a pivotal role in guaranteeing the equality of long-distance examinations. However, cheating behavior is rare, and most researchers do not comprehensively take into account features such as head posture, gaze a…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 12 pages, 7 figures

    MSC Class: 68T40; 68T45 ACM Class: I.2.10; I.5.4

    Journal ref: IEEE Transactions on Cognitive and Developmental Systems 2024

  42. arXiv:2402.05322  [pdf, other]

    cs.LG cs.AI cs.GR cs.SI

    Learning on Multimodal Graphs: A Survey

    Authors: Ciyuan Peng, Jiayuan He, Feng Xia

    Abstract: Multimodal data pervades various domains, including healthcare, social media, and transportation, where multimodal graphs play a pivotal role. Machine learning on multimodal graphs, referred to as multimodal graph learning (MGL), is essential for successful artificial intelligence (AI) applications. The burgeoning research in this field encompasses diverse graph data types and modalities, learning…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 9 pages, 1 figure

  43. arXiv:2402.04031  [pdf]

    cs.CV cs.LG

    Polyp-DDPM: Diffusion-Based Semantic Polyp Synthesis for Enhanced Segmentation

    Authors: Zolnamar Dorjsembe, Hsing-Kuo Pao, Furen Xiao

    Abstract: This study introduces Polyp-DDPM, a diffusion-based method for generating realistic images of polyps conditioned on masks, aimed at enhancing the segmentation of gastrointestinal (GI) tract polyps. Our approach addresses the challenges of data limitations, high annotation costs, and privacy concerns associated with medical images. By conditioning the diffusion model on segmentation masks-binary ma…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  44. Digital Twin Mobility Profiling: A Spatio-Temporal Graph Learning Approach

    Authors: Xin Chen, Mingliang Hou, Tao Tang, Achhardeep Kaur, Feng Xia

    Abstract: With the arrival of the big data era, mobility profiling has become a viable method of utilizing enormous amounts of mobility data to create an intelligent transportation system. Mobility profiling can extract potential patterns in urban traffic from mobility data and is critical for a variety of traffic-related applications. However, due to the high level of complexity and the huge amount of data…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 10 pages, 7 figures

    MSC Class: 68T09; 68T30; 68U35 ACM Class: I.2.6; I.2.4; H.1.2

    Journal ref: The 7th IEEE International Conference on Data Science and Systems (DSS), Dec 20 - 22, 2021, Haikou, China

  45. arXiv:2402.03732  [pdf, other]

    cs.AI cs.CL cs.DL cs.LG

    Deep Outdated Fact Detection in Knowledge Graphs

    Authors: Huiling Tu, Shuo Yu, Vidya Saikrishna, Feng Xia, Karin Verspoor

    Abstract: Knowledge graphs (KGs) have garnered significant attention for their vast potential across diverse domains. However, the issue of outdated facts poses a challenge to KGs, affecting their overall quality as real-world information evolves. Existing solutions for outdated fact detection often rely on manual recognition. In response, this paper presents DEAN (Deep outdatEd fAct detectioN), a novel dee…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 10 pages, 6 figures

    MSC Class: 68T09; 68T30; 68P20 ACM Class: I.2.6; I.2.4; H.3.7; H.3.3

    Journal ref: 2023 IEEE International Conference on Data Mining Workshops (ICDMW), December 1-4, 2023, Shanghai, China

  46. Generative Expressive Robot Behaviors using Large Language Models

    Authors: Karthik Mahadevan, Jonathan Chien, Noah Brown, Zhuo Xu, Carolina Parada, Fei Xia, Andy Zeng, Leila Takayama, Dorsa Sadigh

    Abstract: People employ expressive behaviors to effectively communicate and coordinate their actions with others, such as nodding to acknowledge a person glancing at them or saying "excuse me" to pass people in a busy corridor. We would like robots to also demonstrate expressive behaviors in human-robot interaction. Prior work proposes rule-based methods that struggle to scale to new communication modalitie…

    Submitted 30 January, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

  47. arXiv:2401.12963  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents

    Authors: Michael Ahn, Debidatta Dwibedi, Chelsea Finn, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Karol Hausman, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Sean Kirmani, Isabel Leal, Edward Lee, Sergey Levine, Yao Lu, Sharath Maddineni, Kanishka Rao, Dorsa Sadigh, Pannag Sanketi, Pierre Sermanet, Quan Vuong, Stefan Welker, Fei Xia, Ted Xiao, et al. (3 additional authors not shown)

    Abstract: Foundation models that incorporate language, vision, and more recently actions have revolutionized the ability to harness internet scale data to reason about useful tasks. However, one of the key challenges of training embodied foundation models is the lack of data grounded in the physical world. In this paper, we propose AutoRT, a system that leverages existing foundation models to scale up the d…

    Submitted 1 July, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: 26 pages, 9 figures, ICRA 2024 VLMNM Workshop

  48. arXiv:2401.12486  [pdf, ps, other]

    cs.IT

    Quaternary codes and their binary images

    Authors: Yansheng Wu, Chao Li, Lin Zhang, Fu Xiao

    Abstract: Recently, simplicial complexes have been used in constructions of several infinite families of minimal and optimal linear codes by Hyun et al. Building upon their research, in this paper more linear codes over the ring $\mathbb{Z}_4$ are constructed by simplicial complexes. Specifically, the Lee weight distributions of the resulting quaternary codes are determined and two infinite families of four…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 21 pages

  49. arXiv:2401.12168  [pdf, other]

    cs.CV cs.CL cs.LG cs.RO

    SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

    Authors: Boyuan Chen, Zhuo Xu, Sean Kirmani, Brian Ichter, Danny Driess, Pete Florence, Dorsa Sadigh, Leonidas Guibas, Fei Xia

    Abstract: Understanding and reasoning about spatial relationships is a fundamental capability for Visual Question Answering (VQA) and robotics. While Vision Language Models (VLM) have demonstrated remarkable performance in certain VQA benchmarks, they still lack capabilities in 3D spatial reasoning, such as recognizing quantitative relationships of physical objects like distances or size differences. We hyp…

    Submitted 22 January, 2024; originally announced January 2024.

  50. arXiv:2401.11767  [pdf, other]

    cs.CV

    Concealed Object Segmentation with Hierarchical Coherence Modeling

    Authors: Fengyang Xiao, Pan Zhang, Chunming He, Runze Hu, Yutao Liu

    Abstract: Concealed object segmentation (COS) is a challenging task that involves localizing and segmenting those concealed objects that are visually blended with their surrounding environments. Despite achieving remarkable success, existing COS segmenters still struggle to achieve complete segmentation results in extremely concealed scenarios. In this paper, we propose a Hierarchical Coherence Modeling (HC…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted to CICAI 2023. 13 pages, 6 figures, 4 tables