Showing 1–50 of 520 results for author: He, L

  1. arXiv:2407.13137  [pdf, other]

    cs.CV

    OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation

    Authors: Jian Sun, Yuqi Dai, Chi-Man Vong, Qing Xu, Shengbo Eben Li, Jianqiang Wang, Lei He, Keqiang Li

    Abstract: Bird's-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems. It realizes ego-vehicle surrounding environment perception by projecting 2D multi-view images into 3D world space. Recently, BEV segmentation has made notable progress, attributed to better view transformation modules, larger image encoders, or more temporal information. However, there are still two issu…

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2407.12491  [pdf, other]

    cs.CV

    Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving

    Authors: Yuqi Dai, Jian Sun, Shengbo Eben Li, Qing Xu, Jianqiang Wang, Lei He, Keqiang Li

    Abstract: Perception is essential for autonomous driving systems. Recent approaches based on Bird's-eye-view (BEV) and deep learning have made significant progress. However, challenging issues remain, including lengthy development cycles, poor reusability, and complex sensor setups in the perception algorithm development process. To tackle the above challenges, this paper proposes a novel hierarchical Bird'…

    Submitted 17 July, 2024; originally announced July 2024.

  3. arXiv:2407.12387  [pdf, other]

    cs.CV

    HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation

    Authors: Tianpei Zou, Sanqing Qu, Zhijun Li, Alois Knoll, Lianghua He, Guang Chen, Changjun Jiang

    Abstract: 3D point cloud segmentation has received significant interest for its growing applications. However, the generalization ability of models suffers in dynamic scenarios due to the distribution shift between test and training data. To promote robustness and adaptability across diverse scenarios, test-time adaptation (TTA) has recently been introduced. Nevertheless, most existing TTA methods are devel…

    Submitted 17 July, 2024; originally announced July 2024.

    Journal ref: ECCV 2024

  4. arXiv:2407.11294  [pdf, other]

    cs.CV cs.GR

    COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation

    Authors: Liu He, Daniel Aliaga

    Abstract: The generation of large-scale urban layouts has garnered substantial interest across various disciplines. Prior methods have utilized procedural generation requiring manual rule coding or deep learning needing abundant data. However, prior approaches have not considered the context-sensitive nature of urban layout generation. Our approach addresses this gap by leveraging a canonical graph represen…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  5. arXiv:2407.08944  [pdf, other]

    cs.CV eess.IV

    Bora: Biomedical Generalist Video Generation Model

    Authors: Weixiang Sun, Xiaocao You, Ruizhe Zheng, Zhengqing Yuan, Xiang Li, Lifang He, Quanzheng Li, Lichao Sun

    Abstract: Generative models hold promise for revolutionizing medical education, robot-assisted surgery, and data augmentation for medical AI development. Diffusion models can now generate realistic images from text prompts, while recent advancements have demonstrated their ability to create diverse, high-quality videos. However, these models often struggle with generating accurate representations of medical…

    Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  6. arXiv:2407.08103  [pdf, other]

    cs.CL cs.FL

    Automata-based constraints for language model decoding

    Authors: Terry Koo, Frederick Liu, Luheng He

    Abstract: LMs are often expected to generate strings in some formal language; for example, structured data, API calls, or code snippets. Although LMs can be tuned to improve their adherence to formal syntax, this does not guarantee conformance, especially with smaller LMs suitable for large-scale deployment. In addition, tuning requires significant resources, making it impractical for uncommon or task-speci…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted to CoLM 2024

  7. arXiv:2407.08092  [pdf, other]

    hep-lat cs.DC math.NA

    Extending DD-$α$AMG on heterogeneous machines

    Authors: Lianhua He, Gustavo Ramirez-Hidalgo, Ke-Long Zhang

    Abstract: Multigrid solvers are the standard in modern scientific computing simulations. Domain Decomposition Aggregation-Based Algebraic Multigrid, also known as the DD-$α$AMG solver, is a successful realization of an algebraic multigrid solver for lattice quantum chromodynamics. Its CPU implementation has made it possible to construct, for some particular discretizations, simulations otherwise computation…

    Submitted 10 July, 2024; originally announced July 2024.

  8. Multi-objective Learning to Rank by Model Distillation

    Authors: Jie Tang, Huiji Gao, Liwei He, Sanjeev Katariya

    Abstract: In online marketplaces, the objective of search ranking is not only to optimize purchase or conversion (the primary objective) but also purchase outcomes (secondary objectives), e.g., order cancellation (or return), review rating, customer service inquiries, and long-term platform growth. Multi-objective learning to rank has been widely studied to balance primary and secondary objectives. But traditional approache…

    Submitted 9 July, 2024; originally announced July 2024.

  9. arXiv:2407.04942  [pdf, other]

    cs.RO cs.LG

    FOSP: Fine-tuning Offline Safe Policy through World Models

    Authors: Chenyang Cao, Yucheng Xin, Silang Wu, Longxiang He, Zichen Yan, Junbo Tan, Xueqian Wang

    Abstract: Model-based Reinforcement Learning (RL) has shown its high training efficiency and capability of handling high-dimensional tasks. Regarding safety issues, safe model-based RL can achieve nearly zero-cost performance and effectively manage the trade-off between performance and safety. Nevertheless, prior works still pose safety challenges due to the online exploration in real-world deployment. To a…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 21 pages

  10. arXiv:2407.04711  [pdf, other]

    cs.CV cs.AI eess.IV

    MetaFruit Meets Foundation Models: Leveraging a Comprehensive Multi-Fruit Dataset for Advancing Agricultural Foundation Models

    Authors: Jiajia Li, Kyle Lammers, Xunyuan Yin, Xiang Yin, Long He, Renfu Lu, Zhaojian Li

    Abstract: Fruit harvesting poses a significant labor and financial burden for the industry, highlighting the critical need for advancements in robotic harvesting solutions. Machine vision-based fruit detection has been recognized as a crucial component for robust identification of fruits to guide robotic manipulation. Despite considerable progress in leveraging deep learning and machine learning techniques…

    Submitted 13 May, 2024; originally announced July 2024.

    Comments: 14 pages, 5 figures, 7 tables

  11. arXiv:2407.04274  [pdf, other]

    cs.CV

    Fine-grained Dynamic Network for Generic Event Boundary Detection

    Authors: Ziwei Zheng, Lijun He, Le Yang, Fan Li

    Abstract: Generic event boundary detection (GEBD) aims at pinpointing event boundaries naturally perceived by humans, playing a crucial role in understanding long-form videos. Given the diverse nature of generic boundaries, spanning different video appearances, objects, and actions, this task remains challenging. Existing methods usually detect various boundaries by the same protocol, regardless of their di…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  12. arXiv:2407.02913  [pdf, other]

    cs.LG cs.AI eess.IV eess.SP math.NA

    SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic

    Authors: Liulu He, Yufei Zhao, Rui Gao, Yuan Du, Li Du

    Abstract: Fast convolution algorithms, including Winograd and FFT, can efficiently accelerate convolution operations in deep models. However, these algorithms depend on high-precision arithmetic to maintain inference accuracy, which conflicts with model quantization. To resolve this conflict and further improve the efficiency of quantized convolution, we propose SFC, a new algebra transform for fast co…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: ICML 2024

  13. arXiv:2407.00711  [pdf, other]

    cs.CE

    Beyond the Yield Barrier: Variational Importance Sampling Yield Analysis

    Authors: Yanfang Liu, Lei He, Wei W. Xing

    Abstract: Optimal mean shift vector (OMSV)-based importance sampling methods have long been prevalent in yield estimation and optimization as an industry standard. However, most OMSV-based methods are designed heuristically without a rigorous understanding of their limitations. To this end, we propose VIS, the first variational analysis framework for yield problems, enabling a systematic refinement for OMSV…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 2024 43rd ACM/IEEE International Conference on Computer-Aided Design (ICCAD)

    MSC Class: 68U07; ACM Class: J.6

  14. arXiv:2407.00091  [pdf, other]

    cs.IR cs.HC cs.LG

    Learning to Rank for Maps at Airbnb

    Authors: Malay Haldar, Hongwei Zhang, Kedar Bellare, Sherry Chen, Soumyadip Banerjee, Xiaotang Wang, Mustafa Abdool, Huiji Gao, Pavan Tapadia, Liwei He, Sanjeev Katariya

    Abstract: As a two-sided marketplace, Airbnb brings together hosts who own listings for rent with prospective guests from around the globe. Results from a guest's search for listings are displayed primarily through two interfaces: (1) as a list of rectangular cards that contain on them the listing image, price, rating, and other details, referred to as list-results (2) as oval pins on a map showing the list…

    Submitted 25 June, 2024; originally announced July 2024.

  15. arXiv:2407.00024  [pdf, other]

    cs.CV cs.AI cs.MM

    LMVD: A Large-Scale Multimodal Vlog Dataset for Depression Detection in the Wild

    Authors: Lang He, Kai Chen, Junnan Zhao, Yimeng Wang, Ercheng Pei, Haifeng Chen, Jiewei Jiang, Shiqing Zhang, Jie Zhang, Zhongmin Wang, Tao He, Prayag Tiwari

    Abstract: Depression can significantly impact many aspects of an individual's life, including their personal and social functioning, academic and work performance, and overall quality of life. Many researchers within the field of affective computing are adopting deep learning technology to explore potential patterns related to the detection of depression. However, because of subjects' privacy protection con…

    Submitted 8 May, 2024; originally announced July 2024.

  16. arXiv:2406.18521  [pdf, other]

    cs.CL cs.CV

    CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

    Authors: Zirui Wang, Mengzhou Xia, Luxi He, Howard Chen, Yitao Liu, Richard Zhu, Kaiqu Liang, Xindi Wu, Haotian Liu, Sadhika Malladi, Alexis Chevalier, Sanjeev Arora, Danqi Chen

    Abstract: Chart understanding plays a pivotal role when applying Multimodal Large Language Models (MLLMs) to real-world tasks such as analyzing scientific papers or financial reports. However, existing datasets often focus on oversimplified and homogeneous charts with template-based questions, leading to an over-optimistic measure of progress. We demonstrate that although open-source models can appear to ou…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 121 pages, 90 figures

  17. arXiv:2406.18051  [pdf, other]

    cs.CV

    ViT-1.58b: Mobile Vision Transformers in the 1-bit Era

    Authors: Zhengqing Yuan, Rong Zhou, Hongyi Wang, Lifang He, Yanfang Ye, Lichao Sun

    Abstract: Vision Transformers (ViTs) have achieved remarkable performance in various image classification tasks by leveraging the attention mechanism to process image patches as tokens. However, the high computational and memory demands of ViTs pose significant challenges for deployment in resource-constrained environments. This paper introduces ViT-1.58b, a novel 1.58-bit quantized ViT model designed to dr…

    Submitted 26 June, 2024; originally announced June 2024.

  18. arXiv:2406.16536  [pdf, other]

    cs.CL

    C-LLM: Learn to Check Chinese Spelling Errors Character by Character

    Authors: Kunting Li, Yong Hu, Liang He, Fandong Meng, Jie Zhou

    Abstract: Chinese Spell Checking (CSC) aims to detect and correct spelling errors in sentences. Although Large Language Models (LLMs) exhibit robust capabilities and are widely applied in various tasks, their performance on CSC is often unsatisfactory. We find that LLMs fail to meet the Chinese character-level constraints of the CSC task, namely equal length and phonetic similarity, leading to a performance…

    Submitted 24 June, 2024; originally announced June 2024.

  19. arXiv:2406.14598  [pdf, other]

    cs.AI

    SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

    Authors: Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal

    Abstract: Evaluating aligned large language models' (LLMs) ability to recognize and reject unsafe user requests is crucial for safe, policy-compliant deployments. Existing evaluation efforts, however, face three limitations that we address with SORRY-Bench, our proposed benchmark. First, existing methods often use coarse-grained taxonomies of unsafe topics, and are over-representing some fine-grained topics…

    Submitted 20 June, 2024; originally announced June 2024.

  20. arXiv:2406.14526  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Fantastic Copyrighted Beasts and How (Not) to Generate Them

    Authors: Luxi He, Yangsibo Huang, Weijia Shi, Tinghao Xie, Haotian Liu, Yue Wang, Luke Zettlemoyer, Chiyuan Zhang, Danqi Chen, Peter Henderson

    Abstract: Recent studies show that image and video generation models can be prompted to reproduce copyrighted content from their training data, raising serious legal concerns around copyright infringement. Copyrighted characters, in particular, pose a difficult challenge for image generation services, with at least one lawsuit already awarding damages based on the generation of these characters. Yet, little…

    Submitted 20 June, 2024; originally announced June 2024.

  21. arXiv:2406.12548  [pdf, other]

    cs.CL

    P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts

    Authors: Yuhao Dan, Jie Zhou, Qin Chen, Junfeng Tian, Liang He

    Abstract: Personalized large language models (LLMs) have attracted great attention in many applications, such as intelligent education and emotional support. Most work focuses on controlling the character settings based on the profile (e.g., age, skill, experience, and so on). Conversely, the psychological theory-based personality traits with implicit expression and behavior are not well modeled, limiting t…

    Submitted 18 June, 2024; originally announced June 2024.

  22. arXiv:2406.11441  [pdf, other]

    cs.CV

    SWCF-Net: Similarity-weighted Convolution and Local-global Fusion for Efficient Large-scale Point Cloud Semantic Segmentation

    Authors: Zhenchao Lin, Li He, Hongqiang Yang, Xiaoqun Sun, Cuojin Zhang, Weinan Chen, Yisheng Guan, Hong Zhang

    Abstract: A large-scale point cloud consists of a multitude of individual objects, thereby encompassing rich structural and underlying semantic contextual information, resulting in a challenging problem in efficiently segmenting a point cloud. Most existing research mainly focuses on capturing intricate local features without giving due consideration to global ones, thus failing to leverage semantic context.…

    Submitted 17 June, 2024; originally announced June 2024.

  23. arXiv:2406.09095  [pdf, other]

    cs.CL

    Modeling Comparative Logical Relation with Contrastive Learning for Text Generation

    Authors: Yuhao Dan, Junfeng Tian, Jie Zhou, Ming Yan, Ji Zhang, Qin Chen, Liang He

    Abstract: Data-to-Text Generation (D2T), a classic natural language generation problem, aims at producing fluent descriptions for structured input data, such as a table. Existing D2T works mainly focus on describing the superficial associative relations among entities, while ignoring the deep comparative logical relations, such as A is better than B in a certain aspect with a corresponding opinion, which is…

    Submitted 13 June, 2024; originally announced June 2024.

  24. arXiv:2406.07147  [pdf]

    cs.HC cs.AI cs.CY

    Wearable Device-Based Real-Time Monitoring of Physiological Signals: Evaluating Cognitive Load Across Different Tasks

    Authors: Ling He, Yanxin Chen, Wenqi Wang, Shuting He, Xiaoqiang Hu

    Abstract: This study employs cutting-edge wearable monitoring technology to conduct high-precision, high-temporal-resolution (1-second interval) cognitive load assessment on electroencephalogram (EEG) data from the FP1 channel and heart rate variability (HRV) data of secondary vocational students. By jointly analyzing these two critical physiological indicators, the research delves into their application va…

    Submitted 3 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  25. arXiv:2406.05540  [pdf, other]

    q-bio.QM cs.AI cs.CL cs.LG

    A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding

    Authors: Yiqing Shen, Zan Chen, Michail Mamalakis, Luhan He, Haiyang Xia, Tianbin Li, Yanzhou Su, Junjun He, Yu Guang Wang

    Abstract: The parallels between protein sequences and natural language in their sequential structures have inspired the application of large language models (LLMs) to protein understanding. Despite the success of LLMs in NLP, their effectiveness in comprehending protein sequences remains an open question, largely due to the absence of datasets linking protein sequences to descriptive text. Researchers have…

    Submitted 8 July, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  26. arXiv:2406.00976  [pdf, other]

    cs.CL cs.SD eess.AS

    Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer

    Authors: Yongxin Zhu, Dan Su, Liqiang He, Linli Xu, Dong Yu

    Abstract: While recent advancements in speech language models have achieved significant progress, they face remarkable challenges in modeling the long acoustic sequences of neural audio codecs. In this paper, we introduce Generative Pre-trained Speech Transformer (GPST), a hierarchical transformer designed for efficient speech language modeling. GPST quantizes audio wavef…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 (main)

  27. arXiv:2406.00037  [pdf, other]

    cs.CL cs.AI

    Aligning LLMs through Multi-perspective User Preference Ranking-based Feedback for Programming Question Answering

    Authors: Hongyu Yang, Liyang He, Min Hou, Shuanghong Shen, Rui Li, Jiahui Hou, Jianhui Ma, Junda Zhao

    Abstract: Code Community Question Answering (CCQA) seeks to tackle programming-related issues, thereby boosting productivity in both software engineering and academic research. Recent advancements in Reinforcement Learning from Human Feedback (RLHF) have transformed the fine-tuning process of Large Language Models (LLMs) to produce responses that closely mimic human behavior. Leveraging LLMs with RLHF for p…

    Submitted 27 May, 2024; originally announced June 2024.

  28. arXiv:2405.20614  [pdf, other]

    cs.CV

    EPIDetect: Video-based convulsive seizure detection in chronic epilepsy mouse model for anti-epilepsy drug screening

    Authors: Junming Ren, Zhoujian Xiao, Yujia Zhang, Yujie Yang, Ling He, Ezra Yoon, Stephen Temitayo Bello, Xi Chen, Dapeng Wu, Micky Tortorella, Jufang He

    Abstract: In preclinical translational studies, drug candidates with remarkable anti-epileptic efficacy demonstrate long-term suppression of spontaneous recurrent seizures (SRSs), particularly convulsive seizures (CSs), in mouse models of chronic epilepsy. However, the current methods for monitoring CSs have limitations in terms of invasiveness, specific laboratory settings, high cost, and complex opera…

    Submitted 31 May, 2024; originally announced May 2024.

  29. arXiv:2405.19524  [pdf, other]

    cs.CR cs.AI

    AI Risk Management Should Incorporate Both Safety and Security

    Authors: Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal

    Abstract: The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this pape…

    Submitted 29 May, 2024; originally announced May 2024.

  30. arXiv:2405.18653  [pdf, other]

    cs.CL

    Recent Advances of Foundation Language Models-based Continual Learning: A Survey

    Authors: Yutao Yang, Jie Zhou, Xuanwen Ding, Tianyu Huai, Shunyu Liu, Qin Chen, Liang He, Yuan Xie

    Abstract: Recently, foundation language models (LMs) have marked significant achievements in the domains of natural language processing (NLP) and computer vision (CV). Unlike traditional neural network models, foundation LMs obtain a great ability for transfer learning by acquiring rich commonsense knowledge through pre-training on extensive unsupervised datasets with a vast number of parameters. However, t…

    Submitted 28 May, 2024; originally announced May 2024.

  31. arXiv:2405.18187  [pdf, other]

    cs.LG

    AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization

    Authors: Longxiang He, Li Shen, Junbo Tan, Xueqian Wang

    Abstract: Implicit Q-learning (IQL) serves as a strong baseline for offline RL, which learns the value function using only dataset actions through quantile regression. However, it is unclear how to recover the implicit policy from the learned implicit Q-function and why IQL can utilize weighted regression for policy extraction. IDQL reinterprets IQL as an actor-critic method and gets weights of implicit pol…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 19 pages, 3 figures, 4 tables

  32. arXiv:2405.15324  [pdf, other]

    cs.RO cs.AI cs.CV

    Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

    Authors: Jianbiao Mei, Yukai Ma, Xuemeng Yang, Licheng Wen, Xinyu Cai, Xin Li, Daocheng Fu, Bo Zhang, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yong Liu, Yu Qiao

    Abstract: Autonomous driving has advanced significantly due to sensors, machine learning, and artificial intelligence improvements. However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretability in varied environments. To address the above problems, we introduce LeapAD, a novel paradigm for autonomous driving inspired by the human cognitiv…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 23 pages, 16 figures

  33. arXiv:2405.14334  [pdf, other]

    cs.CV

    Hierarchical Salient Patch Identification for Interpretable Fundus Disease Localization

    Authors: Yitao Peng, Lianghua He, Die Hu

    Abstract: With the widespread application of deep learning technology in medical image analysis, how to effectively explain model decisions and improve diagnosis accuracy has become an urgent problem that needs to be solved. Attribution methods have become a key tool to help doctors better understand the diagnostic basis of models, and they are used to explain and localize diseases in medical images. Howeve…

    Submitted 23 May, 2024; originally announced May 2024.

  34. arXiv:2405.13190  [pdf, other]

    cs.LG cs.AI

    Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation

    Authors: Haoteng Tang, Guodong Liu, Siyuan Dai, Kai Ye, Kun Zhao, Wenlu Wang, Carl Yang, Lifang He, Alex Leow, Paul Thompson, Heng Huang, Liang Zhan

    Abstract: The MRI-derived brain network serves as a pivotal instrument in elucidating both the structural and functional aspects of the brain, encompassing the ramifications of diseases and developmental processes. However, prevailing methodologies, often focusing on synchronous BOLD signals from functional MRI (fMRI), may not capture directional influences among brain regions and rarely tackle temporal fun…

    Submitted 21 May, 2024; originally announced May 2024.

  35. arXiv:2405.12100  [pdf, other]

    cs.CL

    DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction

    Authors: Hao Chen, Biaojie Zeng, Xin Lin, Liang He, Aimin Zhou

    Abstract: Math world problems correction (MWPC) is a novel task dedicated to rectifying reasoning errors in the process of solving mathematical problems. In this paper, leveraging the advancements in large language models (LLMs), we address two key objectives: (1) Distinguishing between mathematical reasoning and error correction; (2) Exploring strategies to enhance the error correction capabilities of LLMs i…

    Submitted 20 May, 2024; originally announced May 2024.

  36. arXiv:2405.11459  [pdf, other]

    eess.SP cs.CL q-bio.NC

    Du-IN: Discrete units-guided mask modeling for decoding speech from Intracranial Neural signals

    Authors: Hui Zheng, Hai-Teng Wang, Wei-Bang Jiang, Zhong-Tao Chen, Li He, Pei-Yang Lin, Peng-Hu Wei, Guo-Guang Zhao, Yun-Zhe Liu

    Abstract: Invasive brain-computer interfaces have garnered significant attention due to their high performance. The current intracranial stereoElectroEncephaloGraphy (sEEG) foundation models typically build univariate representations based on a single channel. Some of them further use Transformer to model the relationship among channels. However, due to the locality and specificity of brain computation, the…

    Submitted 19 May, 2024; originally announced May 2024.

  37. arXiv:2405.09215  [pdf, other]

    cs.CV cs.AI

    Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model

    Authors: Wanting Xu, Yang Liu, Langping He, Xucheng Huang, Ling Jiang

    Abstract: We introduce Xmodel-VLM, a cutting-edge multimodal vision language model. It is designed for efficient deployment on consumer GPU servers. Our work directly confronts a pivotal industry issue by grappling with the prohibitive service costs that hinder the broad adoption of large-scale multimodal systems. Through rigorous training, we have developed a 1B-scale language model from the ground up, emp…

    Submitted 20 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  38. arXiv:2405.09055  [pdf, other]

    cs.CL

    A safety realignment framework via subspace-oriented model fusion for large language models

    Authors: Xin Yi, Shunfan Zheng, Linlin Wang, Xiaoling Wang, Liang He

    Abstract: The current safeguard mechanisms for large language models (LLMs) are indeed susceptible to jailbreak attacks, making them inherently fragile. Even the process of fine-tuning on apparently benign data for downstream tasks can jeopardize safety. One potential solution is to conduct safety fine-tuning subsequent to downstream fine-tuning. However, there's a risk of catastrophic forgetting during saf…

    Submitted 14 May, 2024; originally announced May 2024.

  39. arXiv:2405.09045  [pdf, other]

    cs.CV

    AMSNet: Netlist Dataset for AMS Circuits

    Authors: Zhuofu Tao, Yichen Shi, Yiru Huo, Rui Ye, Zonghang Li, Li Huang, Chen Wu, Na Bai, Zhiping Yu, Ting-Jung Lin, Lei He

    Abstract: Today's analog/mixed-signal (AMS) integrated circuit (IC) designs demand substantial manual intervention. The advent of multimodal large language models (MLLMs) has unveiled significant potential across various fields, suggesting their applicability in streamlining large-scale AMS IC design as well. A bottleneck in employing MLLMs for automatic AMS circuit generation is the absence of a comprehens…

    Submitted 14 May, 2024; originally announced May 2024.

  40. arXiv:2405.05722  [pdf, other]

    cs.LG cond-mat.mtrl-sci physics.chem-ph

    A Framework of SO(3)-equivariant Non-linear Representation Learning and its Application to Electronic-Structure Hamiltonian Prediction

    Authors: Shi Yin, Xinyang Pan, Fengyan Wang, Feng Wu, Lixin He

    Abstract: We present both a theoretical and a methodological framework that addresses a critical challenge in applying deep learning to physical systems: the reconciliation of non-linear expressiveness with SO(3)-equivariance in predictions of SO(3)-equivariant quantities. Inspired by covariant theory in physics, we address this problem by exploring the mathematical relationships between SO(3)-invariant and…

    Submitted 18 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  41. arXiv:2405.05496  [pdf, other]

    cs.CL

    Boosting Large Language Models with Continual Learning for Aspect-based Sentiment Analysis

    Authors: Xuanwen Ding, Jie Zhou, Liang Dou, Qin Chen, Yuanbin Wu, Chengcai Chen, Liang He

    Abstract: Aspect-based sentiment analysis (ABSA) is an important subtask of sentiment analysis, which aims to extract the aspects and predict their sentiments. Most existing studies focus on improving the performance of the target domain by fine-tuning domain-specific models (trained on source domains) based on the target domain dataset. Few works propose continual learning tasks for ABSA, which aim to lear…

    Submitted 8 May, 2024; originally announced May 2024.

  42. arXiv:2405.05131  [pdf, other]

    cs.RO

    DenserRadar: A 4D millimeter-wave radar point cloud detector based on dense LiDAR point clouds

    Authors: Zeyu Han, Junkai Jiang, Xiaokang Ding, Qingwen Meng, Shaobing Xu, Lei He, Jianqiang Wang

    Abstract: The 4D millimeter-wave (mmWave) radar, with its robustness in extreme environments, extensive detection range, and capabilities for measuring velocity and elevation, has demonstrated significant potential for enhancing the perception abilities of autonomous driving systems in corner-case scenarios. Nevertheless, the inherent sparsity and noise of 4D mmWave radar point clouds restrict its further d…

    Submitted 8 May, 2024; originally announced May 2024.

  43. arXiv:2405.04041  [pdf, other]

    cs.AI cs.CV

    Feature Map Convergence Evaluation for Functional Module

    Authors: Ludan Zhang, Chaoyi Chen, Lei He, Keqiang Li

    Abstract: Autonomous driving perception models are typically composed of multiple functional modules that interact through complex relationships to accomplish environment understanding. However, perception models are predominantly optimized as a black box through end-to-end training, lacking independent evaluation of functional modules, which poses difficulties for interpretability and optimization. Pioneer…

    Submitted 7 May, 2024; originally announced May 2024.

  44. arXiv:2405.03098  [pdf, other]

    cs.CL

    FairMonitor: A Dual-framework for Detecting Stereotypes and Biases in Large Language Models

    Authors: Yanhong Bai, Jiabao Zhao, Jinxin Shi, Zhentao Xie, Xingjiao Wu, Liang He

    Abstract: Detecting stereotypes and biases in Large Language Models (LLMs) is crucial for enhancing fairness and reducing adverse impacts on individuals or groups when these models are applied. Traditional methods, which rely on embedding spaces or are based on probability metrics, fall short in revealing the nuanced and implicit biases present in various contexts. To address this challenge, we propose the…

    Submitted 5 May, 2024; originally announced May 2024.

  45. arXiv:2405.00077  [pdf, other]

    cs.LG eess.SP

    BrainODE: Dynamic Brain Signal Analysis via Graph-Aided Neural Ordinary Differential Equations

    Authors: Kaiqiao Han, Yi Yang, Zijie Huang, Xuan Kan, Yang Yang, Ying Guo, Lifang He, Liang Zhan, Yizhou Sun, Wei Wang, Carl Yang

    Abstract: Brain network analysis is vital for understanding the neural interactions regarding brain structures and functions, and identifying potential biomarkers for clinical phenotypes. However, widely used brain signals such as Blood Oxygen Level Dependent (BOLD) time series generated from functional Magnetic Resonance Imaging (fMRI) often manifest three challenges: (1) missing values, (2) irregular samp…

    Submitted 30 April, 2024; originally announced May 2024.

  46. arXiv:2404.18416  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Capabilities of Gemini Models in Medicine

    Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, et al. (42 additional authors not shown)

    Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  47. arXiv:2404.13816  [pdf, other]

    cs.CV

    Neural Radiance Field in Autonomous Driving: A Survey

    Authors: Lei He, Leheng Li, Wenchao Sun, Zeyu Han, Yichen Liu, Sifa Zheng, Jianqiang Wang, Keqiang Li

    Abstract: Neural Radiance Field (NeRF) has garnered significant attention from both academia and industry due to its intrinsic advantages, particularly its implicit representation and novel view synthesis capabilities. With the rapid advancements in deep learning, a multitude of methods have emerged to explore the potential applications of NeRF in the domain of Autonomous Driving (AD). However, a conspicuou…

    Submitted 26 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

  48. arXiv:2404.11326  [pdf, other]

    cs.CV

    Single-temporal Supervised Remote Change Detection for Domain Generalization

    Authors: Qiangang Du, Jinlong Peng, Xu Chen, Qingdong He, Liren He, Qiang Nie, Wenbing Zhu, Mingmin Chi, Yabiao Wang, Chengjie Wang

    Abstract: Change detection is widely applied in remote sensing image analysis. Existing methods require training models separately for each dataset, which leads to poor domain generalization. Moreover, these methods rely heavily on large amounts of high-quality pair-labelled data for training, which is expensive and impractical. In this paper, we propose a multimodal contrastive learning (ChangeCLIP) based…

    Submitted 23 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  49. arXiv:2404.09699  [pdf, other]

    cs.GT

    Generative AI for Game Theory-based Mobile Networking

    Authors: Long He, Geng Sun, Dusit Niyato, Hongyang Du, Fang Mei, Jiawen Kang, Mérouane Debbah, Zhu Han

    Abstract: With the continuous advancement of network technology, various emerging complex networking optimization problems have opened up a wide range of applications utilizing game theory. However, since game theory is a mathematical framework, game theory-based solutions often require the experience and knowledge of human experts. Recently, the remarkable advantages exhibited by generative artificial inte…

    Submitted 15 April, 2024; originally announced April 2024.

  50. arXiv:2404.09509  [pdf, other]

    cs.CV

    Fuse after Align: Improving Face-Voice Association Learning via Multimodal Encoder

    Authors: Chong Peng, Liqiang He, Dan Su

    Abstract: Today, there have been many achievements in learning the association between voice and face. However, most previous work models rely on cosine similarity or L2 distance to evaluate the likeness of voices and faces following contrastive learning, subsequently applied to retrieval and matching tasks. This method only considers the embeddings as high-dimensional vectors, utilizing a minimal scope of…

    Submitted 15 April, 2024; originally announced April 2024.