Showing 1–50 of 3,883 results for author: Yang, Y

  1. arXiv:2407.13647  [pdf, other]

    cs.CL cs.AI

    Weak-to-Strong Reasoning

    Authors: Yuqing Yang, Yan Ma, Pengfei Liu

    Abstract: When large language models (LLMs) exceed human-level capabilities, it becomes increasingly challenging to provide full-scale and accurate supervisions for these models. Weak-to-strong learning, which leverages a less capable model to unlock the latent abilities of a stronger model, proves valuable in this context. Yet, the efficacy of this approach for complex reasoning tasks is still untested. Fu…

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2407.13495  [pdf]

    cs.DL stat.ME stat.OT

    Identifying Research Hotspots and Future Development Trends in Current Psychology: A Bibliometric Analysis of the Past Decade's Publications

    Authors: Shen Liu, Yan Yang

    Abstract: By conducting a bibliometric analysis on 4,869 publications in Current Psychology from 2013 to 2022, this paper examined the annual publications and annual citations, as well as the leading institutions, countries, and keywords. CiteSpace, VOSviewer and SCImago Graphica were utilized for visualization analysis. On one hand, this paper analyzed the academic influence of Current Psychology over the…

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.12996  [pdf, other]

    stat.ML cs.LG

    Sharpness-diversity tradeoff: improving flat ensembles with SharpBalance

    Authors: Haiquan Lu, Xiaotian Liu, Yefan Zhou, Qunli Li, Kurt Keutzer, Michael W. Mahoney, Yujun Yan, Huanrui Yang, Yaoqing Yang

    Abstract: Recent studies on deep ensembles have identified the sharpness of the local minima of individual learners and the diversity of the ensemble members as key factors in improving test-time performance. Building on this, our study investigates the interplay between sharpness and diversity within deep ensembles, illustrating their crucial role in robust generalization to both in-distribution (ID) and o…

    Submitted 17 July, 2024; originally announced July 2024.

  4. arXiv:2407.12863  [pdf, other]

    cs.CL cs.AI

    Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models

    Authors: Jung Hyun Lee, June Yong Yang, Byeongho Heo, Dongyoon Han, Kang Min Yoo

    Abstract: Large Language Models (LLMs) have demonstrated impressive problem-solving capabilities in mathematics through step-by-step reasoning chains. However, they are susceptible to reasoning errors that impact the quality of subsequent reasoning chains and the final answer due to language models' autoregressive token-by-token generating nature. Recent works have proposed adopting external verifiers to gu…

    Submitted 12 July, 2024; originally announced July 2024.

  5. arXiv:2407.12842  [pdf, other]

    cs.CL cs.AI

    MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production

    Authors: Jian Ma, Wenguan Wang, Yi Yang, Feng Zheng

    Abstract: Sign language understanding has made significant strides; however, there is still no viable solution for generating sign sequences directly from entire spoken content, e.g., text or speech. In this paper, we propose a unified framework for continuous sign language production, easing communication between sign and non-sign language users. In particular, a sequence diffusion model, utilizing embeddi…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024 Findings; Project Page: https://hechang25.github.io/MS2SL

  6. arXiv:2407.12827  [pdf, other]

    cs.CL cs.LG

    The Solution for The PST-KDD-2024 OAG-Challenge

    Authors: Shupeng Zhong, Xinger Li, Shushan Jin, Yang Yang

    Abstract: In this paper, we introduce the second-place solution in the KDD-2024 OAG-Challenge paper source tracing track. Our solution is mainly based on two methods, BERT and GCN, and combines the reasoning results of BERT and GCN in the final submission to achieve complementary performance. In the BERT solution, we focus on processing the fragments that appear in the references of the paper, and use a var…

    Submitted 2 July, 2024; originally announced July 2024.

  7. arXiv:2407.12667  [pdf, other]

    cs.CV

    SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization

    Authors: Yiyang Chen, Siyan Dong, Xulong Wang, Lulu Cai, Youyi Zheng, Yanchao Yang

    Abstract: 3D surface reconstruction from images is essential for numerous applications. Recently, Neural Radiance Fields (NeRFs) have emerged as a promising framework for 3D modeling. However, NeRFs require accurate camera poses as input, and existing methods struggle to handle significantly noisy pose estimates (i.e., outliers), which are commonly encountered in real-world scenarios. To tackle this challen…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  8. arXiv:2407.12661  [pdf, other]

    cs.CV

    InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction

    Authors: Xulong Wang, Siyan Dong, Youyi Zheng, Yanchao Yang

    Abstract: 3D surface reconstruction from multi-view images is essential for scene understanding and interaction. However, complex indoor scenes pose challenges such as ambiguity due to limited observations. Recent implicit surface representations, such as Neural Radiance Fields (NeRFs) and signed distance functions (SDFs), employ various geometric priors to resolve the lack of observed information. Neverthe…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  9. arXiv:2407.12232  [pdf, other]

    cs.AR

    RTL Verification for Secure Speculation Using Contract Shadow Logic

    Authors: Qinhan Tan, Yuheng Yang, Thomas Bourgeat, Sharad Malik, Mengjia Yan

    Abstract: Modern out-of-order processors face speculative execution attacks. Despite various proposed software and hardware mitigations to prevent such attacks, new attacks keep arising from unknown vulnerabilities. Thus, a formal and rigorous evaluation of the ability of hardware designs to deal with speculative execution attacks is urgently desired. This paper proposes a formal verification technique call…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted to ASPLOS 2025

  10. arXiv:2407.11556  [pdf, other]

    cs.DB

    LITS: An Optimized Learned Index for Strings (An Extended Version)

    Authors: Yifan Yang, Shimin Chen

    Abstract: Index is an important component in database systems. Learned indexes have been shown to outperform traditional tree-based index structures for fixed-sized integer or floating point keys. However, the application of the learned solution to variable-length string keys is under-researched. Our experiments show that existing learned indexes for strings fail to outperform traditional string indexes, su…

    Submitted 16 July, 2024; originally announced July 2024.

  11. arXiv:2407.11534  [pdf, other]

    cs.LG cs.AI

    LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices

    Authors: Jung Hyun Lee, Jeonghoon Kim, June Yong Yang, Se Jung Kwon, Eunho Yang, Kang Min Yoo, Dongsoo Lee

    Abstract: With the commercialization of large language models (LLMs), weight-activation quantization has emerged to compress and accelerate LLMs, achieving high throughput while reducing inference costs. However, existing post-training quantization (PTQ) techniques for quantizing weights and activations of LLMs still suffer from non-negligible accuracy drops, especially on massive multitask language underst…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Preprint

  12. arXiv:2407.11466  [pdf, other]

    cs.CY

    Navigating the Data Trading Crossroads: An Interdisciplinary Survey

    Authors: Yi Yu, Jingru Yu, Xuhong Wang, Juanjuan Li, Yilun Lin, Conghui He, Yanqing Yang, Yu Qiao, Li Li, Fei-Yue Wang

    Abstract: Data has been increasingly recognized as a critical factor in the future economy. However, constructing an efficient data trading market faces challenges such as privacy breaches, data monopolies, and misuse. Despite numerous studies proposing algorithms to protect privacy and methods for pricing data, a comprehensive understanding of these issues and systemic solutions remain elusive. This paper…

    Submitted 16 July, 2024; originally announced July 2024.

  13. arXiv:2407.11353  [pdf, other]

    stat.ML cs.LG math.ST

    Preconditioned Gradient Descent Finds Over-Parameterized Neural Networks with Sharp Generalization for Nonparametric Regression

    Authors: Yingzhen Yang

    Abstract: We consider nonparametric regression by an over-parameterized two-layer neural network trained by gradient descent (GD) or its variant in this paper. We show that, if the neural network is trained with a novel Preconditioned Gradient Descent (PGD) with early stopping and the target function has spectral bias widely studied in the deep learning literature, the trained network renders a particularly…

    Submitted 15 July, 2024; originally announced July 2024.

  14. arXiv:2407.11280  [pdf, other]

    cs.AI cs.CE cs.DB cs.LG

    Intelligent Cross-Organizational Process Mining: A Survey and New Perspectives

    Authors: Yiyuan Yang, Zheshun Wu, Yong Chu, Zhenghua Chen, Zenglin Xu, Qingsong Wen

    Abstract: Process mining, as a high-level field in data mining, plays a crucial role in enhancing operational efficiency and decision-making across organizations. In this survey paper, we delve into the growing significance and ongoing trends in the field of process mining, advocating a specific viewpoint on its contents, application, and development in modern businesses and process management, particularly…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Under review; 13 pages, 7 figures, 2 tables

  15. arXiv:2407.10784  [pdf, other]

    cs.LG cs.AI stat.ML

    AdapTable: Test-Time Adaptation for Tabular Data via Shift-Aware Uncertainty Calibrator and Label Distribution Handler

    Authors: Changhun Kim, Taewon Kim, Seungyeon Woo, June Yong Yang, Eunho Yang

    Abstract: In real-world applications, tabular data often suffer from distribution shifts due to their widespread and abundant nature, leading to erroneous predictions of pre-trained machine learning models. However, addressing such distribution shifts in the tabular domain has been relatively underexplored due to unique challenges such as varying attributes and dataset sizes, as well as the limited represen…

    Submitted 15 July, 2024; originally announced July 2024.

  16. arXiv:2407.10707  [pdf, other]

    cs.CV

    Interactive Rendering of Relightable and Animatable Gaussian Avatars

    Authors: Youyi Zhan, Tianjia Shao, He Wang, Yin Yang, Kun Zhou

    Abstract: Creating relightable and animatable avatars from multi-view or monocular videos is a challenging task for digital human creation and virtual reality applications. Previous methods rely on neural radiance fields or ray tracing, resulting in slow training and rendering processes. By utilizing Gaussian Splatting, we propose a simple and efficient method to decouple body materials and lighting from sp…

    Submitted 15 July, 2024; originally announced July 2024.

  17. arXiv:2407.10704  [pdf, other]

    cs.CV

    Quantized Prompt for Efficient Generalization of Vision-Language Models

    Authors: Tianxiang Hao, Xiaohan Ding, Juexiao Feng, Yuhong Yang, Hui Chen, Guiguang Ding

    Abstract: In the past few years, large-scale pre-trained vision-language models like CLIP have achieved tremendous success in various fields. Naturally, how to transfer the rich knowledge in such huge pre-trained models to downstream tasks and datasets becomes a hot topic. During downstream adaptation, the most challenging problems are overfitting and catastrophic forgetting, which can cause the model to ov…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 14 pages, 7 figures. Accepted by ECCV 2024

  18. arXiv:2407.10646  [pdf, other]

    cs.SD eess.AS

    Towards zero-shot amplifier modeling: One-to-many amplifier modeling via tone embedding control

    Authors: Yu-Hua Chen, Yen-Tung Yeh, Yuan-Chiao Cheng, Jui-Te Wu, Yu-Hsiang Ho, Jyh-Shing Roger Jang, Yi-Hsuan Yang

    Abstract: Replicating analog device circuits through neural audio effect modeling has garnered increasing interest in recent years. Existing work has predominantly focused on a one-to-one emulation strategy, modeling specific devices individually. In this paper, we tackle the less-explored scenario of one-to-many emulation, utilizing conditioning mechanisms to emulate multiple guitar amplifiers through a si…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ISMIR 2024

  19. arXiv:2407.10495  [pdf, other]

    cs.LG cs.CV

    Improving Hyperbolic Representations via Gromov-Wasserstein Regularization

    Authors: Yifei Yang, Wonjun Lee, Dongmian Zou, Gilad Lerman

    Abstract: Hyperbolic representations have shown remarkable efficacy in modeling inherent hierarchies and complexities within data structures. Hyperbolic neural networks have been commonly applied for learning such representations from data, but they often fall short in preserving the geometric structures of the original feature spaces. In response to this challenge, our work applies the Gromov-Wasserstein (…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted for ECCV 2024

  20. arXiv:2407.10459  [pdf, other]

    cs.CV

    DiffStega: Towards Universal Training-Free Coverless Image Steganography with Diffusion Models

    Authors: Yiwei Yang, Zheyuan Liu, Jun Jia, Zhongpai Gao, Yunhao Li, Wei Sun, Xiaohong Liu, Guangtao Zhai

    Abstract: Traditional image steganography focuses on concealing one image within another, aiming to avoid steganalysis by unauthorized entities. Coverless image steganography (CIS) enhances imperceptibility by not using any cover image. Recent works have utilized text prompts as keys in CIS through diffusion models. However, this approach faces three challenges: invalidated when private prompt is guessed, c…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 9 pages, 7 figures; reference added; accepted at IJCAI2024 main track

  21. arXiv:2407.10406  [pdf, other]

    cs.CV

    Towards Scale-Aware Full Surround Monodepth with Transformers

    Authors: Yuchen Yang, Xinyi Wang, Dong Li, Lu Tian, Ashish Sirasao, Xun Yang

    Abstract: Full surround monodepth (FSM) methods can learn from multiple camera views simultaneously in a self-supervised manner to predict the scale-aware depth, which is more practical for real-world applications in contrast to scale-ambiguous depth from a standalone monocular camera. In this work, we focus on enhancing the scale-awareness of FSM methods for depth estimation. To this end, we propose to imp…

    Submitted 14 July, 2024; originally announced July 2024.

  22. arXiv:2407.10373  [pdf, other]

    cs.SD cs.AI cs.CV eess.AS

    Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion

    Authors: Jian Ma, Wenguan Wang, Yi Yang, Feng Zheng

    Abstract: Visual acoustic matching (VAM) is pivotal for enhancing the immersive experience, and the task of dereverberation is effective in improving audio intelligibility. Existing methods treat each task independently, overlooking the inherent reciprocity between them. Moreover, these methods depend on paired training data, which is challenging to acquire, impeding the utilization of extensive unpaired da…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Project page: https://hechang25.github.io/MVSD

  23. arXiv:2407.10299  [pdf, other]

    cs.CV

    Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models

    Authors: Yuchen Yang, Kwonjoon Lee, Behzad Dariush, Yinzhi Cao, Shao-Yuan Lo

    Abstract: Video Anomaly Detection (VAD) is crucial for applications such as security surveillance and autonomous driving. However, existing VAD methods provide little rationale behind detection, hindering public trust in real-world deployments. In this paper, we approach VAD with a reasoning framework. Although Large Language Models (LLMs) have shown revolutionary reasoning ability, we find that their direc…

    Submitted 14 July, 2024; originally announced July 2024.

  24. arXiv:2407.10200  [pdf, other]

    cs.CV cs.AI

    Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data

    Authors: Tuo Feng, Wenguan Wang, Ruijie Quan, Yi Yang

    Abstract: Current 3D self-supervised learning methods of 3D scenes face a data desert issue, resulting from the time-consuming and expensive collecting process of 3D scene data. Conversely, 3D shape datasets are easier to collect. Despite this, existing pre-training strategies on shape data offer limited potential for 3D scene understanding due to significant disparities in point quantities. To tackle these…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Project page: https://github.com/FengZicai/S2S

  25. arXiv:2407.10157  [pdf, other]

    eess.IV cs.CV

    SACNet: A Spatially Adaptive Convolution Network for 2D Multi-organ Medical Segmentation

    Authors: Lin Zhang, Wenbo Gao, Jie Yi, Yunyun Yang

    Abstract: Multi-organ segmentation in medical image analysis is crucial for diagnosis and treatment planning. However, many factors complicate the task, including variability in different target categories and interference from complex backgrounds. In this paper, we utilize the knowledge of Deformable Convolution V3 (DCNv3) and multi-object segmentation to optimize our Spatially Adaptive Convolution Network…

    Submitted 14 July, 2024; originally announced July 2024.

  26. arXiv:2407.10040  [pdf, other]

    cs.AI

    Lean-STaR: Learning to Interleave Thinking and Proving

    Authors: Haohan Lin, Zhiqing Sun, Yiming Yang, Sean Welleck

    Abstract: Traditional language model-based theorem proving assumes that by training on a sufficient amount of formal proof data, a model will learn to prove theorems. Our key observation is that a wealth of informal information that is not present in formal proofs can be useful for learning to prove theorems. For instance, humans think through steps of a proof, but this thought process is not visible in the…

    Submitted 13 July, 2024; originally announced July 2024.

  27. arXiv:2407.09822  [pdf, other]

    cs.CV

    VividDreamer: Invariant Score Distillation For Hyper-Realistic Text-to-3D Generation

    Authors: Wenjie Zhuo, Fan Ma, Hehe Fan, Yi Yang

    Abstract: This paper presents Invariant Score Distillation (ISD), a novel method for high-fidelity text-to-3D generation. ISD aims to tackle the over-saturation and over-smoothing problems in Score Distillation Sampling (SDS). In this paper, SDS is decoupled into a weighted sum of two components: the reconstruction term and the classifier-free guidance term. We experimentally found that over-saturation stem…

    Submitted 17 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  28. arXiv:2407.09295  [pdf, other]

    cs.CR

    Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study

    Authors: Yulong Yang, Xinshan Yang, Shuaidong Li, Chenhao Lin, Zhengyu Zhao, Chao Shen, Tianwei Zhang

    Abstract: The rapid progress in the reasoning capability of the Multi-modal Large Language Models (MLLMs) has triggered the development of autonomous agent systems on mobile devices. MLLM-based mobile agent systems consist of perception, reasoning, memory, and multi-agent collaboration modules, enabling automatic analysis of user instructions and the design of task pipelines with only natural language and d…

    Submitted 17 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: Preprint. Work in progress

  29. arXiv:2407.09164  [pdf, other]

    cs.CR cs.AI

    TAPI: Towards Target-Specific and Adversarial Prompt Injection against Code LLMs

    Authors: Yuchen Yang, Hongwei Yao, Bingrun Yang, Yiling He, Yiming Li, Tianwei Zhang, Zhan Qin, Kui Ren

    Abstract: Recently, code-oriented large language models (Code LLMs) have been widely and successfully used to simplify and facilitate code programming. With these tools, developers can easily generate desired complete functional codes based on incomplete code and natural language prompts. However, a few pioneering works revealed that these Code LLMs are also vulnerable, e.g., against backdoor and adversaria…

    Submitted 15 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

  30. arXiv:2407.09157  [pdf, other]

    cs.IR cs.AI cs.LG

    Movie Recommendation with Poster Attention via Multi-modal Transformer Feature Fusion

    Authors: Linhan Xia, Yicheng Yang, Ziou Chen, Zheng Yang, Shengxin Zhu

    Abstract: Pre-trained models learn general representations from large datasets, which can be fine-tuned for specific tasks to significantly reduce training time. Pre-trained models like generative pretrained transformers (GPT), bidirectional encoder representations from transformers (BERT), and vision transformers (ViT) have become a cornerstone of current research in machine learning. This study proposes a multi…

    Submitted 12 July, 2024; originally announced July 2024.

  31. arXiv:2407.09088  [pdf, other]

    eess.IV cs.AI cs.CV

    FD-SOS: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images

    Authors: Marawan Elbatel, Keyuan Liu, Yanqi Yang, Xiaomeng Li

    Abstract: Accurate detection of bone fenestration and dehiscence (FD) is crucial for effective treatment planning in dentistry. While cone-beam computed tomography (CBCT) is the gold standard for evaluating FD, it comes with limitations such as radiation exposure, limited accessibility, and higher cost compared to intraoral images. In intraoral images, dentists face challenges in the differential diagnosis…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  32. arXiv:2407.08935  [pdf, other]

    cs.CR

    Distributed Backdoor Attacks on Federated Graph Learning and Certified Defenses

    Authors: Yuxin Yang, Qiang Li, Jinyuan Jia, Yuan Hong, Binghui Wang

    Abstract: Federated graph learning (FedGL) is an emerging federated learning (FL) framework that extends FL to learn graph data from diverse sources. FL for non-graph data has shown to be vulnerable to backdoor attacks, which inject a shared backdoor trigger into the training data such that the trained backdoored FL model can predict the testing data containing the trigger as the attacker desires. However,…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: This paper is accepted to CCS2024

  33. Exploring Knowledge Transfer in Evolutionary Many-task Optimization: A Complex Network Perspective

    Authors: Yudong Yang, Kai Wu, Xiangyi Teng, Handing Wang, He Yu, Jing Liu

    Abstract: The field of evolutionary many-task optimization (EMaTO) is increasingly recognized for its ability to streamline the resolution of optimization challenges with repetitive characteristics, thereby conserving computational resources. This paper tackles the challenge of crafting efficient knowledge transfer mechanisms within EMaTO, a task complicated by the computational demands of individual task e…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 9 pages, accepted by GECCO 2024 poster

  34. Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification

    Authors: Wenshuo Peng, Kaipeng Zhang, Yue Yang, Hao Zhang, Yu Qiao

    Abstract: Vision-language foundation models have been incredibly successful in a wide range of downstream computer vision tasks using adaptation methods. However, due to the high cost of obtaining pre-training datasets, pairs with weak image-text correlation in the data exist in large numbers. We call them weak-paired samples. Due to the limitations of these weak-paired samples, the pre-training model are u…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 9 pages,4 figures

  35. arXiv:2407.08231  [pdf, other]

    cs.CV

    E2VIDiff: Perceptual Events-to-Video Reconstruction using Diffusion Priors

    Authors: Jinxiu Liang, Bohan Yu, Yixin Yang, Yiming Han, Boxin Shi

    Abstract: Event cameras, mimicking the human retina, capture brightness changes with unparalleled temporal resolution and dynamic range. Integrating events into intensities poses a highly ill-posed challenge, marred by initial condition ambiguities. Traditional regression-based deep learning methods fall short in perceptual quality, offering deterministic and often unrealistic reconstructions. In this paper…

    Submitted 11 July, 2024; originally announced July 2024.

  36. Chromosomal Structural Abnormality Diagnosis by Homologous Similarity

    Authors: Juren Li, Fanzhe Fu, Ran Wei, Yifei Sun, Zeyu Lai, Ning Song, Xin Chen, Yang Yang

    Abstract: Pathogenic chromosome abnormalities are very common among the general population. While numerical chromosome abnormalities can be quickly and precisely detected, structural chromosome abnormalities are far more complex and typically require considerable efforts by human experts for identification. This paper focuses on investigating the modeling of chromosome features and the identification of chr…

    Submitted 11 July, 2024; originally announced July 2024.

  37. arXiv:2407.08133  [pdf, other]

    cs.CV cs.AI

    Nonverbal Interaction Detection

    Authors: Jianan Wei, Tianfei Zhou, Yi Yang, Wenguan Wang

    Abstract: This work addresses a new challenge of understanding human nonverbal interaction in social contexts. Nonverbal signals pervade virtually every communicative act. Our gestures, facial expressions, postures, gaze, even physical appearance all convey messages, without anything being said. Despite their critical role in social life, nonverbal signals receive very limited attention as compared to the l…

    Submitted 14 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Project page: https://github.com/weijianan1/NVI

  38. arXiv:2407.08132  [pdf, other]

    cs.CV

    DMM: Disparity-guided Multispectral Mamba for Oriented Object Detection in Remote Sensing

    Authors: Minghang Zhou, Tianyu Li, Chaofan Qiao, Dongyu Xie, Guoqing Wang, Ningjuan Ruan, Lin Mei, Yang Yang

    Abstract: Multispectral oriented object detection faces challenges due to both inter-modal and intra-modal discrepancies. Recent studies often rely on transformer-based models to address these issues and achieve cross-modal fusion detection. However, the quadratic computational complexity of transformers limits their performance. Inspired by the efficiency and lower complexity of Mamba in long sequence task…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 12 pages, 9 figures

  39. arXiv:2407.07931  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Search, Examine and Early-Termination: Fake News Detection with Annotation-Free Evidences

    Authors: Yuzhou Yang, Yangming Zhou, Qichao Ying, Zhenxing Qian, Xinpeng Zhang

    Abstract: Pioneer researches recognize evidences as crucial elements in fake news detection apart from patterns. Existing evidence-aware methods either require laborious pre-processing procedures to assure relevant and high-quality evidence data, or incorporate the entire spectrum of available evidences in all news cases, regardless of the quality and quantity of the retrieved data. In this paper, we propos…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: ECAI 2024 paper. Fudan University & NVIDIA. To appear

  40. arXiv:2407.07577  [pdf, other]

    cs.CV cs.AI

    IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

    Authors: Yatai Ji, Shilong Zhang, Jie Wu, Peize Sun, Weifeng Chen, Xuefeng Xiao, Sidi Yang, Yujiu Yang, Ping Luo

    Abstract: The rapid advancement of Large Vision-Language models (LVLMs) has demonstrated a spectrum of emergent capabilities. Nevertheless, current models only focus on the visual content of a single scenario, while their ability to associate instances across different scenes has not yet been explored, which is essential for understanding complex visual content, such as movies with multiple characters and i…

    Submitted 10 July, 2024; originally announced July 2024.

  41. arXiv:2407.07433  [pdf, other]

    cs.CV cs.AI

    Controllable Navigation Instruction Generation with Chain of Thought Prompting

    Authors: Xianghao Kong, Jinyu Chen, Wenguan Wang, Hang Su, Xiaolin Hu, Yi Yang, Si Liu

    Abstract: Instruction generation is a vital and multidisciplinary research area with broad applications. Existing instruction generation models are limited to generating instructions in a single style from a particular dataset, and the style and content of generated instructions cannot be controlled. Moreover, most existing instruction generation methods also disregard the spatial modeling of the navigation…

    Submitted 16 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  42. arXiv:2407.07406  [pdf, other]

    cs.CV cs.AI

    Weakly-supervised Medical Image Segmentation with Gaze Annotations

    Authors: Yuan Zhong, Chenhui Tang, Yumeng Yang, Ruoxi Qi, Kang Zhou, Yuqi Gong, Pheng Ann Heng, Janet H. Hsiao, Qi Dou

    Abstract: Eye gaze that reveals human observational patterns has increasingly been incorporated into solutions for vision tasks. Despite recent explorations on leveraging gaze to aid deep networks, few studies exploit gaze as an efficient annotation approach for medical image segmentation which typically entails heavy annotating costs. In this paper, we propose to collect dense weak supervision for medical…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  43. arXiv:2407.07397  [pdf, other]

    cs.SD eess.AS

    SimuSOE: A Simulated Snoring Dataset for Obstructive Sleep Apnea-Hypopnea Syndrome Evaluation during Wakefulness

    Authors: Jie Lin, Xiuping Yang, Li Xiao, Xinhong Li, Weiyan Yi, Yuhong Yang, Weiping Tu, Xiong Chen

    Abstract: Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a prevalent chronic breathing disorder caused by upper airway obstruction. Previous studies advanced OSAHS evaluation through machine learning-based systems trained on sleep snoring or speech signal datasets. However, constructing datasets for training a precise and rapid OSAHS evaluation system poses a challenge, since 1) it is time-consuming t…

    Submitted 10 July, 2024; originally announced July 2024.

  44. arXiv:2407.07296  [pdf]

    physics.med-ph cs.AI cs.CV

    Large Language Model-Augmented Auto-Delineation of Treatment Target Volume in Radiation Therapy

    Authors: Praveenbalaji Rajendran, Yong Yang, Thomas R. Niedermayr, Michael Gensheimer, Beth Beadle, Quynh-Thu Le, Lei Xing, Xianjin Dai

    Abstract: Radiation therapy (RT) is one of the most effective treatments for cancer, and its success relies on the accurate delineation of targets. However, target delineation is a comprehensive medical decision that currently relies purely on manual processes by human experts. Manual delineation is time-consuming, laborious, and subject to interobserver variations. Although the advancements in artificial i…

    Submitted 9 July, 2024; originally announced July 2024.

  45. arXiv:2407.06842  [pdf, other]

    cs.CV

    Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts

    Authors: Shuangkang Fang, Yufeng Wang, Yi-Hsuan Tsai, Yi Yang, Wenrui Ding, Shuchang Zhou, Ming-Hsuan Yang

    Abstract: Recent work on image content manipulation based on vision-language pre-training models has been effectively extended to text-driven 3D scene editing. However, existing schemes for 3D scene editing still exhibit certain shortcomings, hindering their further interactive design. Such schemes typically adhere to fixed input patterns, limiting users' flexibility in text input. Moreover, their editing c…

    Submitted 9 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024; Project Website: https://sk-fun.fun/CE3D

  46. arXiv:2407.06653  [pdf, other]

    cs.CV

    Toward Motion Robustness: A masked attention regularization framework in remote photoplethysmography

    Authors: Pengfei Zhao, Qigong Sun, Xiaolin Tian, Yige Yang, Shuo Tao, Jie Cheng, Jiantong Chen

    Abstract: There has been growing interest in facial video-based remote photoplethysmography (rPPG) measurement recently, with a focus on assessing various vital signs such as heart rate and heart rate variability. Despite previous efforts on static datasets, their approaches have been hindered by inaccurate region of interest (ROI) localization and motion issues, and have shown limited generalization in rea…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: CVPR workshop 2024 accepted

  47. arXiv:2407.06617  [pdf, other]

    cs.CV

    Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task

    Authors: Yiran Yang, Jinchao Zhang, Ying Deng, Jie Zhou

    Abstract: Inspired by the success of the text-to-image (T2I) generation task, many researchers are devoting themselves to the text-to-video (T2V) generation task. Most of the T2V frameworks usually inherit from the T2I model and add extra-temporal layers of training to generate dynamic videos, which can be viewed as a fine-tuning task. However, the traditional 3D-Unet is a serial mode and the temporal layer…

    Submitted 11 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Technical report

  48. arXiv:2407.06540  [pdf, other]

    cs.CV cs.AI

    General and Task-Oriented Video Segmentation

    Authors: Mu Chen, Liulei Li, Wenguan Wang, Ruijie Quan, Yi Yang

    Abstract: We present GvSeg, a general video segmentation framework for addressing four different video segmentation tasks (i.e., instance, semantic, panoptic, and exemplar-guided) while maintaining an identical architectural design. Currently, there is a trend towards developing general video segmentation solutions that can be applied across multiple tasks. This streamlines research endeavors and simplifies…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Project page: https://github.com/kagawa588/GvSeg

  49. arXiv:2407.06524  [pdf, other]

    cs.SD cs.MM eess.AS

    Improving Speech Enhancement by Integrating Inter-Channel and Band Features with Dual-branch Conformer

    Authors: Jizhen Li, Xinmeng Xu, Weiping Tu, Yuhong Yang, Rong Zhu

    Abstract: Recent speech enhancement methods based on convolutional neural networks (CNNs) and transformer have been demonstrated to efficaciously capture time-frequency (T-F) information on spectrogram. However, the correlation of each channels of speech features is failed to explore. Theoretically, each channel map of speech features obtained by different convolution kernels contains information with diffe…

    Submitted 13 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  50. arXiv:2407.06191  [pdf, other]

    cs.CV

    Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images

    Authors: Zhangyang Qi, Yunhan Yang, Mengchen Zhang, Long Xing, Xiaoyang Wu, Tong Wu, Dahua Lin, Xihui Liu, Jiaqi Wang, Hengshuang Zhao

    Abstract: Recent advances in 3D AIGC have shown promise in directly creating 3D objects from text and images, offering significant cost savings in animation and product design. However, detailed edit and customization of 3D assets remains a long-standing challenge. Specifically, 3D Generation methods lack the ability to follow finely detailed instructions as precisely as their 2D image creation counterparts…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Project Page: https://tailor3d-2024.github.io/