Showing 1–50 of 695 results for author: Song, J

  1. arXiv:2407.12538  [pdf, other]

    eess.IV cs.CV

    High Frequency Matters: Uncertainty Guided Image Compression with Wavelet Diffusion

    Authors: Juan Song, Jiaxiang He, Mingtao Feng, Keyan Wang, Yunsong Li, Ajmal Mian

    Abstract: Diffusion probabilistic models have recently achieved remarkable success in generating high-quality images. However, balancing high perceptual quality and low distortion remains challenging in image compression applications. To address this issue, we propose an efficient Uncertainty-Guided image compression approach with wavelet Diffusion (UGDiff). Our approach focuses on high frequency compressio…

    Submitted 17 July, 2024; originally announced July 2024.

  2. arXiv:2407.12292  [pdf, other]

    cs.CV cs.AI

    Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection

    Authors: Youheng Sun, Shengming Yuan, Xuanhan Wang, Lianli Gao, Jingkuan Song

    Abstract: Targeted adversarial attack, which aims to mislead a model to recognize any image as a target object by imperceptible perturbations, has become a mainstream tool for vulnerability assessment of deep neural networks (DNNs). Since existing targeted attackers only learn to attack known target classes, they cannot generalize well to unknown classes. To tackle this issue, we propose $\bf{G}$eneralized…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  3. arXiv:2407.07342  [pdf, other]

    cs.CL

    Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture

    Authors: Jiayang Song, Yuheng Huang, Zhehua Zhou, Lei Ma

    Abstract: As safety remains a crucial concern throughout the development lifecycle of Large Language Models (LLMs), researchers and industrial practitioners have increasingly focused on safeguarding and aligning LLM behaviors with human preferences and ethical standards. LLMs, trained on extensive multilingual corpora, exhibit powerful generalization abilities across diverse languages and domains. However,…

    Submitted 9 July, 2024; originally announced July 2024.

  4. arXiv:2407.07110  [pdf, other]

    cs.LG cs.AI eess.SP

    Foundation Models for Electrocardiograms

    Authors: Junho Song, Jong-Hwan Jang, Byeong Tak Lee, DongGyun Hong, Joon-myoung Kwon, Yong-Yeon Jo

    Abstract: Foundation models, enhanced by self-supervised learning (SSL) techniques, represent a cutting-edge frontier in biomedical signal analysis, particularly for electrocardiograms (ECGs), crucial for cardiac health monitoring and diagnosis. This study conducts a comprehensive analysis of foundation models for ECGs by employing and refining innovative SSL methodologies - namely, generative and contrasti…

    Submitted 25 June, 2024; originally announced July 2024.

    Comments: 27 pages

  5. arXiv:2407.06348  [pdf, other]

    cs.CR cs.PL

    FORAY: Towards Effective Attack Synthesis against Deep Logical Vulnerabilities in DeFi Protocols

    Authors: Hongbo Wen, Hanzhi Liu, Jiaxin Song, Yanju Chen, Wenbo Guo, Yu Feng

    Abstract: Blockchain adoption has surged with the rise of Decentralized Finance (DeFi) applications. However, the significant value of digital assets managed by DeFi protocols makes them prime targets for attacks. Current smart contract vulnerability detection tools struggle with DeFi protocols due to deep logical bugs arising from complex financial interactions between multiple smart contracts. These tools…

    Submitted 8 July, 2024; originally announced July 2024.

  6. arXiv:2407.05125  [pdf, other]

    cs.DC cs.LG

    A Joint Approach to Local Updating and Gradient Compression for Efficient Asynchronous Federated Learning

    Authors: Jiajun Song, Jiajun Luo, Rongwei Lu, Shuzhao Xie, Bin Chen, Zhi Wang

    Abstract: Asynchronous Federated Learning (AFL) confronts inherent challenges arising from the heterogeneity of devices (e.g., their computation capacities) and low-bandwidth environments, both potentially causing stale model updates (e.g., local gradients) for global aggregation. Traditional approaches mitigating the staleness of updates typically focus on either adjusting the local updating or gradient co…

    Submitted 6 July, 2024; originally announced July 2024.

  7. arXiv:2407.04561  [pdf, other]

    cs.NI eess.SP

    Wireless Spectrum in Rural Farmlands: Status, Challenges and Opportunities

    Authors: Mukaram Shahid, Kunal Das, Taimoor Ul Islam, Christ Somiah, Daji Qiao, Arsalan Ahmad, Jimming Song, Zhengyuan Zhu, Sarath Babu, Yong Guan, Tusher Chakraborty, Suraj Jog, Ranveer Chandra, Hongwei Zhang

    Abstract: Due to factors such as low population density and expansive geographical distances, network deployment falls behind in rural regions, leading to a broadband divide. Wireless spectrum serves as the blood and flesh of wireless communications. Shared white spaces such as those in the TVWS and CBRS spectrum bands offer opportunities to expand connectivity, innovate, and provide affordable access to hi…

    Submitted 5 July, 2024; originally announced July 2024.

  8. arXiv:2407.04295  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    Jailbreak Attacks and Defenses Against Large Language Models: A Survey

    Authors: Sibo Yi, Yule Liu, Zhen Sun, Tianshuo Cong, Xinlei He, Jiaxing Song, Ke Xu, Qi Li

    Abstract: Large Language Models (LLMs) have performed exceptionally in various text-generative tasks, including question answering, translation, code completion, etc. However, the over-assistance of LLMs has raised the challenge of "jailbreaking", which induces the model to generate malicious responses against the usage policy and society by designing adversarial prompts. With the emergence of jailbreak att…

    Submitted 5 July, 2024; originally announced July 2024.

  9. arXiv:2407.01598  [pdf]

    cs.LG cs.AI

    Long-Term Prediction Accuracy Improvement of Data-Driven Medium-Range Global Weather Forecast

    Authors: Yifan Hu, Fukang Yin, Weimin Zhang, Kaijun Ren, Junqiang Song, Kefeng Deng, Di Zhang

    Abstract: Long-term stability stands as a crucial requirement in data-driven medium-range global weather forecasting. Spectral bias is recognized as the primary contributor to instabilities, as data-driven methods have difficulty learning small-scale dynamics. In this paper, we reveal that the universal mechanism for these instabilities is not only related to spectral bias but also to distortions brought by proce…

    Submitted 25 June, 2024; originally announced July 2024.

  10. arXiv:2407.00081  [pdf, other]

    cs.DC cs.AI cs.ET cs.LG cs.NI

    Semantic Revolution from Communications to Orchestration for 6G: Challenges, Enablers, and Research Directions

    Authors: Masoud Shokrnezhad, Hamidreza Mazandarani, Tarik Taleb, Jaeseung Song, Richard Li

    Abstract: In the context of emerging 6G services, the realization of everything-to-everything interactions involving a myriad of physical and digital entities presents a crucial challenge. This challenge is exacerbated by resource scarcity in communication infrastructures, necessitating innovative solutions for effective service implementation. Exploring the potential of Semantic Communications (SemCom) to…

    Submitted 24 June, 2024; originally announced July 2024.

    Comments: Accepted at IEEE Network magazine special issue: Goal-oriented Semantic Communication and Networking

  11. arXiv:2406.18151  [pdf, other]

    cs.CV

    SynRS3D: A Synthetic Dataset for Global 3D Semantic Understanding from Monocular Remote Sensing Imagery

    Authors: Jian Song, Hongruixuan Chen, Weihao Xuan, Junshi Xia, Naoto Yokoya

    Abstract: Global semantic 3D understanding from single-view high-resolution remote sensing (RS) imagery is crucial for Earth Observation (EO). However, this task faces significant challenges due to the high costs of annotations and data collection, as well as geographically restricted data availability. To address these challenges, synthetic data offer a promising solution by being easily accessible and thu…

    Submitted 26 June, 2024; originally announced June 2024.

  12. arXiv:2406.14984  [pdf, ps, other]

    cs.DS

    Colorful Priority $k$-Supplier

    Authors: Chandra Chekuri, Junkai Song

    Abstract: In the Priority $k$-Supplier problem the input consists of a metric space $(F \cup C, d)$ over a set of facilities $F$ and a set of clients $C$, an integer $k > 0$, and a non-negative radius $r_v$ for each client $v \in C$. The goal is to select $k$ facilities $S \subseteq F$ to minimize $\max_{v \in C} \frac{d(v,S)}{r_v}$ where $d(v,S)$ is the distance of $v$ to the closest facility in $S$. This pro…

    Submitted 21 June, 2024; originally announced June 2024.

  13. arXiv:2406.13929  [pdf, other]

    cs.CL cs.AI cs.LG

    Large Language Models are Skeptics: False Negative Problem of Input-conflicting Hallucination

    Authors: Jongyoon Song, Sangwon Yu, Sungroh Yoon

    Abstract: In this paper, we identify a new category of bias that induces input-conflicting hallucinations, where large language models (LLMs) generate responses inconsistent with the content of the input context. This issue, which we have termed the false negative problem, refers to the phenomenon where LLMs are predisposed to return negative judgments when assessing the correctness of a statement given the context…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 12 pages, 9 figures

  14. arXiv:2406.12907  [pdf, other]

    cs.LG cs.CL

    Reconciling Kaplan and Chinchilla Scaling Laws

    Authors: Tim Pearce, Jinyeop Song

    Abstract: Kaplan et al. [2020] (`Kaplan') and Hoffmann et al. [2022] (`Chinchilla') studied the scaling behavior of transformers trained on next-token language prediction. These studies produced different estimates for how the number of parameters ($N$) and training tokens ($D$) should be set to achieve the lowest possible loss for a given compute budget ($C$). Kaplan: $N_\text{optimal} \propto C^{0.73}$, C…

    Submitted 12 June, 2024; originally announced June 2024.

  15. arXiv:2406.12315  [pdf, other]

    cs.AI

    PruningBench: A Comprehensive Benchmark of Structural Pruning

    Authors: Haoling Li, Changhao Li, Mengqi Xue, Gongfan Fang, Sheng Zhou, Zunlei Feng, Huiqiong Wang, Yong Wang, Lechao Cheng, Mingli Song, Jie Song

    Abstract: Structural pruning has emerged as a promising approach for producing more efficient models. Nevertheless, the community suffers from a lack of standardized benchmarks and metrics, leaving the progress in this area not fully comprehended. To fill this gap, we present the first comprehensive benchmark, termed \textit{PruningBench}, for structural pruning. PruningBench showcases the following three c…

    Submitted 28 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Submitted to NeurIPS 2024 Datasets and Benchmarks Track

  16. arXiv:2406.11931  [pdf, other]

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe…

    Submitted 17 June, 2024; originally announced June 2024.

  17. arXiv:2406.11503  [pdf, other]

    cs.CV cs.CL

    GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation

    Authors: Shihao Cai, Keqin Bao, Hangyu Guo, Jizhi Zhang, Jun Song, Bo Zheng

    Abstract: Large language models have seen widespread adoption in math problem-solving. However, in geometry problems that usually require visual aids for better understanding, even the most advanced multi-modal models currently still face challenges in effectively using image information. High-quality data is crucial for enhancing the geometric capabilities of multi-modal models, yet existing open-source da…

    Submitted 17 June, 2024; originally announced June 2024.

  18. arXiv:2406.11128  [pdf, other]

    cs.LG cs.RO

    Model Adaptation for Time Constrained Embodied Control

    Authors: Jaehyun Song, Minjong Yoo, Honguk Woo

    Abstract: When adopting a deep learning model for embodied agents, it is required that the model structure be optimized for specific tasks and operational conditions. Such optimization can be static such as model compression or dynamic such as adaptive inference. Yet, these techniques have not been fully investigated for embodied control systems subject to time constraints, which necessitate sequential deci…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures, Accepted in The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024 (CVPR 2024)

  19. arXiv:2406.10744  [pdf, other]

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu, et al. (75 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

  20. arXiv:2406.10289  [pdf, other]

    cs.CL cs.AI cs.IR

    VeraCT Scan: Retrieval-Augmented Fake News Detection with Justifiable Reasoning

    Authors: Cheng Niu, Yang Guan, Yuanhao Wu, Juno Zhu, Juntong Song, Randy Zhong, Kaihua Zhu, Siliang Xu, Shizhe Diao, Tong Zhang

    Abstract: The proliferation of fake news poses a significant threat not only by disseminating misleading information but also by undermining the very foundations of democracy. The recent advance of generative artificial intelligence has further exacerbated the challenge of distinguishing genuine news from fabricated stories. In response to this challenge, we introduce VeraCT Scan, a novel retrieval-augmente…

    Submitted 24 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  21. CircuitVAE: Efficient and Scalable Latent Circuit Optimization

    Authors: Jialin Song, Aidan Swope, Robert Kirby, Rajarshi Roy, Saad Godil, Jonathan Raiman, Bryan Catanzaro

    Abstract: Automatically designing fast and space-efficient digital circuits is challenging because circuits are discrete, must exactly implement the desired logic, and are costly to simulate. We address these challenges with CircuitVAE, a search algorithm that embeds computation graphs in a continuous space and optimizes a learned surrogate of physical simulation by gradient descent. By carefully controllin…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Design Automation Conference (DAC) 2024; the first two authors contributed equally

  22. arXiv:2406.09181  [pdf, other]

    cs.CV cs.AI

    A Large-scale Universal Evaluation Benchmark For Face Forgery Detection

    Authors: Yijun Bei, Hengrui Lou, Jinsong Geng, Erteng Liu, Lechao Cheng, Jie Song, Mingli Song, Zunlei Feng

    Abstract: With the rapid development of AI-generated content (AIGC) technology, the production of realistic fake facial images and videos that deceive human visual perception has become possible. Consequently, various face forgery detection techniques have been proposed to identify such fake facial content. However, evaluating the effectiveness and generalizability of these detection techniques remains a si…

    Submitted 13 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: This is a paper about constructing a large-scale universal evaluation benchmark for face forgery detection. The full text is 30 pages

  23. arXiv:2406.06580  [pdf, other]

    cs.CL cs.AI

    Break the Chain: Large Language Models Can be Shortcut Reasoners

    Authors: Mengru Ding, Hanmeng Liu, Zhizhang Fu, Jian Song, Wenbo Xie, Yue Zhang

    Abstract: Recent advancements in Chain-of-Thought (CoT) reasoning utilize complex modules but are hampered by high token consumption, limited applicability, and challenges in reproducibility. This paper conducts a critical evaluation of CoT prompting, extending beyond arithmetic to include complex logical and commonsense reasoning tasks, areas where standard CoT methods fall short. We propose the integratio…

    Submitted 4 June, 2024; originally announced June 2024.

  24. arXiv:2406.06562  [pdf, other]

    cs.CL cs.AI

    Achieving Sparse Activation in Small Language Models

    Authors: Jifeng Song, Kai Huang, Xiangyu Yin, Boyuan Yang, Wei Gao

    Abstract: Sparse activation, which selectively activates only an input-dependent set of neurons in inference, is a useful technique to reduce the computing cost of Large Language Models (LLMs) without retraining or adaptation efforts. However, whether it can be applied to the recently emerging Small Language Models (SLMs) remains questionable, because SLMs are generally less over-parameterized than LLMs. In…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 15 pages

  25. arXiv:2406.06558  [pdf, other]

    cs.CL cs.AI

    Enhancing Text Authenticity: A Novel Hybrid Approach for AI-Generated Text Detection

    Authors: Ye Zhang, Qian Leng, Mengran Zhu, Rui Ding, Yue Wu, Jintong Song, Yulu Gong

    Abstract: The rapid advancement of Large Language Models (LLMs) has ushered in an era where AI-generated text is increasingly indistinguishable from human-generated content. Detecting AI-generated text has become imperative to combat misinformation, ensure content authenticity, and safeguard against malicious uses of AI. In this paper, we propose a novel hybrid approach that combines traditional TF-IDF tech…

    Submitted 1 June, 2024; originally announced June 2024.

  26. arXiv:2406.06340  [pdf, other]

    cs.LG cs.AI

    Optimisation of federated learning settings under statistical heterogeneity variations

    Authors: Basem Suleiman, Muhammad Johan Alibasa, Rizka Widyarini Purwanto, Lewis Jeffries, Ali Anaissi, Jacky Song

    Abstract: Federated Learning (FL) enables local devices to collaboratively learn a shared predictive model by only periodically sharing model parameters with a central aggregator. However, FL can be disadvantaged by statistical heterogeneity produced by the diversity in each local device's data distribution, which creates different levels of Independent and Identically Distributed (IID) data. Furthermore, th…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 27 pages, 17 figures

  27. arXiv:2406.05135  [pdf]

    cs.RO math.OC

    Smart Navigation System for Parking Assignment at Large Events: Incorporating Heterogeneous Driver Characteristics

    Authors: Xi Cheng, Gaofeng Su, Siyuan Feng, Ke Liu, Chen Zhu, Hui Lin, Jilin Song, Jianan Chen

    Abstract: Parking challenges escalate significantly during large events such as concerts or sports games, yet few studies address dynamic parking lot assignments for such occasions. This paper introduces a smart navigation system designed to optimize parking assignments swiftly during large events, utilizing a mixed search algorithm that accounts for the heterogeneous characteristics of drivers. We conducte…

    Submitted 14 May, 2024; originally announced June 2024.

  28. arXiv:2406.03912  [pdf, other]

    cs.AI cs.LG cs.RO eess.SY

    GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model

    Authors: Zhehua Zhou, Xuan Xie, Jiayang Song, Zhan Shu, Lei Ma

    Abstract: Although deep reinforcement learning has demonstrated impressive achievements in controlling various autonomous systems, e.g., autonomous vehicles or humanoid robots, its inherent reliance on random exploration raises safety concerns in their real-world applications. To improve system safety during the learning process, a variety of Safe Reinforcement Learning (SRL) algorithms have been proposed,…

    Submitted 6 June, 2024; originally announced June 2024.

  29. arXiv:2406.03097  [pdf, other]

    cs.LG cs.AI

    Enhancing the Resilience of Graph Neural Networks to Topological Perturbations in Sparse Graphs

    Authors: Shuqi He, Jun Zhuang, Ding Wang, Luyao Peng, Jun Song

    Abstract: Graph neural networks (GNNs) have been extensively employed in node classification. Nevertheless, recent studies indicate that GNNs are vulnerable to topological perturbations, such as adversarial attacks and edge disruptions. Considerable efforts have been devoted to mitigating these challenges. For example, pioneering Bayesian methodologies, including GraphSS and LlnDT, incorporate Bayesian labe…

    Submitted 5 June, 2024; originally announced June 2024.

  30. arXiv:2406.01595  [pdf, other]

    cs.CV

    MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild

    Authors: Zeren Jiang, Chen Guo, Manuel Kaufmann, Tianjian Jiang, Julien Valentin, Otmar Hilliges, Jie Song

    Abstract: We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. Reconstructing multiple individuals moving and interacting naturally from monocular in-the-wild videos poses a challenging task. Addressing it necessitates precise pixel-level disentanglement of individuals without any prior knowledge about the subjects. Moreover, it requires recovering i…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Project page: https://eth-ait.github.io/MultiPly/

  31. arXiv:2406.00505  [pdf, other]

    cs.CV

    Improving Text Generation on Images with Synthetic Captions

    Authors: Jun Young Koh, Sang Hyun Park, Joy Song

    Abstract: The recent emergence of latent diffusion models such as SDXL and SD 1.5 has shown significant capability in generating highly detailed and realistic images. Despite their remarkable ability to produce images, generating accurate text within images still remains a challenging task. In this paper, we examine the validity of fine-tuning approaches in generating legible text within the image. We propo…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 9 pages, 12 figures

  32. arXiv:2405.20701  [pdf, other]

    cs.CL cs.AI

    Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement

    Authors: Pengwei Zhan, Zhen Xu, Qian Tan, Jie Song, Ru Xie

    Abstract: Large language models (LLMs) demonstrate exceptional instruct-following ability to complete various downstream tasks. Although this impressive ability makes LLMs flexible task solvers, their performance in solving tasks also heavily relies on instructions. In this paper, we reveal that LLMs are over-sensitive to lexical variations in task instructions, even when the variations are imperceptible to…

    Submitted 31 May, 2024; originally announced May 2024.

  33. arXiv:2405.19811  [pdf, ps, other]

    cs.LG cs.MA

    Approximate Global Convergence of Independent Learning in Multi-Agent Systems

    Authors: Ruiyang Jin, Zaiwei Chen, Yiheng Lin, Jie Song, Adam Wierman

    Abstract: Independent learning (IL), despite being a popular approach in practice to achieve scalability in large-scale multi-agent systems, usually lacks global convergence guarantees. In this paper, we study two representative algorithms, independent $Q$-learning and independent natural actor-critic, within value-based and policy-based frameworks, and provide the first finite-sample analysis for approxima…

    Submitted 30 May, 2024; originally announced May 2024.

  34. arXiv:2405.19704  [pdf, other]

    stat.ML cs.LG stat.ME

    Enhancing Sufficient Dimension Reduction via Hellinger Correlation

    Authors: Seungbeom Hong, Ilmun Kim, Jun Song

    Abstract: In this work, we develop a new theory and method for sufficient dimension reduction (SDR) in single-index models, where SDR is a sub-field of supervised dimension reduction based on conditional independence. Our work is primarily motivated by the recent introduction of the Hellinger correlation as a dependency measure. Utilizing this measure, we develop a method capable of effectively detecting th…

    Submitted 30 May, 2024; originally announced May 2024.

  35. arXiv:2405.19665  [pdf]

    eess.SY cs.AI cs.LG

    A novel fault localization with data refinement for hydroelectric units

    Authors: Jialong Huang, Junlin Song, Penglong Lian, Mengjie Gan, Zhiheng Su, Benhao Wang, Wenji Zhu, Xiaomin Pu, Jianxiao Zou, Shicai Fan

    Abstract: Due to the scarcity of fault samples and the complexity of non-linear and non-smooth characteristics data in hydroelectric units, most traditional hydroelectric unit fault localization methods struggle to localize faults accurately. To address these problems, a sparse autoencoder (SAE)-generative adversarial network (GAN)-wavelet noise reduction (WNR)-manifold-boosted deep learni…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures, Conference on Decision and Control (CDC)

  36. arXiv:2405.18844  [pdf, other]

    cs.IT eess.SP

    Optical IRS for Visible Light Communication: Modeling, Design, and Open Issues

    Authors: Shiyuan Sun, Fang Yang, Weidong Mei, Jian Song, Zhu Han, Rui Zhang

    Abstract: Optical intelligent reflecting surface (OIRS) offers a new and effective approach to resolving the line-of-sight blockage issue in visible light communication (VLC) by enabling redirection of light to bypass obstacles, thereby dramatically enhancing indoor VLC coverage and reliability. This article provides a comprehensive overview of OIRS for VLC, including channel modeling, design techniques, an…

    Submitted 29 May, 2024; originally announced May 2024.

  37. arXiv:2405.18093  [pdf, other]

    cs.DC cs.LG

    Pipette: Automatic Fine-grained Large Language Model Training Configurator for Real-World Clusters

    Authors: Jinkyu Yim, Jaeyong Song, Yerim Choi, Jaebeen Lee, Jaewon Jung, Hongsun Jang, Jinho Lee

    Abstract: Training large language models (LLMs) is known to be challenging because of the huge computational and memory capacity requirements. To address these issues, it is common to use a cluster of GPUs with 3D parallelism, which splits a model along the data batch, pipeline stage, and intra-layer tensor dimensions. However, the use of 3D parallelism produces the additional challenge of finding the optim…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: published at DATE 2024

  38. arXiv:2405.18004  [pdf, other]

    cs.CV

    SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions

    Authors: Juexiao Zhou, Liyuan Sun, Yan Xu, Wenbin Liu, Shawn Afvari, Zhongyi Han, Jiaoyan Song, Yongzhi Ji, Xiaonan He, Xin Gao

    Abstract: With the widespread application of artificial intelligence (AI), particularly deep learning (DL) and vision-based large language models (VLLMs), in skin disease diagnosis, the need for interpretability becomes crucial. However, existing dermatology datasets are limited in their inclusion of concept-level meta-labels, and none offer rich medical descriptions in natural language. This deficiency imp…

    Submitted 28 May, 2024; originally announced May 2024.

  39. arXiv:2405.16605  [pdf, other]

    cs.CV

    Demystify Mamba in Vision: A Linear Attention Perspective

    Authors: Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang

    Abstract: Mamba is an effective state space model with linear computation complexity. It has recently shown impressive efficiency in dealing with high-resolution inputs across various vision tasks. In this paper, we reveal that the powerful Mamba model shares surprising similarities with linear attention Transformers, which typically underperform conventional Transformers in practice. By exploring the similar…

    Submitted 26 May, 2024; originally announced May 2024.

  40. arXiv:2405.16248  [pdf]

    eess.IV cs.CV cs.LG q-bio.QM

    Combining Radiomics and Machine Learning Approaches for Objective ASD Diagnosis: Verifying White Matter Associations with ASD

    Authors: Junlin Song, Yuzhuo Chen, Yuan Yao, Zetong Chen, Renhao Guo, Lida Yang, Xinyi Sui, Qihang Wang, Xijiao Li, Aihua Cao, Wei Li

    Abstract: Autism Spectrum Disorder is a condition characterized by atypical brain development leading to impairments in social skills, communication abilities, repetitive behaviors, and sensory processing. There have been many studies combining brain MRI images with machine learning algorithms to achieve objective diagnosis of autism, but the correlation between white matter and autism has not been fully u…

    Submitted 25 May, 2024; originally announced May 2024.

  41. arXiv:2405.15738  [pdf, other]

    cs.CV

    ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

    Authors: Chunjiang Ge, Sijie Cheng, Ziming Wang, Jiale Yuan, Yuan Gao, Jun Song, Shiji Song, Gao Huang, Bo Zheng

    Abstract: High-resolution Large Multimodal Models (LMMs) encounter the challenges of excessive visual tokens and quadratic visual complexity. Current high-resolution LMMs address the quadratic complexity while still generating excessive visual tokens. However, the redundancy in visual tokens is the key problem as it leads to more substantial compute. To mitigate this issue, we propose ConvLLaVA, which emplo…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 17 pages

  42. arXiv:2405.15356  [pdf, other]

    cs.CV

    Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization

    Authors: Beitao Chen, Xinyu Lyu, Lianli Gao, Jingkuan Song, Heng Tao Shen

    Abstract: Although Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. Almost all current visual contrastive decoding methods attempt to mitigate these hallucinations by introducing visual uncertainty information that appropri…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 10 pages. arXiv admin note: text overlap with arXiv:2311.16922 by other authors

  43. arXiv:2405.14303  [pdf, other]

    cs.LG

    Similarity-Navigated Conformal Prediction for Graph Neural Networks

    Authors: Jianqing Song, Jianguo Huang, Wenyu Jiang, Baoming Zhang, Shuangjie Li, Chongjun Wang

    Abstract: Graph Neural Networks have achieved remarkable accuracy in semi-supervised node classification tasks. However, these results lack reliable uncertainty estimates. Conformal prediction methods provide a theoretical guarantee for node classification tasks, ensuring that the conformal prediction set contains the ground-truth label with a desired probability (e.g., 95%). In this paper, we empirically s…

    Submitted 23 May, 2024; originally announced May 2024.

  44. arXiv:2405.14055  [pdf, other]

    cs.CL cs.AI cs.ET

    How Many Bytes Can You Take Out Of Brain-To-Text Decoding?

    Authors: Richard Antonello, Nihita Sarma, Jerry Tang, Jiaru Song, Alexander Huth

    Abstract: Brain-computer interfaces have promising medical and scientific applications for aiding speech and studying the brain. In this work, we propose an information-based evaluation metric for brain-to-text decoders. Using this metric, we examine two methods to augment existing state-of-the-art continuous text decoders. We show that these methods, in concert, can improve brain decoding performance by up…

    Submitted 22 May, 2024; originally announced May 2024.

  45. arXiv:2405.13037  [pdf, other]

    cs.CL cs.AI

    Enhancing Dialogue State Tracking Models through LLM-backed User-Agents Simulation

    Authors: Cheng Niu, Xingguang Wang, Xuxin Cheng, Juntong Song, Tong Zhang

    Abstract: Dialogue State Tracking (DST) is designed to monitor the evolving dialogue state in the conversations and plays a pivotal role in developing task-oriented dialogue systems. However, obtaining the annotated data for the DST task is usually a costly endeavor. In this paper, we focus on employing LLMs to generate dialogue data to reduce dialogue collection and annotation costs. Specifically, GPT-4 is…

    Submitted 17 May, 2024; originally announced May 2024.

  46. arXiv:2405.12801  [pdf, other]

    cs.CL cs.IR cs.LG

    Comparing Neighbors Together Makes it Easy: Jointly Comparing Multiple Candidates for Efficient and Effective Retrieval

    Authors: Jonghyun Song, Cheyon Jin, Wenlong Zhao, Jay-Yoon Lee

    Abstract: A common retrieve-and-rerank paradigm involves retrieving a broad set of relevant candidates using a scalable bi-encoder, followed by expensive but more accurate cross-encoders to a limited candidate set. However, this small subset often leads to error propagation from the bi-encoders, thereby restricting the performance of the overall pipeline. To address these issues, we propose the Comparing Mu…

    Submitted 21 May, 2024; originally announced May 2024.

  47. arXiv:2405.12710   

    cs.CV

    Text-Video Retrieval with Global-Local Semantic Consistent Learning

    Authors: Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Yihang Duan, Xinyu Lyu, Hengtao Shen

    Abstract: Adapting large-scale image-text pre-training models, e.g., CLIP, to the video domain represents the current state-of-the-art for text-video retrieval. The primary approaches involve transferring text-video pairs to a common embedding space and leveraging cross-modal interactions on specific entities for semantic alignment. Though effective, these paradigms entail prohibitive computational costs, l…

    Submitted 15 July, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: The author has withdrawn this paper due to a critical definitional error in concept learning for global/local-interaction learning during training. This error led to an alignment issue with the definition of the text-video retrieval task, causing an unfair comparison with state-of-the-art (SOTA) methods. Consequently, this hindered the accurate evaluation of the paper's contributions

  48. arXiv:2405.12533  [pdf]

    cs.CV

    Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering

    Authors: Hiba Maryam, Ling Fu, Jiajun Song, Tajrian ABM Shafayet, Qidi Luo, Xiang Bai, Yuliang Liu

    Abstract: The development of Urdu scene text detection, recognition, and Visual Question Answering (VQA) technologies is crucial for advancing accessibility, information retrieval, and linguistic diversity in digital content, facilitating better understanding and interaction with Urdu-language visual data. This initiative seeks to bridge the gap between textual and visual comprehension. We propose a new mul…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Accepted by the International Conference on Document Analysis and Recognition (ICDAR) 2024

  49. arXiv:2405.11437  [pdf, other]

    cs.CV

    The First Swahili Language Scene Text Detection and Recognition Dataset

    Authors: Fadila Wendigoundi Douamba, Jianjun Song, Ling Fu, Yuliang Liu, Xiang Bai

    Abstract: Scene text recognition is essential in many applications, including automated translation, information retrieval, driving assistance, and enhancing accessibility for individuals with visual impairments. Much research has been done to improve the accuracy and performance of scene text detection and recognition models. However, most of this research has been conducted in the most common languages, E…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: Accepted to ICDAR 2024

  50. arXiv:2405.11252  [pdf, other]

    cs.CV

    Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching

    Authors: Xingyu Miao, Haoran Duan, Varun Ojha, Jun Song, Tejal Shah, Yang Long, Rajiv Ranjan

    Abstract: In this work, we propose a novel Trajectory Score Matching (TSM) method that aims to solve the pseudo ground truth inconsistency problem caused by the accumulated error in Interval Score Matching (ISM) when using the Denoising Diffusion Implicit Models (DDIM) inversion process. Unlike ISM which adopts the inversion process of DDIM to calculate on a single path, our TSM method leverages the inversi…

    Submitted 18 May, 2024; originally announced May 2024.