Showing 1–50 of 2,892 results for author: Zhang, W

  1. arXiv:2407.13683  [pdf]

    eess.SP cs.IT

    Quasi-Fractal UCA Based N-Dimensional OAM Orthogonal Transmission

    Authors: Hongyun Jin, Wenchi Cheng, Wei Zhang

    Abstract: The vortex electromagnetic wave carried by multiple orthogonal orbital angular momentum (OAM) modes in the same frequency band can be applied to the field of wireless communications, which greatly increases the spectrum efficiency. The uniform circular array (UCA) structure is widely used to generate or receive vortex electromagnetic waves with multiple OAM-modes. However, the maximum number of or…

    Submitted 9 June, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2406.05667

  2. arXiv:2407.13605  [pdf, other]

    cs.LG

    Physics-guided Active Sample Reweighting for Urban Flow Prediction

    Authors: Wei Jiang, Tong Chen, Guanhua Ye, Wentao Zhang, Lizhen Cui, Zi Huang, Hongzhi Yin

    Abstract: Urban flow prediction is a spatio-temporal modeling task that estimates the throughput of transportation services like buses, taxis, and ride-sharing, where data-driven models have become the most popular solution in the past decade. Meanwhile, the implicitly learned mapping from historical observations to the prediction targets tends to over-simplify the dynamics of real-world urban flows, lead…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: This paper is accepted by Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24)

  3. arXiv:2407.13596  [pdf, other]

    cs.CV

    EarthMarker: A Visual Prompt Learning Framework for Region-level and Point-level Remote Sensing Imagery Comprehension

    Authors: Wei Zhang, Miaoxin Cai, Tong Zhang, Yin Zhuang, Xuerui Mao

    Abstract: Recent advances in visual prompting in the natural image area have allowed users to interact with artificial intelligence (AI) tools through various visual marks such as box, point, and free-form shapes. However, due to the significant difference between the natural and remote sensing (RS) images, existing visual prompting models face challenges in RS scenarios. Moreover, RS MLLMs mainly focus on…

    Submitted 18 July, 2024; originally announced July 2024.

  4. arXiv:2407.13561  [pdf, other]

    cs.CL

    Research on Tibetan Tourism Viewpoints information generation system based on LLM

    Authors: Jinhu Qi, Shuai Yan, Wentao Zhang, Yibo Zhang, Zirui Liu, Ke Wang

    Abstract: Tibet, ensconced within China's territorial expanse, is distinguished by its labyrinthine and heterogeneous topography, a testament to its profound historical heritage, and the cradle of a unique religious ethos. The very essence of these attributes, however, has impeded the advancement of Tibet's tourism service infrastructure, rendering existing smart tourism services inadequate for the region's…

    Submitted 18 July, 2024; originally announced July 2024.

    Journal ref: ICWOC 2024

  5. arXiv:2407.13111  [pdf, other]

    cs.MM cs.CV

    PG-Attack: A Precision-Guided Adversarial Attack Framework Against Vision Foundation Models for Autonomous Driving

    Authors: Jiyuan Fu, Zhaoyu Chen, Kaixun Jiang, Haijing Guo, Shuyong Gao, Wenqiang Zhang

    Abstract: Vision foundation models are increasingly employed in autonomous driving systems due to their advanced capabilities. However, these models are susceptible to adversarial attacks, posing significant risks to the reliability and safety of autonomous vehicles. Adversaries can exploit these vulnerabilities to manipulate the vehicle's perception of its surroundings, leading to erroneous decisions and p…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: First-Place in the CVPR 2024 Workshop Challenge: Black-box Adversarial Attacks on Vision Foundation Models

  6. Multiple Access Integrated Adaptive Finite Blocklength for Ultra-Low Delay in 6G Wireless Networks

    Authors: Yixin Zhang, Wenchi Cheng, Wei Zhang

    Abstract: Facing the dramatic increase of real-time applications and time-sensitive services, large-scale ultra-low delay requirements are put forward for the sixth generation (6G) wireless networks. To support massive ultra-reliable and low-latency communications (mURLLC), in this paper we propose an adaptive finite blocklength framework to reduce the over-the-air delay for short packet transmissions with…

    Submitted 17 July, 2024; originally announced July 2024.

    Journal ref: IEEE Transactions on Wireless Communications (Volume: 23, Issue: 3, March 2024)

  7. Adaptive Finite Blocklength for Low Access Delay in 6G Wireless Networks

    Authors: Yixin Zhang, Wenchi Cheng, Wei Zhang

    Abstract: As the number of real-time applications with ultra-low delay requirements quickly grows, massive ultra-reliable and low-latency communication (mURLLC) has been proposed to provide a wide range of delay-sensitive services for the sixth generation (6G) wireless networks. However, it is difficult to meet the stringent delay demand of massive connectivity with existing grant-based (GB) random access a…

    Submitted 17 July, 2024; originally announced July 2024.

    Journal ref: GLOBECOM 2022 - 2022 IEEE Global Communications Conference

  8. arXiv:2407.12565  [pdf, other]

    cs.AR

    SigDLA: A Deep Learning Accelerator Extension for Signal Processing

    Authors: Fangfa Fu, Wenyu Zhang, Zesong Jiang, Zhiyu Zhu, Guoyu Li, Bing Yang, Cheng Liu, Liyi Xiao, Jinxiang Wang, Huawei Li, Xiaowei Li

    Abstract: Deep learning and signal processing are closely correlated in many IoT scenarios such as anomaly detection to empower intelligence of things. Many IoT processors utilize digital signal processors (DSPs) for signal processing and build deep learning frameworks on this basis. While deep learning is usually much more computing-intensive than signal processing, the computing efficiency of deep learnin…

    Submitted 17 July, 2024; originally announced July 2024.

  9. arXiv:2407.12442  [pdf, other]

    cs.CV

    ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference

    Authors: Mengcheng Lan, Chaofeng Chen, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang

    Abstract: Despite the success of large-scale pretrained Vision-Language Models (VLMs) especially CLIP in various open-vocabulary tasks, their application to semantic segmentation remains challenging, producing noisy segmentation maps with mis-segmented regions. In this paper, we carefully re-investigate the architecture of CLIP, and identify residual connections as the primary source of noise that degrades…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. Code available at https://github.com/mc-lan/ClearCLIP

  10. arXiv:2407.12291  [pdf, other]

    cs.CV

    JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation

    Authors: Chenhan Jiang, Yihan Zeng, Tianyang Hu, Songcun Xu, Wei Zhang, Hang Xu, Dit-Yan Yeung

    Abstract: Score Distillation Sampling (SDS) by well-trained 2D diffusion models has shown great promise in text-to-3D generation. However, this paradigm distills view-agnostic 2D image distributions into the rendering distribution of 3D representation for each view independently, overlooking the coherence across views and yielding 3D inconsistency in generations. In this work, we propose Joint \tex…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 29 pages, ECCV2024

  11. arXiv:2407.12273  [pdf, other]

    cs.CV

    GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity

    Authors: Shuo Cao, Yihao Liu, Wenlong Zhang, Yu Qiao, Chao Dong

    Abstract: Traditional single-task image restoration methods excel in handling specific degradation types but struggle with multiple degradations. To address this limitation, we propose Grouped Restoration with Image Degradation Similarity (GRIDS), a novel approach that harmonizes the competing objectives inherent in multiple-degradation restoration. We first introduce a quantitative method for assessing rel…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  12. Performance Analysis and Blocklength Minimization of Uplink RSMA for Short Packet Transmissions in URLLC

    Authors: Yixin Zhang, Wenchi Cheng, Jingqing Wang, Wei Zhang

    Abstract: Rate splitting multiple access (RSMA) is one of the most promising techniques for ultra-reliable and low-latency communications (URLLC) with stringent requirements on delay and reliability of multiple access. To fully explore the delay performance enhancement brought by uplink RSMA to URLLC, in this paper, we evaluate the performance of two-user uplink RSMA and propose the corresponding blocklengt…

    Submitted 16 July, 2024; originally announced July 2024.

    Journal ref: GLOBECOM 2023 - 2023 IEEE Global Communications Conference

  13. Dumb RIS-Assisted Random Beamforming for Energy Efficiency Enhancement of Wireless Communications

    Authors: Yixin Zhang, Wenchi Cheng, Wei Zhang

    Abstract: Energy efficiency (EE) is one of the most important metrics for the beyond fifth generation (B5G) and the future sixth generation (6G) wireless networks. Reconfigurable intelligent surface (RIS) has been widely studied for EE enhancement of wireless networks because it is power-saving, programmable, and easy to deploy. However, RIS is generally passive and thus difficult to obtain correspondi…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 6 pages, 4 figures

    Journal ref: ICC 2022 - IEEE International Conference on Communications

  14. arXiv:2407.12237  [pdf, other]

    cs.IT

    Delay Tradeoff and Adaptive Finite Blocklength Framework for URLLC

    Authors: Yixin Zhang, Wenchi Cheng, Jingqing Wang, Wei Zhang

    Abstract: With various time-sensitive tasks to be served, ultra-reliable and low-latency communications (URLLC) has become one of the most important scenarios for the fifth generation (5G) wireless communications. The end-to-end delay from the sub-millisecond-level to the second-level is first put forward for a wide range of delay-sensitive tasks in the future sixth generation (6G) communication networks, w…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures

  15. arXiv:2407.11948  [pdf, other]

    cs.CL cs.AI

    Rethinking Transformer-based Multi-document Summarization: An Empirical Investigation

    Authors: Congbo Ma, Wei Emma Zhang, Dileepa Pitawela, Haojie Zhuang, Yanfeng Shu

    Abstract: The utilization of Transformer-based models has spurred the growth of multi-document summarization (MDS). Given the huge impact and widespread adoption of Transformer-based models in various natural language processing tasks, investigating their performance and behaviors in the context of MDS becomes crucial for advancing the field and enhancing the quality of summaries. To thoroughly examine the behav…

    Submitted 16 July, 2024; originally announced July 2024.

  16. arXiv:2407.11651  [pdf, other]

    cs.IT eess.SP

    Fluid Antenna Grouping Index Modulation Design for MIMO Systems

    Authors: Xinghao Guo, Yin Xu, Dazhi He, Cixiao Zhang, Wenjun Zhang, Yi-yan Wu

    Abstract: Index modulation (IM) significantly enhances the spectral efficiency of fluid antennas (FAs) enabled multiple-input multiple-output (MIMO) systems, which is named FA-IM. However, due to the dense distribution of ports on fluid antennas, the wireless channel exhibits a high spatial correlation, resulting in severe performance degradation in the existing FA-IM scheme. This paper proposes a novel flu…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: A longer and more detailed version will be submitted to an IEEE journal

  17. arXiv:2407.11644  [pdf, other]

    cs.CV cs.RO

    Perception Helps Planning: Facilitating Multi-Stage Lane-Level Integration via Double-Edge Structures

    Authors: Guoliang You, Xiaomeng Chu, Yifan Duan, Wenyu Zhang, Xingchen Li, Sha Zhang, Yao Li, Jianmin Ji, Yanyong Zhang

    Abstract: When planning for autonomous driving, it is crucial to consider essential traffic elements such as lanes, intersections, traffic regulations, and dynamic agents. However, they are often overlooked by the traditional end-to-end planning methods, likely leading to inefficiencies and non-compliance with traffic regulations. In this work, we endeavor to integrate the perception of these elements into…

    Submitted 16 July, 2024; originally announced July 2024.

  18. arXiv:2407.10984  [pdf, other]

    cs.NI cs.AI

    On the Combination of AI and Wireless Technologies: 3GPP Standardization Progress

    Authors: Chen Sun, Tao Cui, Wenqi Zhang, Yingshuang Bai, Shuo Wang, Haojin Li

    Abstract: Combining Artificial Intelligence (AI) and wireless communication technologies has become one of the major technology trends towards 2030. This includes using AI to improve the efficiency of the wireless transmission and supporting AI deployment with wireless networks. In this article, the latest progress of the Third Generation Partnership Project (3GPP) standards development is introduced. Conce…

    Submitted 16 June, 2024; originally announced July 2024.

  19. arXiv:2407.10499  [pdf, other]

    cs.CL

    CIBench: Evaluating Your LLMs with a Code Interpreter Plugin

    Authors: Songyang Zhang, Chuyu Zhang, Yingfan Hu, Haowen Shen, Kuikun Liu, Zerun Ma, Fengzhe Zhou, Wenwei Zhang, Xuming He, Dahua Lin, Kai Chen

    Abstract: While LLM-based agents, which use external tools to solve complex problems, have made significant progress, benchmarking their ability is challenging, thereby hindering a clear understanding of their limitations. In this paper, we propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks. Our evaluation f…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Under review

  20. arXiv:2407.10486  [pdf, other]

    cs.AI cs.CL

    IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization

    Authors: Jie Cao, Dian Jiao, Qiang Yan, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

    Abstract: Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. Large language models (LLMs) have shown an impressive capability for textual understanding through large-scale pretraining, which implies great potential for extractive snippet generation. In this paper, we systematically i…

    Submitted 15 July, 2024; originally announced July 2024.

  21. arXiv:2407.09919  [pdf, other]

    cs.CV

    Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors

    Authors: Wei Shang, Dongwei Ren, Wanying Zhang, Yuming Fang, Wangmeng Zuo, Kede Ma

    Abstract: Arbitrary-scale video super-resolution (AVSR) aims to enhance the resolution of video frames, potentially at various scaling factors, which presents several challenges regarding spatial detail reproduction, temporal consistency, and computational complexity. In this paper, we first describe a strong baseline for AVSR by putting together three variants of elementary building blocks: 1) a flow-guide…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024, the code is available at https://github.com/shangwei5/ST-AVSR

    ACM Class: I.4.3

  22. arXiv:2407.09895  [pdf, other]

    cs.SE cs.PL

    EATXT: A textual concrete syntax for EAST-ADL

    Authors: Weixing Zhang, Jörg Holtmann, Daniel Strüber, Jan-Philipp Steghöfer

    Abstract: Blended modeling is an approach that enables users to interact with a model via multiple notations. In this context, there is a growing need for open-source industry-grade exemplars of languages with available language engineering artifacts, in particular, editors and notations for supporting the creation of models based on a single metamodel in different representations (e.g., textual, graphical,…

    Submitted 13 July, 2024; originally announced July 2024.

  23. arXiv:2407.09829  [pdf, other]

    cs.RO

    VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation

    Authors: Wentao Zhao, Jiaming Chen, Ziyu Meng, Donghui Mao, Ran Song, Wei Zhang

    Abstract: Although Model Predictive Control (MPC) can effectively predict the future states of a system and thus is widely used in robotic manipulation tasks, it does not have the capability of environmental perception, leading to failure in some complex scenarios. To address this issue, we introduce Vision-Language Model Predictive Control (VLMPC), a robotic manipulation framework which takes advantage…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by RSS2024

  24. arXiv:2407.09792  [pdf, other]

    cs.RO

    Language-Augmented Symbolic Planner for Open-World Task Planning

    Authors: Guanqi Chen, Lei Yang, Ruixing Jia, Zhe Hu, Yizhou Chen, Wei Zhang, Wenping Wang, Jia Pan

    Abstract: Enabling robotic agents to perform complex long-horizon tasks has been a long-standing goal in robotics and artificial intelligence (AI). Despite the potential shown by large language models (LLMs), their planning capabilities remain limited to short-horizon tasks and they are unable to replace the symbolic planning approach. Symbolic planners, on the other hand, may encounter execution errors due…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by Robotics: Science and Systems (RSS) 2024

  25. arXiv:2407.09562  [pdf, other]

    cs.CV eess.IV

    Edge AI-Enabled Chicken Health Detection Based on Enhanced FCOS-Lite and Knowledge Distillation

    Authors: Qiang Tong, Jinrui Wang, Wenshuang Yang, Songtao Wu, Wenqi Zhang, Chen Sun, Kuanhong Xu

    Abstract: The utilization of AIoT technology has become a crucial trend in modern poultry management, offering the potential to optimize farming operations and reduce human workloads. This paper presents a real-time and compact edge-AI enabled detector designed to identify chickens and their health statuses using frames captured by a lightweight and intelligent camera equipped with an edge-AI enabled CMOS…

    Submitted 3 July, 2024; originally announced July 2024.

  26. arXiv:2407.08990  [pdf, other]

    cs.AR cs.AI cs.ET cs.NE

    Dynamic neural network with memristive CIM and CAM for 2D and 3D vision

    Authors: Yue Zhang, Woyu Zhang, Shaocong Wang, Ning Lin, Yifei Yu, Yangu He, Bo Wang, Hao Jiang, Peng Lin, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: The brain is dynamic, associative and efficient. It reconfigures by associating the inputs with past experiences, with fused memory and processing. In contrast, AI models are static, unable to associate inputs with past experiences, and run on digital computers with physically separated memory and processing. We propose a hardware-software co-design, a semantic memory-based dynamic neural network…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: In press

  27. arXiv:2407.08706  [pdf, other]

    cs.CV

    HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

    Authors: Runhui Huang, Xinpeng Ding, Chunwei Wang, Jianhua Han, Yulong Liu, Hengshuang Zhao, Hang Xu, Lu Hou, Wei Zhang, Xiaodan Liang

    Abstract: High-resolution inputs enable Large Vision-Language Models (LVLMs) to discern finer visual details, enhancing their comprehension capabilities. To reduce the training and computation costs caused by high-resolution input, one promising direction is to use sliding windows to slice the input into uniform patches, each matching the input size of the well-trained vision encoder. Although efficient, th…

    Submitted 11 July, 2024; originally announced July 2024.

  28. arXiv:2407.08583  [pdf, other]

    cs.AI cs.CV cs.LG

    The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

    Authors: Zhen Qin, Daoyuan Chen, Wenhao Zhang, Liuyi Yao, Yilun Huang, Bolin Ding, Yaliang Li, Shuiguang Deng

    Abstract: The rapid development of large language models (LLMs) has been witnessed in recent years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from text to a broader spectrum of domains, attracting widespread attention due to the broader range of application scenarios. As LLMs and MLLMs rely on vast amounts of model parameters and data to achieve emergent capabilities, the impo…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Ongoing work. 31 pages. Related materials are continually maintained and available at https://github.com/modelscope/data-juicer/blob/main/docs/awesome_llm_data.md

  29. arXiv:2407.08462  [pdf, other]

    cs.LG cs.NI

    Distributed Deep Reinforcement Learning Based Gradient Quantization for Federated Learning Enabled Vehicle Edge Computing

    Authors: Cui Zhang, Wenjun Zhang, Qiong Wu, Pingyi Fan, Qiang Fan, Jiangzhou Wang, Khaled B. Letaief

    Abstract: Federated Learning (FL) can protect the privacy of the vehicles in vehicle edge computing (VEC) to a certain extent through sharing the gradients of vehicles' local models instead of local data. The gradients of vehicles' local models are usually large for the vehicular artificial intelligence (AI) applications, thus transmitting such large gradients would cause large per-round latency. Gradient q…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to IEEE Journal. The source code has been released at: https://github.com/qiongwu86/Distributed-Deep-Reinforcement-Learning-Based-Gradient-Quantization-for-Federated-Learning-Enabled-Vehicle-Edge-Computing

  30. arXiv:2407.08127  [pdf, other]

    cs.CV

    Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment

    Authors: Yufan Liu, Wanqian Zhang, Dayan Wu, Zheng Lin, Jingzi Gu, Weiping Wang

    Abstract: Model inversion (MI) attack reconstructs the private training data of a target model given its output, posing a significant threat to deep learning models and data privacy. On one hand, most existing MI methods focus on searching for latent codes to represent the target identity, yet this iterative optimization-based scheme consumes a huge number of queries to the target model, making it unreal…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  31. arXiv:2407.07356  [pdf, other]

    cs.CV

    Video In-context Learning

    Authors: Wentao Zhang, Junliang Guo, Tianyu He, Li Zhao, Linli Xu, Jiang Bian

    Abstract: In-context learning for vision data has been underexplored compared with that in natural language. Previous works studied image in-context learning, urging models to generate a single image guided by demonstrations. In this paper, we propose and study video in-context learning, where the model starts from an existing video clip and generates diverse potential future sequences, each semantically gu…

    Submitted 10 July, 2024; originally announced July 2024.

  32. arXiv:2407.07094  [pdf, other]

    cs.CL cs.AI

    AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning

    Authors: Jiaxi Cui, Wentao Zhang, Jing Tang, Xudong Tong, Zhenwei Zhang, Amie, Jing Wen, Rongsheng Wang, Pengfei Wu

    Abstract: The pervasive deployment of Large Language Models (LLMs) in various sectors often neglects the nuanced requirements of individuals and small organizations, who benefit more from models precisely tailored to their specific business contexts rather than those with broadly superior general capabilities. This work introduces AnyTaskTune, a novel fine-tuning methodology coined as Task-Fi…

    Submitted 9 July, 2024; originally announced July 2024.

  33. arXiv:2407.07053  [pdf, other]

    cs.CV

    Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

    Authors: Wenqi Zhang, Zhenglin Cheng, Yuanyu He, Mengna Wang, Yongliang Shen, Zeqi Tan, Guiyang Hou, Mingqian He, Yanna Ma, Weiming Lu, Yueting Zhuang

    Abstract: Although most current large multimodal models (LMMs) can already understand photos of natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or layouts, and visual reasoning capabilities remains quite rudimentary. They often struggle with simple daily tasks, such as reading time from a clock, understanding a flowchart, or planning a route using a road map. In lig…

    Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: code: https://github.com/zwq2018/Multi-modal-Self-instruct dataset: https://huggingface.co/datasets/zwq2018/Multi-modal-Self-instruct Leaderboard: https://multi-modal-self-instruct.github.io/

  34. arXiv:2407.06190  [pdf, other]

    cs.CV cs.LG cs.RO

    4D Contrastive Superflows are Dense 3D Representation Learners

    Authors: Xiang Xu, Lingdong Kong, Hui Shuai, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Qingshan Liu

    Abstract: In the realm of autonomous driving, accurate 3D perception is the foundation. However, developing such models relies on extensive human annotations -- a process that is both costly and labor-intensive. To address this challenge from a data representation learning perspective, we introduce SuperFlow, a novel framework designed to harness consecutive LiDAR-camera pairs for establishing spatiotempora…

    Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; 36 pages, 11 figures, 11 tables; Code at https://github.com/Xiangxu-0103/SuperFlow

  35. arXiv:2407.06027  [pdf, other]

    cs.CL

    PAS: Data-Efficient Plug-and-Play Prompt Augmentation System

    Authors: Miao Zheng, Hao Liang, Fan Yang, Haoze Sun, Tianpeng Li, Lingchu Xiong, Yan Zhang, Youzhen Wu, Kun Li, Yanjun Shen, Mingan Lin, Tao Zhang, Guosheng Dong, Yujing Qiao, Kun Fang, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

    Abstract: In recent years, the rise of Large Language Models (LLMs) has spurred a growing demand for plug-and-play AI systems. Among the various AI techniques, prompt engineering stands out as particularly significant. However, users often face challenges in writing prompts due to the steep learning curve and significant time investment, and existing automatic prompt engineering (APE) models can be difficul…

    Submitted 18 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  36. arXiv:2407.05981  [pdf, other]

    cs.SE

    Towards Understanding the Bugs in Solidity Compiler

    Authors: Haoyang Ma, Wuqi Zhang, Qingchao Shen, Yongqiang Tian, Junjie Chen, Shing-Chi Cheung

    Abstract: The Solidity compiler plays a key role in enabling the development of smart contract applications on Ethereum by governing the syntax of a domain-specific language called Solidity and performing compilation and optimization of Solidity code. The correctness of the Solidity compiler is critical in fostering transparency, efficiency, and trust in industries reliant on smart contracts. However, like other so…

    Submitted 8 July, 2024; originally announced July 2024.

    Journal ref: ISSTA 2024

  37. arXiv:2407.05621  [pdf, other]

    cs.AR

    EA4RCA: Efficient AIE accelerator design framework for Regular Communication-Avoiding Algorithm

    Authors: W. B. Zhang, Y. Q. Liu, T. H. Zang, Z. S. Bao

    Abstract: With the introduction of the Adaptive Intelligence Engine (AIE), the Versal Adaptive Compute Acceleration Platform (Versal ACAP) has garnered great attention. However, the current focus of Vitis Libraries and limited research has mainly been on how to invoke AIE modules, without delving into a thorough discussion on effectively utilizing AIE in its typical use cases. As a result, the widespread ad…

    Submitted 8 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  38. arXiv:2407.05420  [pdf, ps, other]

    cs.IR

    Towards Bridging the Cross-modal Semantic Gap for Multi-modal Recommendation

    Authors: Xinglong Wu, Anfeng Huang, Hongwei Yang, Hui He, Yu Tai, Weizhe Zhang

    Abstract: Multi-modal recommendation greatly enhances the performance of recommender systems by modeling the auxiliary information from multi-modality contents. Most existing multi-modal recommendation models primarily exploit multimedia information propagation processes to enrich item representations and directly utilize modal-specific embedding vectors independently obtained from upstream pre-trained mode…

    Submitted 7 July, 2024; originally announced July 2024.

  39. arXiv:2407.05128  [pdf, other]

    cs.CV

    SCSA: Exploring the Synergistic Effects Between Spatial and Channel Attention

    Authors: Yunzhong Si, Huiying Xu, Xinzhong Zhu, Wenhao Zhang, Yao Dong, Yuxing Chen, Hongbo Li

    Abstract: Channel and spatial attentions have respectively brought significant improvements in extracting feature dependencies and spatial structure relations for various downstream vision tasks. While their combination is more beneficial for leveraging their individual strengths, the synergy between channel and spatial attentions has not been fully explored, failing to fully harness the synergistic potenti…

    Submitted 6 July, 2024; originally announced July 2024.

  40. arXiv:2407.04960  [pdf, other]

    cs.IR

    MemoCRS: Memory-enhanced Sequential Conversational Recommender Systems with Large Language Models

    Authors: Yunjia Xi, Weiwen Liu, Jianghao Lin, Bo Chen, Ruiming Tang, Weinan Zhang, Yong Yu

    Abstract: Conversational recommender systems (CRSs) aim to capture user preferences and provide personalized recommendations through multi-round natural language dialogues. However, most existing CRS models mainly focus on dialogue comprehension and preferences mining from the current dialogue session, overlooking user preferences in historical dialogue sessions. The preferences embedded in the user's histo…

    Submitted 6 July, 2024; originally announced July 2024.

  41. arXiv:2407.04693  [pdf, other]

    cs.CL cs.AI

    ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

    Authors: Yuzhe Gu, Ziwei Ji, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen

    Abstract: Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across various domains and wide applications. Current hallucination detection and mitigation datasets are limited in domains and sizes, which struggle to scale due to prohibitive labor costs and insufficient reliability of existing hallucination annotators. To facilitate the scalable oversight of LLM hallucin…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 9 pages

  42. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  43. arXiv:2407.04216  [pdf, other]

    cs.RO

    Safe MPC Alignment with Human Directional Feedback

    Authors: Zhixian Xie, Wenlong Zhang, Yi Ren, Zhaoran Wang, George J. Pappas, Wanxin Jin

    Abstract: In safety-critical robot planning or control, manually specifying safety constraints or learning them from demonstrations can be challenging. In this paper, we propose a certifiable alignment method for a robot to learn a safety constraint in its model predictive control (MPC) policy with human online directional feedback. To our knowledge, it is the first method to learn safety constraints from h…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 18 pages, submission to T-RO

  44. arXiv:2407.03636  [pdf, other]

    cs.CV

    Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration

    Authors: Yuhong Zhang, Hengsheng Zhang, Xinning Chai, Zhengxue Cheng, Rong Xie, Li Song, Wenjun Zhang

    Abstract: Image restoration is a classic low-level problem aimed at recovering high-quality images from low-quality images with various degradations such as blur, noise, rain, haze, etc. However, due to the inherent complexity and non-uniqueness of degradation in real-world images, it is challenging for a model trained for single tasks to handle real-world restoration problems effectively. Moreover, existin…

    Submitted 4 July, 2024; originally announced July 2024.

  45. arXiv:2407.03635  [pdf, other]

    cs.CV

    MRIR: Integrating Multimodal Insights for Diffusion-based Realistic Image Restoration

    Authors: Yuhong Zhang, Hengsheng Zhang, Xinning Chai, Rong Xie, Li Song, Wenjun Zhang

    Abstract: Realistic image restoration is a crucial task in computer vision, and the use of diffusion-based models for image restoration has garnered significant attention due to their ability to produce realistic results. However, the quality of the generated images is still a significant challenge due to the severity of image degradation and the uncontrollability of the diffusion model. In this work, we de…

    Submitted 4 July, 2024; originally announced July 2024.

  46. arXiv:2407.03374  [pdf]

    cs.AI cs.SE eess.SP eess.SY

    An Outline of Prognostics and Health Management Large Model: Concepts, Paradigms, and Challenges

    Authors: Laifa Tao, Shangyu Li, Haifei Liu, Qixuan Huang, Liang Ma, Guoao Ning, Yiling Chen, Yunlong Wu, Bin Li, Weiwei Zhang, Zhengduo Zhao, Wenchao Zhan, Wenyan Cao, Chao Wang, Hongmei Liu, Jian Ma, Mingliang Suo, Yujie Cheng, Yu Ding, Dengwei Song, Chen Lu

    Abstract: Prognosis and Health Management (PHM), critical for ensuring task completion by complex systems and preventing unexpected failures, is widely adopted in aerospace, manufacturing, maritime, rail, energy, etc. However, PHM's development is constrained by bottlenecks like generalization, interpretation and verification abilities. Presently, generative artificial intelligence (AI), represented by Larg…

    Submitted 1 July, 2024; originally announced July 2024.

  47. arXiv:2407.03320  [pdf, other]

    cs.CV cs.CL

    InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

    Authors: Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, et al. (2 additional authors not shown)

    Abstract: We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. Th…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Technical Report. https://github.com/InternLM/InternLM-XComposer

  48. arXiv:2407.03104  [pdf, other]

    cs.CV cs.CL cs.MM

    KeyVideoLLM: Towards Large-scale Video Keyframe Selection

    Authors: Hao Liang, Jiapeng Li, Tianyi Bai, Chong Chen, Conghui He, Bin Cui, Wentao Zhang

    Abstract: Recently, with the rise of web videos, managing and understanding large-scale video datasets has become increasingly important. Video Large Language Models (VideoLLMs) have emerged in recent years due to their strong video understanding capabilities. However, training and inference processes for VideoLLMs demand vast amounts of data, presenting significant challenges to data management, particular…

    Submitted 3 July, 2024; originally announced July 2024.

  49. arXiv:2407.02887  [pdf, other]

    cs.CV

    Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion

    Authors: Hang Xu, Chen Long, Wenxiao Zhang, Yuan Liu, Zhen Cao, Zhen Dong, Bisheng Yang

    Abstract: In this paper, we explore a novel framework, EGIInet (Explicitly Guided Information Interaction Network), a model for View-guided Point cloud Completion (ViPC) task, which aims to restore a complete point cloud from a partial one with a single view image. In comparison with previous methods that relied on the global semantics of input images, EGIInet efficiently combines the information from two m…

    Submitted 4 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  50. arXiv:2407.02779  [pdf, other]

    cs.AI cs.LG

    Croppable Knowledge Graph Embedding

    Authors: Yushan Zhu, Wen Zhang, Zhiqiang Liu, Mingyang Chen, Lei Liang, Huajun Chen

    Abstract: Knowledge Graph Embedding (KGE) is a common method for Knowledge Graphs (KGs) to serve various artificial intelligence tasks. The suitable dimensions of the embeddings depend on the storage and computing conditions of the specific application scenarios. Once a new dimension is required, a new KGE model needs to be trained from scratch, which greatly increases the training cost and limits the effic…

    Submitted 2 July, 2024; originally announced July 2024.