Showing 1–50 of 480 results for author: Yu, T

  1. arXiv:2407.12883  [pdf, other]

    cs.CL cs.AI cs.IR

    BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval

    Authors: Hongjin Su, Howard Yen, Mengzhou Xia, Weijia Shi, Niklas Muennighoff, Han-yu Wang, Haisu Liu, Quan Shi, Zachary S. Siegel, Michael Tang, Ruoxi Sun, Jinsung Yoon, Sercan O. Arik, Danqi Chen, Tao Yu

    Abstract: Existing retrieval benchmarks primarily consist of information-seeking queries (e.g., aggregated questions from search engines) where keyword or semantic-based retrieval is usually sufficient. However, many complex real-world queries require in-depth reasoning to identify relevant documents that go beyond surface form matching. For example, finding documentation for a coding question requires unde…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 50 pages

  2. arXiv:2407.10956  [pdf, other]

    cs.AI cs.CL

    Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

    Authors: Ruisheng Cao, Fangyu Lei, Haoyuan Wu, Jixuan Chen, Yeqiao Fu, Hongcheng Gao, Xinzhuang Xiong, Hanchong Zhang, Yuchen Mao, Wenjing Hu, Tianbao Xie, Hongshen Xu, Danyang Zhang, Sida Wang, Ruoxi Sun, Pengcheng Yin, Caiming Xiong, Ansong Ni, Qian Liu, Victor Zhong, Lu Chen, Kai Yu, Tao Yu

    Abstract: Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI operations. This automation can improve the productivit…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 34 pages, 14 figures, 10 tables

  3. arXiv:2407.08138  [pdf, other]

    cs.SE

    How Do Developers Structure Unit Test Cases? An Empirical Study from the "AAA" Perspective

    Authors: Chenhao Wei, Lu Xiao, Tingting Yu, Sunny Wong, Abigail Clune

    Abstract: The AAA pattern, i.e. arrange, act, and assert, provides a unified structure for unit test cases, which benefits comprehension and maintenance. However, there is little understanding regarding whether and how common real-life developers structure unit test cases following AAA in practice. In particular, are there recurring anti-patterns that deviate from the AAA structure and merit refactoring? An…

    Submitted 10 July, 2024; originally announced July 2024.

    ACM Class: D.2.5

  4. arXiv:2407.07858  [pdf, other]

    cs.LG cs.CL

    FACTS About Building Retrieval Augmented Generation-based Chatbots

    Authors: Rama Akkiraju, Anbang Xu, Deepak Bora, Tan Yu, Lu An, Vishal Seth, Aaditya Shukla, Pritam Gundecha, Hridhay Mehta, Ashwin Jha, Prithvi Raj, Abhinav Balasubramanian, Murali Maram, Guru Muthusamy, Shivakesh Reddy Annepally, Sidney Knowles, Min Du, Nick Burnett, Sean Javiya, Ashok Marannan, Mamta Kumari, Surbhi Jha, Ethan Dereszenski, Anupam Chakraborty, Subhash Ranjan, et al. (13 additional authors not shown)

    Abstract: Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like Langchain and Llamaindex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures, 2 tables, Preprint submission to ACM CIKM 2024

  5. arXiv:2407.07291  [pdf, other]

    cs.LG cs.AI stat.ML

    Causal Discovery in Semi-Stationary Time Series

    Authors: Shanyun Gao, Raghavendra Addanki, Tong Yu, Ryan A. Rossi, Murat Kocaoglu

    Abstract: Discovering causal relations from observational time series without making the stationary assumption is a significant challenge. In practice, this challenge is common in many areas, such as retail sales, transportation systems, and medical science. Here, we consider this problem for a class of non-stationary time series. The structural causal model (SCM) of this type of time series, called the sem…

    Submitted 9 July, 2024; originally announced July 2024.

    ACM Class: I.2.6, G.3

  6. arXiv:2407.07290  [pdf, other]

    cs.LG cs.AI stat.ML

    Causal Discovery-Driven Change Point Detection in Time Series

    Authors: Shanyun Gao, Raghavendra Addanki, Tong Yu, Ryan A. Rossi, Murat Kocaoglu

    Abstract: Change point detection in time series seeks to identify times when the probability distribution of time series changes. It is widely applied in many areas, such as human-activity sensing and medical science. In the context of multivariate time series, this typically involves examining the joint distribution of high-dimensional data: If any one variable changes, the whole time series is assumed to…

    Submitted 9 July, 2024; originally announced July 2024.

    ACM Class: I.2.6, G.3

  7. arXiv:2407.05165  [pdf, other]

    cs.SE

    Feedback-Driven Automated Whole Bug Report Reproduction for Android Apps

    Authors: Dingbang Wang, Yu Zhao, Sidong Feng, Zhaoxu Zhang, William G. J. Halfond, Chunyang Chen, Xiaoxia Sun, Jiangfan Shi, Tingting Yu

    Abstract: In software development, bug report reproduction is a challenging task. This paper introduces ReBL, a novel feedback-driven approach that leverages GPT-4, a large-scale language model, to automatically reproduce Android bug reports. Unlike traditional methods, ReBL bypasses the use of Step to Reproduce (S2R) entities. Instead, it leverages the entire textual bug report and employs innovative promp…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted by ISSTA 2024

  8. arXiv:2407.02803  [pdf, other]

    cs.DB

    KnobCF: Uncertainty-aware Knob Tuning

    Authors: Yu Yan, Junfang Huang, Hongzhi Wang, Jian Geng, Kaixin Zhang, Tao Yu

    Abstract: The knob tuning aims to optimize database performance by searching for the most effective knob configuration under a certain workload. Existing works suffer two significant problems. On the one hand, there exist multiple similar even useless evaluations of knob tuning even with the diverse searching methods because of the different sensitivities of knobs on a certain workload. On the other hand, t…

    Submitted 3 July, 2024; originally announced July 2024.

  9. arXiv:2407.02750  [pdf, other]

    cs.CL

    Learning to Reduce: Towards Improving Performance of Large Language Models on Structured Data

    Authors: Younghun Lee, Sungchul Kim, Ryan A. Rossi, Tong Yu, Xiang Chen

    Abstract: Large Language Models (LLMs) have been achieving competent performance on a wide range of downstream tasks, yet existing work shows that inference on structured data is challenging for LLMs. This is because LLMs need to either understand long structured data or select the most relevant evidence before inference, and both approaches are not trivial. This paper proposes a framework, Learning to Redu…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: ICML 2024 Workshop on Long-Context Foundation Models, Vienna, Austria 2024. arXiv admin note: substantial text overlap with arXiv:2402.14195

  10. arXiv:2406.12044  [pdf, other]

    cs.CV

    ARTIST: Improving the Generation of Text-rich Images by Disentanglement

    Authors: Jianyi Zhang, Yufan Zhou, Jiuxiang Gu, Curtis Wigington, Tong Yu, Yiran Chen, Tong Sun, Ruiyi Zhang

    Abstract: Diffusion models have demonstrated exceptional capabilities in generating a broad spectrum of visual content, yet their proficiency in rendering text is still limited: they often generate inaccurate characters or words that fail to blend well with the underlying image. To address these shortcomings, we introduce a new framework named ARTIST. This framework incorporates a dedicated textual diffusio…

    Submitted 17 June, 2024; originally announced June 2024.

  11. arXiv:2406.11645  [pdf, other]

    cs.HC cs.CV

    SeamPose: Repurposing Seams as Capacitive Sensors in a Shirt for Upper-Body Pose Tracking

    Authors: Tianhong Catherine Yu, Manru Zhang, Peter He, Chi-Jung Lee, Cassidy Cheesman, Saif Mahmud, Ruidong Zhang, François Guimbretière, Cheng Zhang

    Abstract: Seams are areas of overlapping fabric formed by stitching two or more pieces of fabric together in the cut-and-sew apparel manufacturing process. In SeamPose, we repurposed seams as capacitive sensors in a shirt for continuous upper-body pose estimation. Compared to previous all-textile motion-capturing garments that place the electrodes on the surface of clothing, our solution leverages existing…

    Submitted 17 June, 2024; originally announced June 2024.

  12. arXiv:2406.02645  [pdf, ps, other]

    physics.comp-ph cs.AI cs.LG math.NA

    Astral: training physics-informed neural networks with error majorants

    Authors: Vladimir Fanaskov, Tianchi Yu, Alexander Rudikov, Ivan Oseledets

    Abstract: The primal approach to physics-informed learning is a residual minimization. We argue that residual is, at best, an indirect measure of the error of approximate solution and propose to train with error majorant instead. Since error majorant provides a direct upper bound on error, one can reliably estimate how close PiNN is to the exact solution and stop the optimization process when the desired ac…

    Submitted 4 June, 2024; originally announced June 2024.

  13. arXiv:2406.01334  [pdf, other]

    cs.CV

    HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models

    Authors: Mengcheng Li, Hongwen Zhang, Yuxiang Zhang, Ruizhi Shao, Tao Yu, Yebin Liu

    Abstract: Recent years have witnessed a trend of the deep integration of the generation and reconstruction paradigms. In this paper, we extend the ability of controllable generative models for a more comprehensive hand mesh recovery task: direct hand mesh generation, inpainting, reconstruction, and fitting in a single framework, which we name as Holistic Hand Mesh Recovery (HHMR). Our key observation is tha…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: accepted in CVPR2024, project page: https://dw1010.github.io/project/HHMR/HHMR.html

  14. arXiv:2405.19600  [pdf, ps, other]

    cs.LG cs.AI

    Do spectral cues matter in contrast-based graph self-supervised learning?

    Authors: Xiangru Jian, Xinjian Zhao, Wei Pang, Chaolong Ying, Yimu Wang, Yaoyao Xu, Tianshu Yu

    Abstract: The recent surge in contrast-based graph self-supervised learning has prominently featured an intensified exploration of spectral cues. However, an intriguing paradox emerges, as methods grounded in seemingly conflicting assumptions or heuristic approaches regarding the spectral domain demonstrate notable enhancements in learning performance. This paradox prompts a critical inquiry into the genuin…

    Submitted 29 May, 2024; originally announced May 2024.

  15. arXiv:2405.18711  [pdf, other]

    cs.AI cs.CL

    Calibrating Reasoning in Language Models with Internal Consistency

    Authors: Zhihui Xie, Jizhou Guo, Tong Yu, Shuai Li

    Abstract: Large language models (LLMs) have demonstrated impressive capabilities in various reasoning tasks, aided by techniques like chain-of-thought (CoT) prompting that elicits verbalized reasoning. However, LLMs often generate text with obvious mistakes and contradictions, raising doubts about their ability to robustly process and utilize generated rationales. In this work, we investigate CoT reasoning…

    Submitted 28 May, 2024; originally announced May 2024.

  16. arXiv:2405.17976  [pdf]

    cs.AI cs.CL

    Yuan 2.0-M32: Mixture of Experts with Attention Router

    Authors: Shaohua Wu, Jiangang Luo, Xi Chen, Lingjun Li, Xudong Zhao, Tong Yu, Chao Wang, Yue Wang, Fei Wang, Weixu Qiao, Houbo He, Zeru Zhang, Zeyu Sun, Junxiong Mao, Chong Shen

    Abstract: Yuan 2.0-M32, with a similar base architecture as Yuan-2.0 2B, uses a mixture-of-experts architecture with 32 experts of which 2 experts are active. A new router network, Attention Router, is proposed and adopted for a more efficient selection of experts, which improves the accuracy compared to the model with classical router network. Yuan 2.0-M32 is trained with 2000B tokens from scratch, and the…

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 14 pages, 3 figures, 7 tables

  17. arXiv:2405.17902  [pdf, other]

    cs.AI cs.CL cs.LG

    Boosting Protein Language Models with Negative Sample Mining

    Authors: Yaoyao Xu, Xinjian Zhao, Xiaozhuang Song, Benyou Wang, Tianshu Yu

    Abstract: We introduce a pioneering methodology for boosting large language models in the domain of protein representation learning. Our primary contribution lies in the refinement process for correlating the over-reliance on co-evolution knowledge, in a way that networks are trained to distill invaluable insights from negative samples, constituted by protein pairs sourced from disparate categories. By capi…

    Submitted 29 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 16 pages, 4 figures. Accepted by ECML-PKDD 2024

  18. arXiv:2405.17220  [pdf, other]

    cs.CL

    RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

    Authors: Tianyu Yu, Haoye Zhang, Yuan Yao, Yunkai Dang, Da Chen, Xiaoman Lu, Ganqu Cui, Taiwen He, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

    Abstract: Learning from feedback reduces the hallucination of multimodal large language models (MLLMs) by aligning them with human preferences. While traditional methods rely on labor-intensive and time-consuming manual labeling, recent approaches employing models as automatic labelers have shown promising results without human intervention. However, these methods heavily rely on costly proprietary models l…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project Website: https://github.com/RLHF-V/RLAIF-V

  19. arXiv:2405.12664  [pdf, other]

    cs.NI

    IREE Oriented Green 6G Networks: A Radial Basis Function Based Approach

    Authors: Tao Yu, Pengbo Huang, Shunqing Zhang, Xiaojing Chen, Yanzan Sun, Xin Wang

    Abstract: In order to provide design guidelines for energy efficient 6G networks, we propose a novel radial basis function (RBF) based optimization framework to maximize the integrated relative energy efficiency (IREE) metric. Different from the conventional energy efficient optimization schemes, we maximize the transformed utility for any given IREE using spectrum efficiency oriented RBF network and gradua…

    Submitted 21 May, 2024; originally announced May 2024.

  20. arXiv:2405.08538  [pdf, other]

    cs.LG

    Self-Distillation Improves DNA Sequence Inference

    Authors: Tong Yu, Lei Cheng, Ruslan Khalitov, Erland Brandser Olsson, Zhirong Yang

    Abstract: Self-supervised pretraining (SSP) has been recognized as a method to enhance prediction accuracy in various downstream tasks. However, its efficacy for DNA sequences remains somewhat constrained. This limitation stems primarily from the fact that most existing SSP approaches in genomics focus on masked language modeling of individual sequences, neglecting the crucial aspect of encoding statistics…

    Submitted 14 May, 2024; originally announced May 2024.

  21. Audio Matters Too! Enhancing Markerless Motion Capture with Audio Signals for String Performance Capture

    Authors: Yitong Jin, Zhiping Qiu, Yi Shi, Shuangpeng Sun, Chongwu Wang, Donghao Pan, Jiachen Zhao, Zhenghao Liang, Yuan Wang, Xiaobing Li, Feng Yu, Tao Yu, Qionghai Dai

    Abstract: In this paper, we touch on the problem of markerless multi-modal human motion capture especially for string performance capture which involves inherently subtle hand-string contacts and intricate movements. To fulfill this goal, we first collect a dataset, named String Performance Dataset (SPD), featuring cello and violin performances. The dataset includes videos captured from up to 23 different v…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH2024

  22. arXiv:2405.04760  [pdf, other]

    cs.CR cs.AI

    Large Language Models for Cyber Security: A Systematic Literature Review

    Authors: HanXiang Xu, ShenAo Wang, NingKe Li, KaiLong Wang, YanJie Zhao, Kai Chen, Ting Yu, Yang Liu, HaoYu Wang

    Abstract: The rapid advancement of Large Language Models (LLMs) has opened up new opportunities for leveraging artificial intelligence in various domains, including cybersecurity. As the volume and sophistication of cyber threats continue to grow, there is an increasing need for intelligent systems that can automatically detect vulnerabilities, analyze malware, and respond to attacks. In this survey, we con…

    Submitted 9 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 46 pages, 6 figures

  23. arXiv:2405.03637  [pdf, other]

    cs.LG

    Collage: Light-Weight Low-Precision Strategy for LLM Training

    Authors: Tao Yu, Gaurav Gupta, Karthick Gopalswamy, Amith Mamidala, Hao Zhou, Jeffrey Huynh, Youngsuk Park, Ron Diamant, Anoop Deoras, Luke Huan

    Abstract: Large models training is plagued by the intense compute cost and limited hardware memory. A practical solution is low-precision representation but is troubled by loss in numerical accuracy and unstable training rendering the model less useful. We argue that low-precision floating points can perform well provided the error is properly compensated at the critical locations in the training process. W…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  24. arXiv:2404.15676  [pdf, other]

    cs.CL cs.AI

    Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs

    Authors: Yu Xia, Rui Wang, Xu Liu, Mingyan Li, Tong Yu, Xiang Chen, Julian McAuley, Shuai Li

    Abstract: Chain-of-Thought (CoT) has been a widely adopted prompting method, eliciting impressive reasoning abilities of Large Language Models (LLMs). Inspired by the sequential thought structure of CoT, a number of Chain-of-X (CoX) methods have been developed to address various challenges across diverse domains and tasks involving LLMs. In this paper, we provide a comprehensive survey of Chain-of-X methods…

    Submitted 24 April, 2024; originally announced April 2024.

  25. arXiv:2404.12980  [pdf, other]

    cs.HC

    Ring-a-Pose: A Ring for Continuous Hand Pose Tracking

    Authors: Tianhong Catherine Yu, Guilin Hu, Ruidong Zhang, Hyunchul Lim, Saif Mahmud, Chi-Jung Lee, Ke Li, Devansh Agarwal, Shuyang Nie, Jinseok Oh, François Guimbretière, Cheng Zhang

    Abstract: We present Ring-a-Pose, a single untethered ring that tracks continuous 3D hand poses. Located in the center of the hand, the ring emits an inaudible acoustic signal that each hand pose reflects differently. Ring-a-Pose imposes minimal obtrusions on the hand, unlike multi-ring or glove systems. It is not affected by the choice of clothing that may cover wrist-worn systems. In a series of three use…

    Submitted 19 April, 2024; originally announced April 2024.

  26. arXiv:2404.07972  [pdf, other]

    cs.AI cs.CL

    OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

    Authors: Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu

    Abstract: Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and productivity. However, existing benchmarks either lack an interactive environment or are limited to environments specific to certain applications or domains, failing to reflect the diverse and complex nature…

    Submitted 30 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: 51 pages, 21 figures

  27. arXiv:2404.04306  [pdf, other]

    cs.CR cs.AI cs.CL cs.CY

    AuditGPT: Auditing Smart Contracts with ChatGPT

    Authors: Shihao Xia, Shuai Shao, Mengting He, Tingting Yu, Linhai Song, Yiying Zhang

    Abstract: To govern smart contracts running on Ethereum, multiple Ethereum Request for Comment (ERC) standards have been developed, each containing a set of rules to guide the behaviors of smart contracts. Violating the ERC rules could cause serious security issues and financial loss, signifying the importance of verifying smart contracts follow ERCs. Today's practices of such verification are to either man…

    Submitted 5 April, 2024; originally announced April 2024.

  28. arXiv:2404.03514  [pdf, other]

    cs.CL cs.AI

    Learn When (not) to Trust Language Models: A Privacy-Centric Adaptive Model-Aware Approach

    Authors: Chengkai Huang, Rui Wang, Kaige Xie, Tong Yu, Lina Yao

    Abstract: Retrieval-augmented large language models (LLMs) have been remarkably competent in various NLP tasks. Despite their great success, the knowledge provided by the retrieval process is not always useful for improving the model prediction, since in some samples LLMs may already be quite knowledgeable and thus be able to answer the question correctly without retrieval. Aiming to save the cost of retrie…

    Submitted 4 April, 2024; originally announced April 2024.

  29. arXiv:2404.01588  [pdf, other]

    cs.CL cs.AI cs.LG

    Hallucination Diversity-Aware Active Learning for Text Summarization

    Authors: Yu Xia, Xu Liu, Tong Yu, Sungchul Kim, Ryan A. Rossi, Anup Rao, Tung Mai, Shuai Li

    Abstract: Large Language Models (LLMs) have shown propensity to generate hallucinated outputs, i.e., texts that are factually incorrect or unsupported. Existing methods for alleviating hallucinations typically require costly human annotations to identify and correct hallucinations in LLM outputs. Moreover, most of these methods focus on a specific type of hallucination, e.g., entity or token errors, which l…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024

  30. arXiv:2403.19930  [pdf, other]

    cs.CL

    Are LLMs Effective Backbones for Fine-tuning? An Experimental Investigation of Supervised LLMs on Chinese Short Text Matching

    Authors: Shulin Liu, Chengcheng Xu, Hao Liu, Tinghao Yu, Tao Yang

    Abstract: The recent success of Large Language Models (LLMs) has garnered significant attention in both academia and industry. Prior research on LLMs has primarily focused on enhancing or leveraging their generalization capabilities in zero- and few-shot settings. However, there has been limited investigation into effectively fine-tuning LLMs for a specific natural language understanding task in supervised…

    Submitted 28 March, 2024; originally announced March 2024.

  31. arXiv:2403.18864  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    Interpretable Machine Learning for Weather and Climate Prediction: A Survey

    Authors: Ruyi Yang, Jingyu Hu, Zihao Li, Jianli Mu, Tingzhao Yu, Jiangjiang Xia, Xuhong Li, Aritra Dasgupta, Haoyi Xiong

    Abstract: Advanced machine learning models have recently achieved high predictive accuracy for weather and climate prediction. However, these complex models often lack inherent transparency and interpretability, acting as "black boxes" that impede user trust and hinder further model improvements. As such, interpretable machine learning techniques have become crucial in enhancing the credibility and utility…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 26 pages, 5 figures

  32. SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings

    Authors: Ting-Yao Hsu, Chieh-Yang Huang, Shih-Hong Huang, Ryan Rossi, Sungchul Kim, Tong Yu, C. Lee Giles, Ting-Hao K. Huang

    Abstract: Crafting effective captions for figures is important. Readers heavily depend on these captions to grasp the figure's message. However, despite a well-developed set of AI technologies for figures and captions, these have rarely been tested for usefulness in aiding caption writing. This paper introduces SciCapenter, an interactive system that puts together cutting-edge AI technologies for scientific…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: CHI EA '24: Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems

  33. arXiv:2403.17646  [pdf, other]

    cs.LG

    Uncertainty-aware Distributional Offline Reinforcement Learning

    Authors: Xiaocong Chen, Siyu Wang, Tong Yu, Lina Yao

    Abstract: Offline reinforcement learning (RL) presents distinct challenges as it relies solely on observational data. A central concern in this context is ensuring the safety of the learned policy by quantifying uncertainties associated with various actions and environmental stochasticity. Traditional approaches primarily emphasize mitigating epistemic uncertainty by learning risk-averse policies, often ove…

    Submitted 26 March, 2024; originally announced March 2024.

  34. arXiv:2403.17610  [pdf, other]

    cs.CV

    MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors

    Authors: He Zhang, Shenghao Ren, Haolei Yuan, Jianhui Zhao, Fan Li, Shuangpeng Sun, Zhenghao Liang, Tao Yu, Qiu Shen, Xun Cao

    Abstract: Foot contact is an important cue for human motion capture, understanding, and generation. Existing datasets tend to annotate dense foot contact using visual matching with thresholding or incorporating pressure signals. However, these approaches either suffer from low accuracy or are only designed for small-range and slow motion. There is still a lack of a vision-pressure multimodal dataset with la…

    Submitted 29 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: CVPR2024

  35. arXiv:2403.11373  [pdf, other]

    cs.CV

    Reconstruct before Query: Continual Missing Modality Learning with Decomposed Prompt Collaboration

    Authors: Shu Zhao, Xiaohan Zou, Tan Yu, Huijuan Xu

    Abstract: Pre-trained large multi-modal models (LMMs) exploit fine-tuning to adapt diverse user applications. Nevertheless, fine-tuning may face challenges due to deactivated sensors (e.g., cameras turned off for privacy or technical issues), yielding modality-incomplete data and leading to inconsistency in training data and the data for inference. Additionally, continuous training leads to catastrophic for…

    Submitted 17 March, 2024; originally announced March 2024.

  36. arXiv:2403.09973  [pdf, other]

    cs.CV

    Den-SOFT: Dense Space-Oriented Light Field DataseT for 6-DOF Immersive Experience

    Authors: Xiaohang Yu, Zhengxian Yang, Shi Pan, Yuqi Han, Haoxiang Wang, Jun Zhang, Shi Yan, Borong Lin, Lei Yang, Tao Yu, Lu Fang

    Abstract: We have built a custom mobile multi-camera large-space dense light field capture system, which provides a series of high-quality and sufficiently dense light field images for various scenarios. Our aim is to contribute to the development of popular 3D scene reconstruction algorithms such as IBRnet, NeRF, and 3D Gaussian splatting. More importantly, the collected dataset, which is much denser than…

    Submitted 14 March, 2024; originally announced March 2024.

  37. arXiv:2403.09606  [pdf, ps, other]

    cs.CL cs.AI

    Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey

    Authors: Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zhou, Fuxiao Liu, Tianrui Guan, Haoliang Wang, Tong Yu, Julian McAuley, Wei Ai, Furong Huang

    Abstract: Causal inference has shown potential in enhancing the predictive accuracy, fairness, robustness, and explainability of Natural Language Processing (NLP) models by capturing causal relationships among variables. The emergence of generative Large Language Models (LLMs) has significantly impacted various NLP domains, particularly through their advanced reasoning capabilities. This survey focuses on e…

    Submitted 14 March, 2024; originally announced March 2024.

  38. arXiv:2403.07556  [pdf, other]

    cs.CL

    Truth-Aware Context Selection: Mitigating Hallucinations of Large Language Models Being Misled by Untruthful Contexts

    Authors: Tian Yu, Shaolei Zhang, Yang Feng

    Abstract: Although Large Language Models (LLMs) have demonstrated impressive text generation capabilities, they are easily misled by untruthful contexts provided by users or knowledge augmentation tools, leading to hallucinations. To alleviate LLMs from being misled by untruthful context and take advantage of knowledge augmentation, we propose Truth-Aware Context Selection (TACS), a lightweight method to ad…

    Submitted 10 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted to ACL 2024 Findings. Code is available at: https://github.com/ictnlp/TACS

  39. A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes

    Authors: Ting Yu, Xiaojun Lin, Shuhui Wang, Weiguo Sheng, Qingming Huang, Jun Yu

    Abstract: Three-Dimensional (3D) dense captioning is an emerging vision-language bridging task that aims to generate multiple detailed and accurate descriptions for 3D scenes. It presents significant potential and challenges due to its closer representation of the real world compared to 2D visual captioning, as well as complexities in data collection and processing of 3D point cloud sources. Despite the pop…

    Submitted 12 March, 2024; originally announced March 2024.

  40. arXiv:2403.07350  [pdf, other]

    cs.CL cs.AI cs.CV

    VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark

    Authors: Han Huang, Haitian Zhong, Tao Yu, Qiang Liu, Shu Wu, Liang Wang, Tieniu Tan

    Abstract: Recently, knowledge editing on large language models (LLMs) has received considerable attention. Compared to this, editing Large Vision-Language Models (LVLMs) faces extra challenges from diverse data modalities and complicated model components, and data for LVLMs editing are limited. The existing LVLM editing benchmark, which comprises three metrics (Reliability, Locality, and Generality), falls…

    Submitted 13 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: 9+11 pages (main+appendix), 7 figures, 13 tables. Code and data: https://github.com/VLKEB/VLKEB

  41. arXiv:2403.07213  [pdf, other]

    cs.LG stat.ML

    Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits

    Authors: Yu Xia, Fang Kong, Tong Yu, Liya Guo, Ryan A. Rossi, Sungchul Kim, Shuai Li

    Abstract: Web-based applications such as chatbots, search engines and news recommendations continue to grow in scale and complexity with the recent surge in the adoption of LLMs. Online model selection has thus garnered increasing attention due to the need to choose the best model among a diverse set while balancing task reward and exploration cost. Organizations face decisions like whether to employ a cos…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by WWW'24 (Oral)

  42. arXiv:2403.06447  [pdf, other]

    cs.IR cs.AI

    CoRAL: Collaborative Retrieval-Augmented Large Language Models Improve Long-tail Recommendation

    Authors: Junda Wu, Cheng-Chun Chang, Tong Yu, Zhankui He, Jianing Wang, Yupeng Hou, Julian McAuley

    Abstract: The long-tail recommendation is a challenging task for traditional recommender systems, due to data sparsity and data imbalance issues. The recent development of large language models (LLMs) has shown their abilities in complex reasoning, which can help to deduce users' preferences based on very few previous interactions. However, since most LLM-based systems rely on items' semantic meaning as the…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 11 pages

  43. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  44. arXiv:2403.04652  [pdf, other]

    cs.CL cs.AI

    Yi: Open Foundation Models by 01.AI

    Authors: 01.AI: Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, et al. (7 additional authors not shown)

    Abstract: We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU,…

    Submitted 7 March, 2024; originally announced March 2024.

  45. arXiv:2403.04247  [pdf, other]

    cs.CL

    UltraWiki: Ultra-fine-grained Entity Set Expansion with Negative Seed Entities

    Authors: Yangning Li, Qingsong Lv, Tianyu Yu, Yinghui Li, Shulin Huang, Tingwei Lu, Xuming Hu, Wenhao Jiang, Hai-Tao Zheng, Hui Wang

    Abstract: Entity Set Expansion (ESE) aims to identify new entities belonging to the same semantic class as a given set of seed entities. Traditional methods primarily relied on positive seed entities to represent a target semantic class, which poses a challenge for the representation of ultra-fine-grained semantic classes. Ultra-fine-grained semantic classes are defined based on fine-grained semantic classes…

    Submitted 23 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Initial Version

  46. arXiv:2403.02709  [pdf, other]

    cs.RO

    RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches

    Authors: Priya Sundaresan, Quan Vuong, Jiayuan Gu, Peng Xu, Ted Xiao, Sean Kirmani, Tianhe Yu, Michael Stark, Ajinkya Jain, Karol Hausman, Dorsa Sadigh, Jeannette Bohg, Stefan Schaal

    Abstract: Natural language and images are commonly used as goal representations in goal-conditioned imitation learning (IL). However, natural language can be ambiguous and images can be over-specified. In this work, we propose hand-drawn sketches as a modality for goal specification in visual imitation learning. Sketches are easy for users to provide on the fly like language, but similar to images they can…

    Submitted 5 March, 2024; originally announced March 2024.

  47. arXiv:2403.02661  [pdf, other]

    cs.SE

    How to Save My Gas Fees: Understanding and Detecting Real-world Gas Issues in Solidity Programs

    Authors: Mengting He, Shihao Xia, Boqin Qin, Nobuko Yoshida, Tingting Yu, Linhai Song, Yiying Zhang

    Abstract: The execution of smart contracts on Ethereum, a public blockchain system, incurs a fee called gas fee for its computation and data-store consumption. When programmers develop smart contracts (e.g., in the Solidity programming language), they could unknowingly write code snippets that unnecessarily cause more gas fees. These issues, or what we call gas wastes, could lead to significant monetary was…

    Submitted 5 March, 2024; originally announced March 2024.

  48. arXiv:2403.01430  [pdf, other]

    cs.LG

    On Diffusion Process in SE(3)-invariant Space

    Authors: Zihan Zhou, Ruiying Liu, Jiachen Zheng, Xiaoxue Wang, Tianshu Yu

    Abstract: Sampling viable 3D structures (e.g., molecules and point clouds) with SE(3)-invariance using diffusion-based models proved promising in a variety of real-world applications, wherein SE(3)-invariant properties can be naturally characterized by the inter-point distance manifold. However, due to the non-trivial geometry, we still lack a comprehensive understanding of the diffusion mechanism within su…

    Submitted 3 March, 2024; originally announced March 2024.

  49. arXiv:2402.17811  [pdf, other]

    cs.CL cs.AI cs.LG

    TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space

    Authors: Shaolei Zhang, Tian Yu, Yang Feng

    Abstract: Large Language Models (LLMs) sometimes produce hallucinations; in particular, they may generate untruthful responses despite having the correct knowledge. Activating the truthfulness within an LLM is the key to fully unlocking its knowledge potential. In this paper, we propose TruthX, an inference-time intervention method to activate the truthfulness of LLM by identifying and editing the…

    Submitted 5 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL 2024 main conference, Project Page: https://ictnlp.github.io/TruthX-site/

  50. arXiv:2402.16402  [pdf, other]

    cs.LG cs.AI

    Graph Learning with Distributional Edge Layouts

    Authors: Xinjian Zhao, Chaolong Ying, Tianshu Yu

    Abstract: Graph Neural Networks (GNNs) learn from graph-structured data by passing local messages between neighboring nodes along edges on certain topological layouts. Typically, these topological layouts in modern GNNs are deterministically computed (e.g., attention-based GNNs) or locally sampled (e.g., GraphSage) under heuristic assumptions. In this paper, we for the first time pose that these layouts can…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 20 pages, 10 figures