Showing 1–50 of 1,955 results for author: Lee, S

  1. arXiv:2407.13437  [pdf, other]

    cs.CV

    FREST: Feature RESToration for Semantic Segmentation under Multiple Adverse Conditions

    Authors: Sohyun Lee, Namyup Kim, Sungyeon Kim, Suha Kwak

    Abstract: Robust semantic segmentation under adverse conditions is crucial in real-world applications. To address this challenging task in practical scenarios where labeled normal condition images are not accessible in training, we propose FREST, a novel feature restoration framework for source-free domain adaptation (SFDA) of semantic segmentation to adverse conditions. FREST alternates two steps: (1) lear…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  2. arXiv:2407.13078  [pdf, other]

    cs.CV cs.AI

    Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism

    Authors: Sangyoun Lee, Juho Jung, Changdae Oh, Sunghee Yun

    Abstract: Temporal Action Localization (TAL) is a critical task in video analysis, identifying precise start and end times of actions. Existing methods like CNNs, RNNs, GCNs, and Transformers have limitations in capturing long-range dependencies and temporal causality. To address these challenges, we propose a novel TAL architecture leveraging the Selective State Space Model (S6). Our approach integrates th…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 8 pages, 3 figures, Preprint

  3. arXiv:2407.13052  [pdf, other]

    cs.CY cs.DS cs.LG

    Matchings, Predictions and Counterfactual Harm in Refugee Resettlement Processes

    Authors: Seungeon Lee, Nina Corvelo Benz, Suhas Thejaswi, Manuel Gomez-Rodriguez

    Abstract: Resettlement agencies have started to adopt data-driven algorithmic matching to match refugees to locations using employment rate as a measure of utility. Given a pool of refugees, data-driven algorithmic matching utilizes a classifier to predict the probability that each refugee would find employment at any given location. Then, it uses the predicted probabilities to estimate the expected utility…

    Submitted 24 May, 2024; originally announced July 2024.

    Comments: 24 pages including reference and appendix

  4. arXiv:2407.12614  [pdf]

    cs.CV

    Strawberry detection and counting based on YOLOv7 pruning and information based tracking algorithm

    Authors: Shiyu Liu, Congliang Zhou, Won Suk Lee

    Abstract: The strawberry industry yields significant economic benefits for Florida, yet the process of monitoring strawberry growth and yield is labor-intensive and costly. The development of machine learning-based detection and tracking methodologies has been used for helping automated monitoring and prediction of strawberry yield, still, enhancement has been limited as previous studies only applied the de…

    Submitted 17 July, 2024; originally announced July 2024.

  5. arXiv:2407.12463  [pdf, other]

    cs.CV

    Progressive Proxy Anchor Propagation for Unsupervised Semantic Segmentation

    Authors: Hyun Seok Seong, WonJun Moon, SuBeen Lee, Jae-Pil Heo

    Abstract: The labor-intensive labeling for semantic segmentation has spurred the emergence of Unsupervised Semantic Segmentation. Recent studies utilize patch-wise contrastive learning based on features from image-level self-supervised pretrained models. However, relying solely on similarity-based supervision from image-level pretrained models often leads to unreliable guidance due to insufficient patch-lev…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  6. arXiv:2407.12405  [pdf, other]

    eess.IV cs.CV cs.RO

    Fisheye-Calib-Adapter: An Easy Tool for Fisheye Camera Model Conversion

    Authors: Sangjun Lee

    Abstract: The increasing necessity for fisheye cameras in fields such as robotics and autonomous driving has led to the proposal of various fisheye camera models. While the evolution of camera models has facilitated the development of diverse systems in the field, the lack of adaptation between different fisheye camera models means that recalibration is always necessary, which is cumbersome. This paper intr…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  7. arXiv:2407.12401  [pdf, other]

    cs.LG cs.CV

    Geometric Remove-and-Retrain (GOAR): Coordinate-Invariant eXplainable AI Assessment

    Authors: Yong-Hyun Park, Junghoon Seo, Bomseok Park, Seongsu Lee, Junghyo Jo

    Abstract: Identifying the relevant input features that have a critical influence on the output results is indispensable for the development of explainable artificial intelligence (XAI). Remove-and-Retrain (ROAR) is a widely accepted approach for assessing the importance of individual pixels by measuring changes in accuracy following their removal and subsequent retraining of the modified dataset. However, w…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted in XAI in Action Workshop @ NeurIPS2023

  8. arXiv:2407.12192  [pdf, other]

    cs.HC

    Towards Dataset-scale and Feature-oriented Evaluation of Text Summarization in Large Language Model Prompts

    Authors: Sam Yu-Te Lee, Aryaman Bahukhandi, Dongyu Liu, Kwan-Liu Ma

    Abstract: Recent advancements in Large Language Models (LLMs) and Prompt Engineering have made chatbot customization more accessible, significantly reducing barriers to tasks that previously required programming skills. However, prompt evaluation, especially at the dataset scale, remains complex due to the need to assess prompts across thousands of test instances within a dataset. Our study, based on a comp…

    Submitted 18 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  9. arXiv:2407.12055  [pdf, other]

    cs.CV

    Integrating Query-aware Segmentation and Cross-Attention for Robust VQA

    Authors: Wonjun Choi, Sangbeom Lee, Seungyeon Lee, Heechul Jung, Dong-Gyu Lee

    Abstract: This paper introduces a method for VizWiz-VQA using LVLM with trainable cross-attention and LoRA finetuning. We train the model with the following conditions: 1) Training with original images. 2) Training with enhanced images using CLIPSeg to highlight or contrast the original image. 3) Training with integrating the output features of Vision Transformer (ViT) and CLIPSeg features of the original i…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: CVPR Workshop accepted, Vizwiz Grand Challenge(VQA) 3rd Prize, https://vizwiz.cs.colorado.edu/VizWiz_workshop/abstracts/choi_2024_vizwiz_CVPR.pdf

  10. arXiv:2407.11859  [pdf, other]

    cs.CV

    Mitigating Background Shift in Class-Incremental Semantic Segmentation

    Authors: Gilhan Park, WonJun Moon, SuBeen Lee, Tae-Young Kim, Jae-Pil Heo

    Abstract: Class-Incremental Semantic Segmentation (CISS) aims to learn new classes without forgetting the old ones, using only the labels of the new classes. To achieve this, two popular strategies are employed: 1) pseudo-labeling and knowledge distillation to preserve prior knowledge; and 2) background weight transfer, which leverages the broad coverage of background in learning new classes by transferring…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. Code is available at http://github.com/RoadoneP/ECCV2024_MBS

  11. arXiv:2407.11714  [pdf, other]

    cs.CV

    Improving Unsupervised Video Object Segmentation via Fake Flow Generation

    Authors: Suhwan Cho, Minhyeok Lee, Jungho Lee, Donghyeong Kim, Seunghoon Lee, Sungmin Woo, Sangyoun Lee

    Abstract: Unsupervised video object segmentation (VOS), also known as video salient object detection, aims to detect the most prominent object in a video at the pixel level. Recently, two-stream approaches that leverage both RGB images and optical flow maps have gained significant attention. However, the limited amount of training data remains a substantial challenge. In this study, we propose a novel data…

    Submitted 16 July, 2024; originally announced July 2024.

  12. arXiv:2407.11394  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation

    Authors: Jiwook Kim, Seonho Lee, Jaeyo Shin, Jiho Choi, Hyunjung Shim

    Abstract: Score distillation sampling (SDS) has emerged as an effective framework in text-driven 3D editing tasks due to its inherent 3D consistency. However, existing SDS-based 3D editing methods suffer from extensive training time and lead to low-quality results, primarily because these methods deviate from the sampling dynamics of diffusion models. In this paper, we propose DreamCatalyst, a novel framewo…

    Submitted 16 July, 2024; originally announced July 2024.

  13. arXiv:2407.10385  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting

    Authors: Hyungjun Yoon, Biniyam Aschalew Tolera, Taesik Gong, Kimin Lee, Sung-Ju Lee

    Abstract: Large language models (LLMs) have demonstrated exceptional abilities across various domains. However, utilizing LLMs for ubiquitous sensing applications remains challenging as existing text-prompt methods show significant performance degradation when handling long sensor data sequences. We propose a visual prompting approach for sensor data using multimodal LLMs (MLLMs). We design a visual prompt…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 21 pages, 16 figures

  14. arXiv:2407.09962  [pdf, other]

    cs.IR

    Correlating Power Outage Spread with Infrastructure Interdependencies During Hurricanes

    Authors: Avishek Bose, Sangkeun Lee, Narayan Bhusal, Supriya Chinthavali

    Abstract: Power outages caused by extreme weather events, such as hurricanes, can significantly disrupt essential services and delay recovery efforts, underscoring the importance of enhancing our infrastructure's resilience. This study investigates the spread of power outages during hurricanes by analyzing the correlation between the network of critical infrastructure and outage propagation. We leveraged da…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: IEEE 25th International Conference on Information Reuse and Integration for Data Science (IEEE IRI-2024)

  15. arXiv:2407.09303  [pdf, other]

    cs.CV

    ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion

    Authors: Sungmin Woo, Wonjoon Lee, Woo Jin Kim, Dogyoon Lee, Sangyoun Lee

    Abstract: Self-supervised multi-frame monocular depth estimation relies on the geometric consistency between successive frames under the assumption of a static scene. However, the presence of moving objects in dynamic scenes introduces inevitable inconsistencies, causing misaligned multi-frame feature matching and misleading self-supervision during training. In this paper, we propose a novel framework calle…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. Project Page: https://sungmin-woo.github.io/prodepth/

  16. arXiv:2407.08882  [pdf, ps, other]

    cs.HC

    Emerging Practices for Large Multimodal Model (LMM) Assistance for People with Visual Impairments: Implications for Design

    Authors: Jingyi Xie, Rui Yu, He Zhang, Sooyeon Lee, Syed Masum Billah, John M. Carroll

    Abstract: People with visual impairments perceive their environment non-visually and often use AI-powered assistive tools to obtain textual descriptions of visual information. Recent large vision-language model-based AI-powered tools like Be My AI are more capable of understanding users' inquiries in natural language and describing the scene in audible text; however, the extent to which these tools are usef…

    Submitted 11 July, 2024; originally announced July 2024.

  17. arXiv:2407.07302  [pdf, other]

    eess.IV cs.CV

    Pairwise Distance Distillation for Unsupervised Real-World Image Super-Resolution

    Authors: Yuehan Zhang, Seungjun Lee, Angela Yao

    Abstract: Standard single-image super-resolution creates paired training data from high-resolution images through fixed downsampling kernels. However, real-world super-resolution (RWSR) faces unknown degradations in the low-resolution inputs, all the while lacking paired training data. Existing methods approach this problem by learning blind general models through complex synthetic augmentations on training…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  18. arXiv:2407.06613  [pdf, other]

    cs.CV

    Sparse-DeRF: Deblurred Neural Radiance Fields from Sparse View

    Authors: Dogyoon Lee, Donghyeong Kim, Jungho Lee, Minhyeok Lee, Seunghoon Lee, Sangyoun Lee

    Abstract: Recent studies construct deblurred neural radiance fields (DeRF) using dozens of blurry images, which are not practical scenarios if only a limited number of blurry images are available. This paper focuses on constructing DeRF from sparse-view for more pragmatic real-world scenarios. As observed in our experiments, establishing DeRF from sparse views proves to be a more challenging problem due to…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Project page: https://dogyoonlee.github.io/sparsederf/

  19. arXiv:2407.04879  [pdf, other]

    cs.SD eess.AS

    All Neural Low-latency Directional Speech Extraction

    Authors: Ashutosh Pandey, Sanha Lee, Juan Azcarreta, Daniel Wong, Buye Xu

    Abstract: We introduce a novel all neural model for low-latency directional speech extraction. The model uses direction of arrival (DOA) embeddings from a predefined spatial grid, which are transformed and fused into a recurrent neural network based speech extraction model. This process enables the model to effectively extract speech from a specified DOA. Unlike previous methods that relied on hand-crafted…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted for publication at INTERSPEECH 2024

  20. arXiv:2407.04833  [pdf, other]

    cs.CV cs.AI

    3D Adaptive Structural Convolution Network for Domain-Invariant Point Cloud Recognition

    Authors: Younggun Kim, Beomsik Cho, Seonghoon Ryoo, Soomok Lee

    Abstract: Adapting deep learning networks for point cloud data recognition in self-driving vehicles faces challenges due to the variability in datasets and sensor technologies, emphasizing the need for adaptive techniques to maintain accuracy across different conditions. In this paper, we introduce the 3D Adaptive Structural Convolution Network (3D-ASCN), a cutting-edge framework for 3D point cloud recognit…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

    ACM Class: I.2.10; I.5.1

  21. arXiv:2407.04345  [pdf, other]

    cs.CV

    CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images

    Authors: Jisu Shin, Junmyeong Lee, Seongmin Lee, Min-Gyu Park, Ju-Mi Kang, Ju Hong Yoon, Hae-Gon Jeon

    Abstract: We present a novel framework for reconstructing animatable human avatars from multiple images, termed CanonicalFusion. Our central concept involves integrating individual reconstruction results into the canonical space. To be specific, we first predict Linear Blend Skinning (LBS) weight maps and depth maps using a shared-encoder-dual-decoder network, enabling direct canonicalization of the 3D mesh…

    Submitted 15 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 Accepted (18 pages, 9 figures)

  22. arXiv:2407.04280  [pdf, other]

    cs.CL cs.SD eess.AS

    LearnerVoice: A Dataset of Non-Native English Learners' Spontaneous Speech

    Authors: Haechan Kim, Junho Myung, Seoyoung Kim, Sungpah Lee, Dongyeop Kang, Juho Kim

    Abstract: Prevalent ungrammatical expressions and disfluencies in spontaneous speech from second language (L2) learners pose unique challenges to Automatic Speech Recognition (ASR) systems. However, few datasets are tailored to L2 learner speech. We publicly release LearnerVoice, a dataset consisting of 50.04 hours of audio and transcriptions of L2 learners' spontaneous speech. Our linguistic analysis revea…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted for INTERSPEECH 2024

  23. arXiv:2407.03923  [pdf, other]

    cs.CV cs.AI

    CRiM-GS: Continuous Rigid Motion-Aware Gaussian Splatting from Motion Blur Images

    Authors: Junghe Lee, Donghyeong Kim, Dogyoon Lee, Suhwan Cho, Sangyoun Lee

    Abstract: Neural radiance fields (NeRFs) have received significant attention due to their high-quality novel view rendering ability, prompting research to address various real-world cases. One critical challenge is the camera motion blur caused by camera movement during exposure time, which prevents accurate 3D scene reconstruction. In this study, we propose continuous rigid motion-aware gaussian splatting…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Project Page : https://jho-yonsei.github.io/CRiM-Gaussian/

  24. arXiv:2407.03204  [pdf, other]

    cs.CV

    Expressive Gaussian Human Avatars from Monocular RGB Video

    Authors: Hezhen Hu, Zhiwen Fan, Tianhao Wu, Yihan Xi, Seoyoung Lee, Georgios Pavlakos, Zhangyang Wang

    Abstract: Nuanced expressiveness, particularly through fine-grained hand and facial expressions, is pivotal for enhancing the realism and vitality of digital human representations. In this work, we focus on investigating the expressiveness of human avatars when learned from monocular RGB video; a setting that introduces new challenges in capturing and animating fine-grained details. To this end, we introduc…

    Submitted 3 July, 2024; originally announced July 2024.

  25. arXiv:2407.03153  [pdf, other]

    cs.LG cs.CV

    Efficient Shapley Values for Attributing Global Properties of Diffusion Models to Data Group

    Authors: Chris Lin, Mingyu Lu, Chanwoo Kim, Su-In Lee

    Abstract: As diffusion models are deployed in real-world settings, data attribution is needed to ensure fair acknowledgment for contributors of high-quality training data and to identify sources of harmful content. Previous work focuses on identifying individual training samples important for the generation of a given image. However, instead of focusing on a given generated image, some use cases require und…

    Submitted 9 June, 2024; originally announced July 2024.

  26. arXiv:2407.03103  [pdf, other]

    cs.CL

    Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory

    Authors: Suyeon Lee, Sunghwan Kim, Minju Kim, Dongjin Kang, Dongil Yang, Harim Kim, Minseok Kang, Dayi Jung, Min Hee Kim, Seungbeen Lee, Kyoung-Mee Chung, Youngjae Yu, Dongha Lee, Jinyoung Yeo

    Abstract: Recently, the demand for psychological counseling has significantly increased as more individuals express concerns about their mental health. This surge has accelerated efforts to improve the accessibility of counseling by using large language models (LLMs) as counselors. To ensure client privacy, training open-source LLMs faces a key challenge: the absence of realistic counseling datasets. To add…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Under Review

  27. arXiv:2407.03086  [pdf, other]

    cs.LG cs.AI cs.DC

    Effective Heterogeneous Federated Learning via Efficient Hypernetwork-based Weight Generation

    Authors: Yujin Shin, Kichang Lee, Sungmin Lee, You Rim Choi, Hyung-Sin Kim, JeongGil Ko

    Abstract: While federated learning leverages distributed client resources, it faces challenges due to heterogeneous client capabilities. This necessitates allocating models suited to clients' resources and careful parameter aggregation to accommodate this heterogeneity. We propose HypeMeFed, a novel federated learning framework for supporting client heterogeneity by combining a multi-exit network architectu…

    Submitted 3 July, 2024; originally announced July 2024.

  28. arXiv:2407.03010  [pdf, other]

    cs.CV

    Context-Aware Video Instance Segmentation

    Authors: Seunghun Lee, Jiwan Seo, Kiljoon Han, Minwoo Choi, Sunghoon Im

    Abstract: In this paper, we introduce the Context-Aware Video Instance Segmentation (CAVIS), a novel framework designed to enhance instance association by integrating contextual information adjacent to each object. To efficiently extract and leverage this information, we propose the Context-Aware Instance Tracker (CAIT), which merges contextual data surrounding the instances with the core instance features…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Project page: https://seung-hun-lee.github.io/projects/CAVIS/

  29. arXiv:2407.02681  [pdf, other]

    cs.LG eess.IV math.OC stat.ML

    Uniform Transformation: Refining Latent Representation in Variational Autoencoders

    Authors: Ye Shi, C. S. George Lee

    Abstract: Irregular distribution in latent space causes posterior collapse, misalignment between posterior and prior, and ill-sampling problem in Variational Autoencoders (VAEs). In this paper, we introduce a novel adaptable three-stage Uniform Transformation (UT) module -- Gaussian Kernel Density Estimation (G-KDE) clustering, non-parametric Gaussian Mixture (GM) Modeling, and Probability Integral Transfor…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by 2024 IEEE 20th International Conference on Automation Science and Engineering

  30. arXiv:2407.00355  [pdf, other]

    physics.soc-ph cond-mat.stat-mech cs.SI

    Global decomposition of networks into multiple cores formed by local hubs

    Authors: Wonhee Jeong, Unjong Yu, Sang Hoon Lee

    Abstract: Networks are ubiquitous in various fields, representing systems where nodes and their interconnections constitute their intricate structures. We introduce a network decomposition scheme to reveal multiscale core-periphery structures lurking inside, using the concept of locally defined nodal hub centrality and edge-pruning techniques built upon it. We demonstrate that the hub-centrality-based edge…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 10 pages, 8 figures, 1 table

  31. arXiv:2406.18925  [pdf, other]

    cs.CL cs.CV

    Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding

    Authors: Jiwan Chung, Sungjae Lee, Minseo Kim, Seungju Han, Ashkan Yousefpour, Jack Hessel, Youngjae Yu

    Abstract: Visual arguments, often used in advertising or social causes, rely on images to persuade viewers to do or believe something. Understanding these arguments requires selective vision: only specific visual stimuli within an image are relevant to the argument, and relevance can only be understood within the context of a broader argumentative structure. While visual arguments are readily appreciated by…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures

  32. arXiv:2406.18568  [pdf]

    cs.CV cs.AI cs.LG

    A Diagnostic Model for Acute Lymphoblastic Leukemia Using Metaheuristics and Deep Learning Methods

    Authors: M. Hosseinzadeh, P. Khoshaght, S. Sadeghi, P. Asghari, Z. Arabi, J. Lansky, P. Budinsky, A. Masoud Rahmani, S. W. Lee

    Abstract: Acute lymphoblastic leukemia (ALL) severity is determined by the presence and ratios of blast cells (abnormal white blood cells) in both bone marrow and peripheral blood. Manual diagnosis of this disease is a tedious and time-consuming operation, making it difficult for professionals to accurately examine blast cell characteristics. To address this difficulty, researchers use deep learning and mac…

    Submitted 2 June, 2024; originally announced June 2024.

  33. arXiv:2406.18138  [pdf, other]

    cs.RO

    B-TMS: Bayesian Traversable Terrain Modeling and Segmentation Across 3D LiDAR Scans and Maps for Enhanced Off-Road Navigation

    Authors: Minho Oh, Gunhee Shin, Seoyeon Jang, Seungjae Lee, Dongkyu Lee, Wonho Song, Byeongho Yu, Hyungtae Lim, Jaeyoung Lee, Hyun Myung

    Abstract: Recognizing traversable terrain from 3D point cloud data is critical, as it directly impacts the performance of autonomous navigation in off-road environments. However, existing segmentation algorithms often struggle with challenges related to changes in data distribution, environmental specificity, and sensor variations. Moreover, when encountering sunken areas, their performance is frequently co…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE IV'24 workshop on Off-road autonomy

  34. arXiv:2406.17787  [pdf]

    cs.CL

    Role of Dependency Distance in Text Simplification: A Human vs ChatGPT Simplification Comparison

    Authors: Sumi Lee, Gondy Leroy, David Kauchak, Melissa Just

    Abstract: This study investigates human and ChatGPT text simplification and its relationship to dependency distance. A set of 220 sentences, with increasing grammatical difficulty as measured in a prior user study, were simplified by a human expert and using ChatGPT. We found that the three sentence sets all differed in mean dependency distances: the highest in the original sentence set, followed by ChatGPT…

    Submitted 20 May, 2024; originally announced June 2024.

  35. arXiv:2406.16275  [pdf, other]

    cs.CL

    Investigating the Influence of Prompt-Specific Shortcuts in AI Generated Text Detection

    Authors: Choonghyun Park, Hyuhng Joon Kim, Junyeob Kim, Youna Kim, Taeuk Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-goo Lee, Kang Min Yoo

    Abstract: AI Generated Text (AIGT) detectors are developed with texts from humans and LLMs of common tasks. Despite the diversity of plausible prompt choices, these datasets are generally constructed with a limited number of prompts. The lack of prompt variation can introduce prompt-specific shortcut features that exist in data collected with the chosen prompt, but do not generalize to others. In this paper…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 13 tables, under review

  36. Deep Learning Segmentation of Ascites on Abdominal CT Scans for Automatic Volume Quantification

    Authors: Benjamin Hou, Sung-Won Lee, Jung-Min Lee, Christopher Koh, Jing Xiao, Perry J. Pickhardt, Ronald M. Summers

    Abstract: Purpose: To evaluate the performance of an automated deep learning method in detecting ascites and subsequently quantifying its volume in patients with liver cirrhosis and ovarian cancer. Materials and Methods: This retrospective study included contrast-enhanced and non-contrast abdominal-pelvic CT scans of patients with cirrhotic ascites and patients with ovarian cancer from two institutions, N…

    Submitted 22 June, 2024; originally announced June 2024.

  37. arXiv:2406.15487  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Improving Text-To-Audio Models with Synthetic Captions

    Authors: Zhifeng Kong, Sang-gil Lee, Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Rafael Valle, Soujanya Poria, Bryan Catanzaro

    Abstract: It is an open challenge to obtain high quality training data, especially captions, for text-to-audio models. Although prior methods have leveraged text-only language models to augment and improve captions, such methods have limitations related to scale and coherence between audio and captions. In this work, we propose an audio captioning pipeline that uses an audio language model…

    Submitted 8 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  38. Embracing Federated Learning: Enabling Weak Client Participation via Partial Model Training

    Authors: Sunwoo Lee, Tuo Zhang, Saurav Prakash, Yue Niu, Salman Avestimehr

    Abstract: In Federated Learning (FL), clients may have weak devices that cannot train the full model or even hold it in their memory space. To implement large-scale FL applications, thus, it is crucial to develop a distributed learning method that enables the participation of such weak clients. We propose EmbracingFL, a general FL framework that allows all available clients to join the distributed training…

    Submitted 21 June, 2024; originally announced June 2024.

    Journal ref: IEEE Transactions on Mobile Computing, Early Access, (2024)

  39. arXiv:2406.14856  [pdf, other]

    cs.CV cs.HC cs.LG

    Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis

    Authors: Md Saiful Islam, Tariq Adnan, Jan Freyberg, Sangwu Lee, Abdelrahman Abdelkader, Meghan Pawlik, Cathe Schwartz, Karen Jaffe, Ruth B. Schneider, E Ray Dorsey, Ehsan Hoque

    Abstract: Limited access to neurological care leads to missed diagnoses of Parkinson's disease (PD), leaving many individuals unidentified and untreated. We trained a novel neural network-based fusion architecture to detect Parkinson's disease (PD) by analyzing features extracted from webcam recordings of three tasks: finger tapping, facial expression (smiling), and speech (uttering a sentence containing al…

    Submitted 21 June, 2024; originally announced June 2024.

  40. arXiv:2406.14703  [pdf, other]

    cs.CL cs.AI

    Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics

    Authors: Seungbeen Lee, Seungwon Lim, Seungju Han, Giyeong Oh, Hyungjoo Chae, Jiwan Chung, Minju Kim, Beong-woo Kwak, Yeonsoo Lee, Dongha Lee, Jinyoung Yeo, Youngjae Yu

    Abstract: The idea of personality in descriptive psychology, traditionally defined through observable behavior, has now been extended to Large Language Models (LLMs) to better understand their behavior. This raises a question: do LLMs exhibit distinct and consistent personality traits, similar to humans? Existing self-assessment personality tests, while applicable, lack the necessary validity and reliabilit…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint; Under review

  41. arXiv:2406.13846  [pdf, other]

    cs.CL cs.LG

    Text Serialization and Their Relationship with the Conventional Paradigms of Tabular Machine Learning

    Authors: Kyoka Ono, Simon A. Lee

    Abstract: Recent research has explored how Language Models (LMs) can be used for feature representation and prediction in tabular machine learning tasks. This involves employing text serialization and supervised fine-tuning (SFT) techniques. Despite the simplicity of these techniques, significant gaps remain in our understanding of the applicability and reliability of LMs in this context. Our study assesses…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Accepted into the ICML AI4Science Workshop

  42. arXiv:2406.13502  [pdf, other]

    cs.CL cs.SD eess.AS

    ManWav: The First Manchu ASR Model

    Authors: Jean Seo, Minha Kang, Sungjoo Byun, Sangah Lee

    Abstract: This study addresses the widening gap in Automatic Speech Recognition (ASR) research between high resource and extremely low resource languages, with a particular focus on Manchu, a critically endangered language. Manchu exemplifies the challenges faced by marginalized linguistic communities in accessing state-of-the-art technologies. In a pioneering effort, we introduce the first-ever Manchu ASR…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: ACL2024/Field Matters

  43. arXiv:2406.12202  [pdf, other]

    cs.RO

    Fast Global Localization on Neural Radiance Field

    Authors: Mangyu Kong, Seongwon Lee, Jaewon Lee, Euntai Kim

    Abstract: Neural Radiance Fields (NeRF) presented a novel way to represent scenes, allowing for high-quality 3D reconstruction from 2D images. Following its remarkable achievements, global localization within NeRF maps is an essential task for enabling a wide range of applications. Recently, Loc-NeRF demonstrated a localization approach that combines traditional Monte Carlo Localization with NeRF, showing p…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Preprint, Under review

  44. arXiv:2406.11850  [pdf, other]

    cs.CY cs.AI

    Closed-loop Teaching via Demonstrations to Improve Policy Transparency

    Authors: Michael S. Lee, Reid Simmons, Henny Admoni

    Abstract: Demonstrations are a powerful way of increasing the transparency of AI policies. Though informative demonstrations may be selected a priori through the machine teaching paradigm, student learning may deviate from the preselected curriculum in situ. This paper thus explores augmenting a curriculum with a closed-loop teaching framework inspired by principles from the education literature, such as th…

    Submitted 1 April, 2024; originally announced June 2024.

    Comments: Supplementary material available at https://drive.google.com/file/d/1f_BDk3JpY6DvqlvgKtnQZ8zdfO3XAn3p/view?usp=drive_link

  45. arXiv:2406.11384  [pdf, other]

    cs.CV

    Understanding Multi-Granularity for Open-Vocabulary Part Segmentation

    Authors: Jiho Choi, Seonho Lee, Seungho Lee, Minhyun Lee, Hyunjung Shim

    Abstract: Open-vocabulary part segmentation (OVPS) is an emerging research area focused on segmenting fine-grained entities based on diverse and previously unseen vocabularies. Our study highlights the inherent complexities of part segmentation due to intricate boundaries and diverse granularity, reflecting the knowledge-based nature of part identification. To address these challenges, we propose PartCLIPSe…

    Submitted 17 June, 2024; originally announced June 2024.

  46. Expanding the Design Space of Computer Vision-based Interactive Systems for Group Dance Practice

    Authors: Soohwan Lee, Seoyeong Hwang, Ian Oakley, Kyungho Lee

    Abstract: Group dance, a sub-genre characterized by intricate motions made by a cohort of performers in tight synchronization, has a longstanding and culturally significant history and, in modern forms such as cheerleading, a broad base of current adherents. However, despite its popularity, learning group dance routines remains challenging. Based on the prior success of interactive systems to support indivi…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 20 pages, 10 figures, 1 table, to be published in the proceedings of the ACM Designing Interactive Systems Conference, 2024, (DIS '24)

    Journal ref: ACM Designing Interactive Systems Conference, 2024, (DIS '24)

  47. arXiv:2406.11125  [pdf, other]

    cs.HC

    Conversational Agents as Catalysts for Critical Thinking: Challenging Design Fixation in Group Design

    Authors: Soohwan Lee, Seoyeong Hwang, Kyungho Lee

    Abstract: This paper investigates the potential of LLM-based conversational agents (CAs) to enhance critical reflection and mitigate design fixation in group design work. By challenging AI-generated recommendations and prevailing group opinions, these agents address issues such as groupthink and promote a more dynamic and inclusive design process. Key design considerations include optimizing intervention ti…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 7 pages, 2 figures, DIS2024 Workshop on 'Death of Design Researcher'

  48. arXiv:2406.11016  [pdf, other]

    cs.LG cs.CL

    Optimized Speculative Sampling for GPU Hardware Accelerators

    Authors: Dominik Wagner, Seanie Lee, Ilja Baumann, Philipp Seeberger, Korbinian Riedhammer, Tobias Bocklet

    Abstract: In this work, we optimize speculative sampling for parallel hardware accelerators to improve sampling speed. We notice that substantial portions of the intermediate matrices necessary for speculative sampling can be computed concurrently. This allows us to distribute the workload across multiple GPU threads, enabling simultaneous operations on matrix segments within thread blocks. Additionally, we…

    Submitted 16 June, 2024; originally announced June 2024.

  49. arXiv:2406.09799  [pdf, other]

    cs.CY

    GeoSEE: Regional Socio-Economic Estimation With a Large Language Model

    Authors: Sungwon Han, Donghyun Ahn, Seungeon Lee, Minhyuk Song, Sungwon Park, Sangyoon Park, Jihee Kim, Meeyoung Cha

    Abstract: Moving beyond traditional surveys, combining heterogeneous data sources with AI-driven inference models brings new opportunities to measure socio-economic conditions, such as poverty and population, over expansive geographic areas. The current research presents GeoSEE, a method that can estimate various socio-economic indicators using a unified pipeline powered by a large language model (LLM). Pre…

    Submitted 14 June, 2024; originally announced June 2024.

  50. arXiv:2406.09388  [pdf, other]

    cs.CV cs.AI cs.LG

    Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition

    Authors: Youngtaek Oh, Pyunghwan Ahn, Jinhyung Kim, Gwangmo Song, Soonyoung Lee, In So Kweon, Junmo Kim

    Abstract: Vision and language models (VLMs) such as CLIP have showcased remarkable zero-shot recognition abilities yet face challenges in visio-linguistic compositionality, particularly in linguistic comprehension and fine-grained image-text alignment. This paper explores the intricate relationship between compositionality and recognition -- two pivotal aspects of VLM capability. We conduct a comprehensive…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPRW 2024 on 'What is Next in Multimodal Foundation Models?'. Code: https://github.com/ytaek-oh/vl_compo