Showing 1–50 of 2,543 results for author: Kim, J

  1. arXiv:2407.13676

    cs.MM cs.CV cs.SD eess.AS

    Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

    Authors: Arda Senocak, Hyeonggon Ryu, Junsik Kim, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung

    Abstract: Recent studies on learning-based sound source localization have mainly focused on the localization performance perspective. However, prior work and existing benchmarks overlook a crucial aspect: cross-modal interaction, which is essential for interactive sound source localization. Cross-modal interaction is vital for understanding semantically matched or mismatched audio-visual events, such as sil…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Journal extension of the ICCV 2023 paper (arXiv:2309.10724). Code is available at https://github.com/kaistmm/SSLalignment

  2. arXiv:2407.13524

    cs.CV cs.AI

    Enhancing Source-Free Domain Adaptive Object Detection with Low-confidence Pseudo Label Distillation

    Authors: Ilhoon Yoon, Hyeongjun Kwon, Jin Kim, Junyoung Park, Hyunsung Jang, Kwanghoon Sohn

    Abstract: Source-Free domain adaptive Object Detection (SFOD) is a promising strategy for deploying trained detectors to new, unlabeled domains without accessing source data, addressing significant concerns around data privacy and efficiency. Most SFOD methods leverage a Mean-Teacher (MT) self-training paradigm relying heavily on High-confidence Pseudo Labels (HPL). However, these HPL often overlook small i…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  3. arXiv:2407.13517

    cs.CV

    Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks

    Authors: Sehwan Choi, Jungho Kim, Hongjae Shin, Jun Won Choi

    Abstract: In this paper, we introduce Mask2Map, a novel end-to-end online HD map construction method designed for autonomous driving applications. Our approach focuses on predicting the class and ordered point set of map instances within a scene, represented in the bird's eye view (BEV). Mask2Map consists of two primary components: the Instance-Level Mask Prediction Network (IMPNet) and the Mask-Driven Map…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 20 pages, 9 figures

  4. arXiv:2407.13515

    cs.HC

    CookAR: Affordance Augmentations in Wearable AR to Support Kitchen Tool Interactions for People with Low Vision

    Authors: Jaewook Lee, Andrew D. Tjahjadi, Jiho Kim, Junpu Yu, Minji Park, Jiawen Zhang, Jon E. Froehlich, Yapeng Tian, Yuhang Zhao

    Abstract: Cooking is a central activity of daily living, supporting independence and both mental and physical health. However, prior work has highlighted key barriers for people with low vision (LV) to cook, particularly around safely interacting with cooking tools, such as sharp knives or hot pans. Drawing on recent advancements in computer vision (CV) and robotics, we present CookAR, a head-mounted AR sys…

    Submitted 18 July, 2024; originally announced July 2024.

  5. arXiv:2407.13427

    cs.CE cs.AI

    DeepClair: Utilizing Market Forecasts for Effective Portfolio Selection

    Authors: Donghee Choi, Jinkyu Kim, Mogan Gim, Jinho Lee, Jaewoo Kang

    Abstract: Utilizing market forecasts is pivotal in optimizing portfolio selection strategies. We introduce DeepClair, a novel framework for portfolio selection. DeepClair leverages a transformer-based time-series forecasting model to predict market trends, facilitating more informed and adaptable portfolio decisions. To integrate the forecasting model into a deep reinforcement learning-driven portfolio sele…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: CIKM 2024 Accepted

  6. arXiv:2407.13166

    cs.HC cs.IR

    Using LLMs to Investigate Correlations of Conversational Follow-up Queries with User Satisfaction

    Authors: Hyunwoo Kim, Yoonseo Choi, Taehyun Yang, Honggu Lee, Chaneon Park, Yongju Lee, Jin Young Kim, Juho Kim

    Abstract: With large language models (LLMs), conversational search engines shift how users retrieve information from the web by enabling natural conversations to express their search intents over multiple turns. Users' natural conversation embodies rich but implicit signals of users' search intents and evaluation of search results to understand user experience with the system. However, it is underexplored h…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to LLM4Eval @ SIGIR 2024 - The First Workshop on Large Language Models (LLMs) for Evaluation in Information Retrieval

  7. arXiv:2407.13055

    cs.CR cs.PF

    Cheddar: A Swift Fully Homomorphic Encryption Library for CUDA GPUs

    Authors: Jongmin Kim, Wonseok Choi, Jung Ho Ahn

    Abstract: Fully homomorphic encryption (FHE) is a cryptographic technology capable of resolving security and privacy problems in cloud computing by encrypting data in use. However, FHE introduces tremendous computational overhead for processing encrypted data, causing FHE workloads to become 2-6 orders of magnitude slower than their unencrypted counterparts. To mitigate the overhead, we propose Cheddar, an…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 12 pages, 5 figures

  8. arXiv:2407.12998

    cs.RO

    Surgical Robot Transformer (SRT): Imitation Learning for Surgical Tasks

    Authors: Ji Woong Kim, Tony Z. Zhao, Samuel Schmidgall, Anton Deguet, Marin Kobilarov, Chelsea Finn, Axel Krieger

    Abstract: We explore whether surgical manipulation tasks can be learned on the da Vinci robot via imitation learning. However, the da Vinci system presents unique challenges which hinder straightforward implementation of imitation learning. Notably, its forward kinematics is inconsistent due to imprecise joint measurements, and naively training a policy using such approximate kinematics data often leads to…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 8 pages

  9. arXiv:2407.12987

    cs.CV

    ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos

    Authors: Hyolim Kang, Jeongseok Hyun, Joungbin An, Youngjae Yu, Seon Joo Kim

    Abstract: Online Temporal Action Localization (On-TAL) is a critical task that aims to instantaneously identify action instances in untrimmed streaming videos as soon as an action concludes -- a major leap from frame-based Online Action Detection (OAD). Yet, the challenge of detecting overlapping actions is often overlooked even though it is a common scenario in streaming videos. Current methods that can ad…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  10. arXiv:2407.12345

    cs.CV

    VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions

    Authors: Seokha Moon, Hyun Woo, Hongbeen Park, Haeji Jung, Reza Mahjourian, Hyung-gun Chi, Hyerin Lim, Sangpil Kim, Jinkyu Kim

    Abstract: Predicting future trajectories for other road agents is an essential task for autonomous vehicles. Established trajectory prediction methods primarily use agent tracks generated by a detection and tracking system and HD maps as inputs. In this work, we propose a novel method that also incorporates visual input from surround-view cameras, allowing the model to utilize visual cues such as human gazes…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  11. arXiv:2407.12173

    cs.CV cs.AI

    Beta Sampling is All You Need: Efficient Image Generation Strategy for Diffusion Models using Stepwise Spectral Analysis

    Authors: Haeil Lee, Hansang Lee, Seoyeon Gye, Junmo Kim

    Abstract: Generative diffusion models have emerged as a powerful tool for high-quality image synthesis, yet their iterative nature demands significant computational resources. This paper proposes an efficient time step sampling method based on an image spectral analysis of the diffusion process, aimed at optimizing the denoising process. Instead of the traditional uniform distribution-based time step sampli…

    Submitted 16 July, 2024; originally announced July 2024.

  12. arXiv:2407.12011

    cs.DC cs.AI cs.NI

    Digital Twinning of a Pressurized Water Reactor Startup Operation and Partial Computational Offloading in In-network Computing-Assisted Multiaccess Edge Computing

    Authors: Ibrahim Aliyu, Awwal M. Arigi, Tai-Won Um, Jinsul Kim

    Abstract: This paper addresses the challenge of representing complex human action (HA) in a nuclear power plant (NPP) digital twin (DT) and minimizing latency in partial computation offloading (PCO) in sixth-generation-enabled computing in the network (COIN) assisted multiaccess edge computing (MEC). Accurate HA representation in the DT-HA model is vital for modeling human interventions that are crucial for…

    Submitted 24 June, 2024; originally announced July 2024.

  13. arXiv:2407.11962

    cs.CV cs.AI cs.LG

    Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling

    Authors: Jaehyeok Kim, Dongyoon Wee, Dan Xu

    Abstract: This paper introduces Motion-oriented Compositional Neural Radiance Fields (MoCo-NeRF), a framework designed to perform free-viewpoint rendering of monocular human videos via a novel non-rigid motion modeling approach. In the context of dynamic clothed humans, complex cloth dynamics generate non-rigid motions that are intrinsically distinct from skeletal articulations and critically important for th…

    Submitted 18 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  14. arXiv:2407.11793

    cs.CV cs.AI cs.GR

    Click-Gaussian: Interactive Segmentation to Any 3D Gaussians

    Authors: Seokhun Choi, Hyeonseop Song, Jaechul Kim, Taehyeong Kim, Hoseok Do

    Abstract: Interactive segmentation of 3D Gaussians opens a great opportunity for real-time manipulation of 3D scenes thanks to the real-time rendering capability of 3D Gaussian Splatting. However, the current methods suffer from time-consuming post-processing to deal with noisy segmentation output. Also, they struggle to provide detailed segmentation, which is important for fine-grained manipulation of 3D s…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. The first two authors contributed equally to this work

  15. arXiv:2407.11534

    cs.LG cs.AI

    LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices

    Authors: Jung Hyun Lee, Jeonghoon Kim, June Yong Yang, Se Jung Kwon, Eunho Yang, Kang Min Yoo, Dongsoo Lee

    Abstract: With the commercialization of large language models (LLMs), weight-activation quantization has emerged to compress and accelerate LLMs, achieving high throughput while reducing inference costs. However, existing post-training quantization (PTQ) techniques for quantizing weights and activations of LLMs still suffer from non-negligible accuracy drops, especially on massive multitask language underst…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Preprint

  16. arXiv:2407.11394

    cs.CV cs.AI cs.GR cs.LG

    DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation

    Authors: Jiwook Kim, Seonho Lee, Jaeyo Shin, Jiho Choi, Hyunjung Shim

    Abstract: Score distillation sampling (SDS) has emerged as an effective framework in text-driven 3D editing tasks due to its inherent 3D consistency. However, existing SDS-based 3D editing methods suffer from extensive training time and lead to low-quality results, primarily because these methods deviate from the sampling dynamics of diffusion models. In this paper, we propose DreamCatalyst, a novel framewo…

    Submitted 16 July, 2024; originally announced July 2024.

  17. arXiv:2407.11261

    physics.soc-ph cs.SI math.DS nlin.AO

    Competition between group interactions and nonlinearity in voter dynamics on hypergraphs

    Authors: Jihye Kim, Deok-Sun Lee, Byungjoon Min, Mason A. Porter, Maxi San Miguel, K. -I. Goh

    Abstract: Social dynamics are often driven by both pairwise (i.e., dyadic) relationships and higher-order (i.e., polyadic) group relationships, which one can describe using hypergraphs. To gain insight into the impact of polyadic relationships on dynamical processes on networks, we formulate and study a polyadic voter process, which we call the group-driven voter model (GVM), in which we incorporate the eff…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 6 pages, 5 figures

  18. arXiv:2407.10910

    cs.CV cs.LG

    DataDream: Few-shot Guided Dataset Generation

    Authors: Jae Myung Kim, Jessica Bader, Stephan Alaniz, Cordelia Schmid, Zeynep Akata

    Abstract: While text-to-image diffusion models have been shown to achieve state-of-the-art results in image synthesis, they have yet to prove their effectiveness in downstream applications. Previous work has proposed to generate data for image classifier training given limited real data access. However, these methods struggle to generate in-distribution images or depict fine-grained features, thereby hinder…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  19. arXiv:2407.10733

    cs.CV

    Joint-Embedding Predictive Architecture for Self-Supervised Learning of Mask Classification Architecture

    Authors: Dong-Hee Kim, Sungduk Cho, Hyeonwoo Cho, Chanmin Park, Jinyoung Kim, Won Hwa Kim

    Abstract: In this work, we introduce Mask-JEPA, a self-supervised learning framework tailored for mask classification architectures (MCA), to overcome the traditional constraints associated with training segmentation models. Mask-JEPA combines a Joint Embedding Predictive Architecture with MCA to adeptly capture intricate semantics and precise object boundaries. Our approach addresses two critical challenge…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 27 pages, 5 figures

  20. arXiv:2407.10206

    cs.CE cs.AI cs.NE cs.SI

    Dominant Design Prediction with Phylogenetic Networks

    Authors: Youwei He, Jeong-Dong Lee, Dawoon Jeong, Sungjun Choi, Jiyong Kim

    Abstract: This study proposes an effective method to predict technology development from an evolutionary perspective. Product evolution is the result of technological evolution and market selection. A phylogenetic network is the main method to study product evolution. The formation of the dominant design determines the trajectory of technology development. How to predict future dominant design has become a…

    Submitted 14 July, 2024; originally announced July 2024.

  21. arXiv:2407.10091

    cs.CL

    Enhancing Emotion Prediction in News Headlines: Insights from ChatGPT and Seq2Seq Models for Free-Text Generation

    Authors: Ge Gao, Jongin Kim, Sejin Paik, Ekaterina Novozhilova, Yi Liu, Sarah T. Bonna, Margrit Betke, Derry Tanti Wijaya

    Abstract: Predicting emotions elicited by news headlines can be challenging as the task is largely influenced by the varying nature of people's interpretations and backgrounds. Previous works have explored classifying discrete emotions directly from news headlines. We provide a different approach to tackling this problem by utilizing people's explanations of their emotion, written in free-text, on how they…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: published at LREC-COLING 2024

    ACM Class: I.2.7

    Journal ref: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) 5944-5955

  22. arXiv:2407.09514

    cond-mat.mtrl-sci cs.LG physics.app-ph

    Machine Learning Based Prediction of Proton Conductivity in Metal-Organic Frameworks

    Authors: Seunghee Han, Byeong Gwan Lee, Dae Woon Lim, Jihan Kim

    Abstract: Recently, metal-organic frameworks (MOFs) have demonstrated their potential as solid-state electrolytes in proton exchange membrane fuel cells. However, the number of MOFs reported to exhibit proton conductivity remains limited, and the mechanisms underlying this phenomenon are not fully elucidated, complicating the design of proton-conductive MOFs. In response, we developed a comprehensive databa…

    Submitted 17 July, 2024; v1 submitted 18 June, 2024; originally announced July 2024.

  23. arXiv:2407.09342

    cs.RO

    MIXED-SENSE: A Mixed Reality Sensor Emulation Framework for Test and Evaluation of UAVs Against False Data Injection Attacks

    Authors: Kartik A. Pant, Li-Yu Lin, Jaehyeok Kim, Worawis Sribunma, James M. Goppert, Inseok Hwang

    Abstract: We present a high-fidelity Mixed Reality sensor emulation framework for testing and evaluating the resilience of Unmanned Aerial Vehicles (UAVs) against false data injection (FDI) attacks. The proposed approach can be utilized to assess the impact of FDI attacks, benchmark attack detector performance, and validate the effectiveness of mitigation/reconfiguration strategies in single-UAV and UAV swa…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 6 pages, 5 figures, IROS 2024

  24. arXiv:2407.09303

    cs.CV

    ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion

    Authors: Sungmin Woo, Wonjoon Lee, Woo Jin Kim, Dogyoon Lee, Sangyoun Lee

    Abstract: Self-supervised multi-frame monocular depth estimation relies on the geometric consistency between successive frames under the assumption of a static scene. However, the presence of moving objects in dynamic scenes introduces inevitable inconsistencies, causing misaligned multi-frame feature matching and misleading self-supervision during training. In this paper, we propose a novel framework calle…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024. Project Page: https://sungmin-woo.github.io/prodepth/

  25. arXiv:2407.09184

    cs.CL

    Does Incomplete Syntax Influence Korean Language Model? Focusing on Word Order and Case Markers

    Authors: Jong Myoung Kim, Young-Jun Lee, Yong-jin Han, Sangkeun Jung, Ho-Jin Choi

    Abstract: Syntactic elements, such as word order and case markers, are fundamental in natural language processing. Recent studies show that syntactic information boosts language model performance and offers clues for people to understand their learning mechanisms. Unlike languages with a fixed word order such as English, Korean allows for varied word sequences, despite its canonical structure, due to case m…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: COLM 2024; Code and dataset are available at https://github.com/grayapple-git/SIKO

  26. arXiv:2407.09012

    cs.CV cs.AI

    TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models

    Authors: Jeongho Kim, Min-Jung Kim, Junsoo Lee, Jaegul Choo

    Abstract: Pose-driven human-image animation diffusion models have shown remarkable capabilities in realistic human video synthesis. Despite the promising results achieved by previous approaches, challenges persist in achieving temporally consistent animation and ensuring robustness with off-the-shelf pose detectors. In this paper, we present TCAN, a pose-driven human image animation method that is robust to…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally

  27. arXiv:2407.08947

    cs.LG cs.CV

    Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort

    Authors: Jeeyung Kim, Ze Wang, Qiang Qiu

    Abstract: Enhancing model interpretability can address spurious correlations by revealing how models draw their predictions. Concept Bottleneck Models (CBMs) can provide a principled way of disclosing and guiding model behaviors through human-understandable concepts, albeit at a high cost of human effort in data annotation. In this paper, we leverage a synergy of multiple foundation models to construct CBM…

    Submitted 11 July, 2024; originally announced July 2024.

  28. arXiv:2407.07995

    cs.CV

    Flow4D: Leveraging 4D Voxel Network for LiDAR Scene Flow Estimation

    Authors: Jaeyeul Kim, Jungwan Woo, Ukcheol Shin, Jean Oh, Sunghoon Im

    Abstract: Understanding the motion states of the surrounding environment is critical for safe autonomous driving. These motion states can be accurately derived from scene flow, which captures the three-dimensional motion field of points. Existing LiDAR scene flow methods extract spatial features from each point cloud and then fuse them channel-wise, resulting in the implicit extraction of spatio-temporal fe…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  29. arXiv:2407.07801

    eess.AS cs.CL cs.LG cs.SD

    AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning

    Authors: Jongsuk Kim, Jiwon Shin, Junmo Kim

    Abstract: In recent years, advancements in representation learning and language models have propelled Automated Captioning (AC) to new heights, enabling the generation of human-level descriptions. Leveraging these advancements, we propose AVCap, an Audio-Visual Captioning framework, a simple yet powerful baseline approach applicable to audio-visual captioning. AVCap utilizes audio-visual features as text to…

    Submitted 10 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Interspeech 2024

  30. arXiv:2407.07413

    cs.CL

    KpopMT: Translation Dataset with Terminology for Kpop Fandom

    Authors: JiWoo Kim, Yunsu Kim, JinYeong Bak

    Abstract: While machines learn from existing corpora, humans have the unique capability to establish and accept new language systems. This leads humans to form unique language systems within social groups. Aligning with this, we focus on a gap remaining in addressing translation challenges within social groups, where in-group members utilize unique terminologies. We propose the KpopMT dataset, which aims to fill th…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: accepted to LoresMT 2024

  31. arXiv:2407.07024

    cs.CV cs.AI

    Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization

    Authors: Jeongseok Hyun, Su Ho Han, Hyolim Kang, Joon-Young Lee, Seon Joo Kim

    Abstract: The vocabulary size in temporal action localization (TAL) is constrained by the scarcity of large-scale annotated datasets. To address this, recent works incorporate powerful pre-trained vision-language models (VLMs), such as CLIP, to perform open-vocabulary TAL (OV-TAL). However, unlike VLMs trained on extensive image/video-text pairs, existing OV-TAL methods still rely on small, fully labeled TA…

    Submitted 9 July, 2024; originally announced July 2024.

  32. arXiv:2407.06851

    cs.CL

    Safe-Embed: Unveiling the Safety-Critical Knowledge of Sentence Encoders

    Authors: Jinseok Kim, Jaewon Jung, Sangyeop Kim, Sohyung Park, Sungzoon Cho

    Abstract: Despite the impressive capabilities of Large Language Models (LLMs) in various tasks, their vulnerability to unsafe prompts remains a critical issue. These prompts can lead LLMs to generate responses on illegal or sensitive topics, posing a significant threat to their safe and ethical use. Existing approaches attempt to address this issue using classification models, but they have several drawback…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ACL 2024 KnowledgeableLMs workshop paper

  33. arXiv:2407.06716

    cs.IR

    Analyzing the Effectiveness of Listwise Reranking with Positional Invariance on Temporal Generalizability

    Authors: Soyoung Yoon, Jongyoon Kim, Seung-won Hwang

    Abstract: Benchmarking the performance of information retrieval (IR) methods is mostly conducted within a fixed set of documents (static corpora). However, in real-world web search engine environments, the document set is continuously updated and expanded. Addressing these discrepancies and measuring the temporal persistence of IR systems is crucial. By investigating the LongEval benchmark, specifically de…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted at CLEF 2024 LongEval track

  34. arXiv:2407.06004

    cs.CL

    Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models

    Authors: Chani Jung, Dongkwan Kim, Jiho Jin, Jiseon Kim, Yeon Seonwoo, Yejin Choi, Alice Oh, Hyunwoo Kim

    Abstract: While humans naturally develop theory of mind (ToM), the capability to understand other people's mental states and beliefs, state-of-the-art large language models (LLMs) underperform on simple ToM benchmarks. We posit that we can extend our understanding of LLMs' ToM abilities by evaluating key human ToM precursors -- perception inference and perception-to-belief inference -- in LLMs. We introduce…

    Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  35. arXiv:2407.05683

    eess.IV cs.AI cs.CV

    RadiomicsFill-Mammo: Synthetic Mammogram Mass Manipulation with Radiomics Features

    Authors: Inye Na, Jonghun Kim, Eun Sook Ko, Hyunjin Park

    Abstract: Motivated by the question "Can we generate tumors with desired attributes?", this study leverages radiomics features to explore the feasibility of generating synthetic tumor images. Characterized by its low-dimensional yet biologically meaningful markers, radiomics bridges the gap between complex medical imaging data and actionable clinical insights. We present RadiomicsFill-Mammo, the first of t…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted at MICCAI 2024

  36. arXiv:2407.05526

    cs.LG cs.AI stat.ML

    Can Machines Learn the True Probabilities?

    Authors: Jinsook Kim

    Abstract: When there exists uncertainty, AI machines are designed to make decisions so as to reach the best expected outcomes. Expectations are based on true facts about the objective environment the machines interact with, and those facts can be encoded into AI models in the form of true objective probability functions. Accordingly, AI models involve probabilistic machine learning in which the probabilitie…

    Submitted 7 July, 2024; originally announced July 2024.

  37. arXiv:2407.05520

    cs.LG stat.ML

    A Theory of Machine Learning

    Authors: Jinsook Kim, Jinho Kang

    Abstract: We critically review three major theories of machine learning and provide a new theory according to which machines learn a function when the machines successfully compute it. We show that this theory challenges common assumptions in the statistical and the computational learning theories, for it implies that learning true probabilities is equivalent neither to obtaining a correct calculation of th…

    Submitted 7 July, 2024; originally announced July 2024.

  38. arXiv:2407.05271

    cs.CL

    Beyond Binary Gender Labels: Revealing Gender Biases in LLMs through Gender-Neutral Name Predictions

    Authors: Zhiwen You, HaeJin Lee, Shubhanshu Mishra, Sullam Jeoung, Apratim Mishra, Jinseok Kim, Jana Diesner

    Abstract: Name-based gender prediction has traditionally categorized individuals as either female or male based on their names, using a binary classification system. That binary approach can be problematic in the cases of gender-neutral names that do not align with any one gender, among other reasons. Relying solely on binary gender categories without recognizing gender-neutral names can reduce the inclusiv…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted at ACL 2024, GeBNLP Workshop

  39. arXiv:2407.04597

    cs.CV cs.AI

    Feature Attenuation of Defective Representation Can Resolve Incomplete Masking on Anomaly Detection

    Authors: YeongHyeon Park, Sungho Kang, Myung Jin Kim, Hyeong Seok Kim, Juneho Yi

    Abstract: In unsupervised anomaly detection (UAD) research, while state-of-the-art models have reached a saturation point with extensive studies on public benchmark datasets, they adopt large-scale tailor-made neural networks (NN) for detection performance or pursue unified models for various tasks. Towards edge computing, it is necessary to develop a computationally efficient and scalable solution that av…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures, 5 tables

  40. arXiv:2407.04280

    cs.CL cs.SD eess.AS

    LearnerVoice: A Dataset of Non-Native English Learners' Spontaneous Speech

    Authors: Haechan Kim, Junho Myung, Seoyoung Kim, Sungpah Lee, Dongyeop Kang, Juho Kim

    Abstract: Prevalent ungrammatical expressions and disfluencies in spontaneous speech from second language (L2) learners pose unique challenges to Automatic Speech Recognition (ASR) systems. However, few datasets are tailored to L2 learner speech. We publicly release LearnerVoice, a dataset consisting of 50.04 hours of audio and transcriptions of L2 learners' spontaneous speech. Our linguistic analysis revea…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted for INTERSPEECH 2024

  41. arXiv:2407.04190

    cs.CV

    Computer Vision for Clinical Gait Analysis: A Gait Abnormality Video Dataset

    Authors: Rahm Ranjan, David Ahmedt-Aristizabal, Mohammad Ali Armin, Juno Kim

    Abstract: Clinical gait analysis (CGA) using computer vision is an emerging field in artificial intelligence that faces barriers related to accessible, real-world data and clear task objectives. This paper lays the foundation for current developments in CGA as well as vision-based methods and datasets suitable for gait analysis. We introduce The Gait Abnormality in Video Dataset (GAVD) in response to our review of…

    Submitted 4 July, 2024; originally announced July 2024.

    ACM Class: I.2.10

  42. RISC-V R-Extension: Advancing Efficiency with Rented-Pipeline for Edge DNN Processing

    Authors: Won Hyeok Kim, Hyeong Jin Kim, Tae Hee Han

    Abstract: The proliferation of edge devices necessitates efficient computational architectures for lightweight tasks, particularly deep neural network (DNN) inference. Traditional NPUs, though effective for such operations, face challenges in power, cost, and area when integrated into lightweight edge devices. The RISC-V architecture, known for its modularity and open-source nature, offers a viable alternat…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 6 pages, 6 figures, ICAIIC 2024

  43. arXiv:2407.01540

    cs.NI

    Towards a Partial Computation offloading in In-networking Computing-Assisted MEC: A Digital Twin Approach

    Authors: Ibrahim Aliyu, Awwal Arigi, Seungmin Oh, Tai-Won Um, Jinsul Kim

    Abstract: This paper addresses the problem of minimizing latency with partial computation offloading within Industrial Internet-of-Things (IoT) systems in in-network computing (COIN)-assisted Multiaccess Edge Computing (C-MEC) via ultra-reliable and low latency communications (URLLC) links. We propose a digital twin (DT) scheme for a multiuser scenario, allowing collaborative partial task offloading from us…

    Submitted 8 April, 2024; originally announced July 2024.

    Comments: 9 pages, 3 figures

  44. arXiv:2407.01214  [pdf, other]

    cs.LG cs.AI

    Revisiting Random Walks for Learning on Graphs

    Authors: Jinwoo Kim, Olga Zaghen, Ayhan Suleymanzade, Youngmin Ryou, Seunghoon Hong

    Abstract: We revisit a simple idea for machine learning on graphs, where a random walk on a graph produces a machine-readable record, and this record is processed by a deep neural network to directly make vertex-level or graph-level predictions. We refer to these stochastic machines as random walk neural networks, and show that we can design them to be isomorphism invariant while capable of universal approx…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 41 pages, 11 figures

  45. arXiv:2407.01012  [pdf, other]

    cs.LG cs.CV

    Swish-T : Enhancing Swish Activation with Tanh Bias for Improved Neural Network Performance

    Authors: Youngmin Seo, Jinha Kim, Unsang Park

    Abstract: We propose the Swish-T family, an enhancement of the existing non-monotonic activation function Swish. Swish-T is defined by adding a Tanh bias to the original Swish function. This modification creates a family of Swish-T variants, each designed to excel in different tasks, showcasing specific advantages depending on the application context. The Tanh bias allows for broader acceptance of negative…

    Submitted 3 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures. Revised the derivative of the sigmoid function from 1-sigmoid to sigmoid(1-sigmoid) for correctness. Updated related equations in Section 3.2. Renamed "Conclusions" to "Conclusion" in Section 6.

    MSC Class: 68T05; 68T07; 68T10

    ACM Class: I.2; I.5; I.4

  46. arXiv:2407.00264  [pdf, other]

    cs.AI cs.LG

    External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling

    Authors: Rishav Bhagat, Jonathan Balloch, Zhiyu Lin, Julia Kim, Mark Riedl

    Abstract: Unlike reinforcement learning (RL) agents, humans remain capable multitaskers in changing environments. In spite of only experiencing the world through their own observations and interactions, people know how to balance focusing on tasks with learning about how changes may affect their understanding of the world. This is possible by choosing to solve tasks in ways that are interesting and generall…

    Submitted 28 June, 2024; originally announced July 2024.

  47. arXiv:2406.19135  [pdf, other]

    eess.AS cs.AI

    DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability

    Authors: Hyun Joon Park, Jin Sob Kim, Wooseok Shin, Sung Won Han

    Abstract: Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Preprint

  48. arXiv:2406.18695  [pdf, other]

    cs.LG cs.AI cs.CL

    Learning to Correct for QA Reasoning with Black-box LLMs

    Authors: Jaehyung Kim, Dongyoung Kim, Yiming Yang

    Abstract: An open challenge in recent machine learning is about how to improve the reasoning capability of large language models (LLMs) in a black-box setting, i.e., without access to detailed information such as output token probabilities. Existing approaches either rely on accessibility (which is often unrealistic) or involve significantly increased train- and inference-time costs. This paper addresses th…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: preprint, 18 pages

  49. arXiv:2406.18678  [pdf, other]

    cs.LG cs.AI cs.CL

    Few-shot Personalization of LLMs with Mis-aligned Responses

    Authors: Jaehyung Kim, Yiming Yang

    Abstract: As the diversity of users increases, the capability of providing personalized responses by large language models (LLMs) has become increasingly important. Existing approaches have only limited successes in LLM personalization, due to the absence of personalized learning or the reliance on shared personal data. This paper proposes a new approach for a few-shot personalization of LLMs with their mis…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: preprint, 30 pages

  50. arXiv:2406.16994  [pdf, other]

    eess.SP cs.AI

    Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

    Authors: Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, Soyi Jung, Zhu Han, Joongheon Kim

    Abstract: Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for prov…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 17 pages, 22 figures
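    The revision note on entry 45 (Swish-T) corrects the sigmoid derivative from 1-sigmoid to sigmoid(1-sigmoid). The short Python sketch below is not taken from that paper; it only checks the standard identity d/dx sigmoid(x) = sigmoid(x)(1 - sigmoid(x)) against a central finite difference. The function names (sigmoid_grad, wrong_grad, central_difference) are illustrative choices, not part of any published code.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Corrected analytic derivative: sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def wrong_grad(x: float) -> float:
    """The uncorrected form 1 - sigmoid(x), kept only for comparison."""
    return 1.0 - sigmoid(x)


def central_difference(f, x: float, h: float = 1e-5) -> float:
    """Numerical derivative estimate (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    # Compare the numeric estimate with both analytic candidates.
    for x in (-3.0, -0.5, 0.0, 1.0, 4.0):
        numeric = central_difference(sigmoid, x)
        print(
            f"x={x:+.1f}  numeric={numeric:.6f}  "
            f"sigmoid(1-sigmoid)={sigmoid_grad(x):.6f}  "
            f"1-sigmoid={wrong_grad(x):.6f}"
        )
```

    At x = 0 the corrected form gives 0.25 while 1-sigmoid gives 0.5. The two forms differ by (1 - sigmoid(x))^2, which is never zero for finite x, so the uncorrected expression is wrong everywhere rather than only at isolated points.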