subscribe to arXiv mailings

arXiv:2407.05405 [pdf, other]

Research on the Acoustic Emission Source Localization Methodology in Composite Materials based on Artificial Intelligence

Authors: Jongick Won, Hyuntaik Oh, Jae Sakong

Abstract: In this study, methodology of acoustic emission source localization in composite materials based on artificial intelligence was presented. Carbon fiber reinforced plastic was selected for specimen, and acoustic emission signal were measured using piezoelectric devices. The measured signal was wavelet-transformed to obtain scalograms, which were used as training data for the artificial intelligence… ▽ More In this study, methodology of acoustic emission source localization in composite materials based on artificial intelligence was presented. Carbon fiber reinforced plastic was selected for specimen, and acoustic emission signal were measured using piezoelectric devices. The measured signal was wavelet-transformed to obtain scalograms, which were used as training data for the artificial intelligence model. AESLNet(acoustic emission source localization network), proposed in this study, was constructed convolutional layers in parallel due to anisotropy of the composited materials. It is regression model to detect the coordinates of acoustic emission source location. Hyper-parameter of network has been optimized by Bayesian optimization. It has been confirmed that network can detect location of acoustic emission source with an average error of 3.02mm and a resolution of 20mm. △ Less

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.01645 [pdf, other]

Sign Gradient Descent-based Neuronal Dynamics: ANN-to-SNN Conversion Beyond ReLU Network

Authors: Hyunseok Oh, Youngki Lee

Abstract: Spiking neural network (SNN) is studied in multidisciplinary domains to (i) enable order-of-magnitudes energy-efficient AI inference and (ii) computationally simulate neuro-scientific mechanisms. The lack of discrete theory obstructs the practical application of SNN by limiting its performance and nonlinearity support. We present a new optimization-theoretic perspective of the discrete dynamics of… ▽ More Spiking neural network (SNN) is studied in multidisciplinary domains to (i) enable order-of-magnitudes energy-efficient AI inference and (ii) computationally simulate neuro-scientific mechanisms. The lack of discrete theory obstructs the practical application of SNN by limiting its performance and nonlinearity support. We present a new optimization-theoretic perspective of the discrete dynamics of spiking neurons. We prove that a discrete dynamical system of simple integrate-and-fire models approximates the sub-gradient method over unconstrained optimization problems. We practically extend our theory to introduce a novel sign gradient descent (signGD)-based neuronal dynamics that can (i) approximate diverse nonlinearities beyond ReLU and (ii) advance ANN-to-SNN conversion performance in low time steps. Experiments on large-scale datasets show that our technique achieves (i) state-of-the-art performance in ANN-to-SNN conversion and (ii) is the first to convert new DNN architectures, e.g., ConvNext, MLP-Mixer, and ResMLP. We publicly share our source code at https://github.com/snuhcs/snn_signgd . △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: 37 pages, 41 figures, to be published as an ICML 2024 paper

arXiv:2407.00888 [pdf, other]

Papez: Resource-Efficient Speech Separation with Auditory Working Memory

Authors: Hyunseok Oh, Juheon Yi, Youngki Lee

Abstract: Transformer-based models recently reached state-of-the-art single-channel speech separation accuracy; However, their extreme computational load makes it difficult to deploy them in resource-constrained mobile or IoT devices. We thus present Papez, a lightweight and computation-efficient single-channel speech separation model. Papez is based on three key techniques. We first replace the inter-chunk… ▽ More Transformer-based models recently reached state-of-the-art single-channel speech separation accuracy; However, their extreme computational load makes it difficult to deploy them in resource-constrained mobile or IoT devices. We thus present Papez, a lightweight and computation-efficient single-channel speech separation model. Papez is based on three key techniques. We first replace the inter-chunk Transformer with small-sized auditory working memory. Second, we adaptively prune the input tokens that do not need further processing. Finally, we reduce the number of parameters through the recurrent transformer. Our extensive evaluation shows that Papez achieves the best resource and accuracy tradeoffs with a large margin. We publicly share our source code at \texttt{https://github.com/snuhcs/Papez} △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: 5 pages. Accepted by ICASSP 2023

arXiv:2406.09611 [pdf, other]

Recy-ctronics: Designing Fully Recyclable Electronics With Varied Form Factors

Authors: Tingyu Cheng, Zhihan Zhang, Han Huang, Yingting Gao, Wei Sun, Gregory D. Abowd, HyunJoo Oh, Josiah Hester

Abstract: For today's electronics manufacturing process, the emphasis on stable functionality, durability, and fixed physical forms is designed to ensure long-term usability. However, this focus on robustness and permanence complicates the disassembly and recycling processes, leading to significant environmental repercussions. In this paper, we present three approaches that leverage easily recyclable materi… ▽ More For today's electronics manufacturing process, the emphasis on stable functionality, durability, and fixed physical forms is designed to ensure long-term usability. However, this focus on robustness and permanence complicates the disassembly and recycling processes, leading to significant environmental repercussions. In this paper, we present three approaches that leverage easily recyclable materials-specifically, polyvinyl alcohol (PVA) and liquid metal (LM)-alongside accessible manufacturing techniques to produce electronic components and systems with versatile form factors. Our work centers on the development of recyclable electronics through three methods: 1) creating sheet electronics by screen printing LM traces on PVA substrates; 2) developing foam-based electronics by immersing mechanically stirred PVA foam into an LM solution; and 3) fabricating recyclable electronic tubes by injecting LM into mold cast PVA tubes, which can then be woven into various structures. To further assess the sustainability of our proposed methods, we conducted a life cycle assessment (LCA) to evaluate the environmental impact of our recyclable electronics in comparison to their conventional counterparts. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.07803 [pdf, other]

EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech

Authors: Deok-Hyeon Cho, Hyung-Seok Oh, Seung-Bin Kim, Sang-Hoon Lee, Seong-Whan Lee

Abstract: Despite rapid advances in the field of emotional text-to-speech (TTS), recent studies primarily focus on mimicking the average style of a particular emotion. As a result, the ability to manipulate speech emotion remains constrained to several predefined labels, compromising the ability to reflect the nuanced variations of emotion. In this paper, we propose EmoSphere-TTS, which synthesizes expressi… ▽ More Despite rapid advances in the field of emotional text-to-speech (TTS), recent studies primarily focus on mimicking the average style of a particular emotion. As a result, the ability to manipulate speech emotion remains constrained to several predefined labels, compromising the ability to reflect the nuanced variations of emotion. In this paper, we propose EmoSphere-TTS, which synthesizes expressive emotional speech by using a spherical emotion vector to control the emotional style and intensity of the synthetic speech. Without any human annotation, we use the arousal, valence, and dominance pseudo-labels to model the complex nature of emotion via a Cartesian-spherical transformation. Furthermore, we propose a dual conditional adversarial network to improve the quality of generated speech by reflecting the multi-aspect characteristics. The experimental results demonstrate the model ability to control emotional style and intensity with high-quality expressive speech. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted at INTERSPEECH 2024

arXiv:2406.06009 [pdf]

The Impact of AI on Academic Research and Publishing

Authors: Brady Lund, Manika Lamba, Sang Hoo Oh

Abstract: Generative artificial intelligence (AI) technologies like ChatGPT, have significantly impacted academic writing and publishing through their ability to generate content at levels comparable to or surpassing human writers. Through a review of recent interdisciplinary literature, this paper examines ethical considerations surrounding the integration of AI into academia, focusing on the potential for… ▽ More Generative artificial intelligence (AI) technologies like ChatGPT, have significantly impacted academic writing and publishing through their ability to generate content at levels comparable to or surpassing human writers. Through a review of recent interdisciplinary literature, this paper examines ethical considerations surrounding the integration of AI into academia, focusing on the potential for this technology to be used for scholarly misconduct and necessary oversight when using it for writing, editing, and reviewing of scholarly papers. The findings highlight the need for collaborative approaches to AI usage among publishers, editors, reviewers, and authors to ensure that this technology is used ethically and productively. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.05761 [pdf, other]

The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang , et al. (7 additional authors not shown)

Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on spec… ▽ More As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on specific capabilities such as instruction following, leading to coverage bias. To overcome these limitations, we introduce the BiGGen Bench, a principled generation benchmark designed to thoroughly evaluate nine distinct capabilities of LMs across 77 diverse tasks. A key feature of the BiGGen Bench is its use of instance-specific evaluation criteria, closely mirroring the nuanced discernment of human evaluation. We apply this benchmark to assess 103 frontier LMs using five evaluator LMs. Our code, data, and evaluation results are all publicly available at https://github.com/prometheus-eval/prometheus-eval/tree/main/BiGGen-Bench. △ Less

Submitted 9 June, 2024; originally announced June 2024.

Comments: Work in Progress

arXiv:2405.17959 [pdf, other]

Attention-based sequential recommendation system using multimodal data

Authors: Hyungtaik Oh, Wonkeun Jo, Dongil Kim

Abstract: Sequential recommendation systems that model dynamic preferences based on a use's past behavior are crucial to e-commerce. Recent studies on these systems have considered various types of information such as images and texts. However, multimodal data have not yet been utilized directly to recommend products to users. In this study, we propose an attention-based sequential recommendation method tha… ▽ More Sequential recommendation systems that model dynamic preferences based on a use's past behavior are crucial to e-commerce. Recent studies on these systems have considered various types of information such as images and texts. However, multimodal data have not yet been utilized directly to recommend products to users. In this study, we propose an attention-based sequential recommendation method that employs multimodal data of items such as images, texts, and categories. First, we extract image and text features from pre-trained VGG and BERT and convert categories into multi-labeled forms. Subsequently, attention operations are performed independent of the item sequence and multimodal representations. Finally, the individual attention information is integrated through an attention fusion function. In addition, we apply multitask learning loss for each modality to improve the generalization performance. The experimental results obtained from the Amazon datasets show that the proposed method outperforms those of conventional sequential recommendation systems. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: 18 pages, 4 figures, preprinted

ACM Class: I.2.1; I.2.4; I.2.7

arXiv:2404.07947 [pdf, other]

ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference

Authors: Hyungjun Oh, Kihong Kim, Jaemin Kim, Sungkyun Kim, Junyeol Lee, Du-seong Chang, Jiwon Seo

Abstract: This paper presents ExeGPT, a distributed system designed for constraint-aware LLM inference. ExeGPT finds and runs with an optimal execution schedule to maximize inference throughput while satisfying a given latency constraint. By leveraging the distribution of input and output sequences, it effectively allocates resources and determines optimal execution configurations, including batch sizes and… ▽ More This paper presents ExeGPT, a distributed system designed for constraint-aware LLM inference. ExeGPT finds and runs with an optimal execution schedule to maximize inference throughput while satisfying a given latency constraint. By leveraging the distribution of input and output sequences, it effectively allocates resources and determines optimal execution configurations, including batch sizes and partial tensor parallelism. We also introduce two scheduling strategies based on Round-Robin Allocation and Workload-Aware Allocation policies, suitable for different NLP workloads. We evaluate ExeGPT on six LLM instances of T5, OPT, and GPT-3 and five NLP tasks, each with four distinct latency constraints. Compared to FasterTransformer, ExeGPT achieves up to 15.2x improvements in throughput and 6x improvements in latency. Overall, ExeGPT achieves an average throughput gain of 2.9x across twenty evaluation scenarios. Moreover, when adapting to changing sequence distributions, the cost of adjusting the schedule in ExeGPT is reasonably modest. ExeGPT proves to be an effective solution for optimizing and executing LLM inference for diverse NLP workload and serving conditions. △ Less

Submitted 15 March, 2024; originally announced April 2024.

Comments: Accepted to ASPLOS 2024 (summer cycle)

Journal ref: 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems(ASPLOS 24 summer cycle), Volume 2, Nov 15, 2023 (Notification Date)

arXiv:2404.04096 [pdf, other]

Machine Learning-Aided Cooperative Localization under Dense Urban Environment

Authors: Hoon Lee, Hong Ki Kim, Seung Hyun Oh, Sang Hyun Lee

Abstract: Future wireless network technology provides automobiles with the connectivity feature to consolidate the concept of vehicular networks that collaborate on conducting cooperative driving tasks. The full potential of connected vehicles, which promises road safety and quality driving experience, can be leveraged if machine learning models guarantee the robustness in performing core functions includin… ▽ More Future wireless network technology provides automobiles with the connectivity feature to consolidate the concept of vehicular networks that collaborate on conducting cooperative driving tasks. The full potential of connected vehicles, which promises road safety and quality driving experience, can be leveraged if machine learning models guarantee the robustness in performing core functions including localization and controls. Location awareness, in particular, lends itself to the deployment of location-specific services and the improvement of the operation performance. The localization entails direct communication to the network infrastructure, and the resulting centralized positioning solutions readily become intractable as the network scales up. As an alternative to the centralized solutions, this article addresses decentralized principle of vehicular localization reinforced by machine learning techniques in dense urban environments with frequent inaccessibility to reliable measurement. As such, the collaboration of multiple vehicles enhances the positioning performance of machine learning approaches. A virtual testbed is developed to validate this machine learning model for real-map vehicular networks. Numerical results demonstrate universal feasibility of cooperative localization, in particular, for dense urban area configurations. △ Less

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs. △ Less

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2403.14326 [pdf, other]

Evaluation and Deployment of LiDAR-based Place Recognition in Dense Forests

Authors: Haedam Oh, Nived Chebrolu, Matias Mattamala, Leonard Freißmuth, Maurice Fallon

Abstract: Many LiDAR place recognition systems have been developed and tested specifically for urban driving scenarios. Their performance in natural environments such as forests and woodlands have been studied less closely. In this paper, we analyzed the capabilities of four different LiDAR place recognition systems, both handcrafted and learning-based methods, using LiDAR data collected with a handheld dev… ▽ More Many LiDAR place recognition systems have been developed and tested specifically for urban driving scenarios. Their performance in natural environments such as forests and woodlands have been studied less closely. In this paper, we analyzed the capabilities of four different LiDAR place recognition systems, both handcrafted and learning-based methods, using LiDAR data collected with a handheld device and legged robot within dense forest environments. In particular, we focused on evaluating localization where there is significant translational and orientation difference between corresponding LiDAR scan pairs. This is particularly important for forest survey systems where the sensor or robot does not follow a defined road or path. Extending our analysis we then incorporated the best performing approach, Logg3dNet, into a full 6-DoF pose estimation system -- introducing several verification layers for precise registration. We demonstrated the performance of our methods in three operational modes: online SLAM, offline multi-mission SLAM map merging, and relocalization into a prior map. We evaluated these modes using data captured in forests from three different countries, achieving 80% of correct loop closures candidates with baseline distances up to 5m, and 60% up to 10m. △ Less

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.06537 [pdf, other]

On the Consideration of AI Openness: Can Good Intent Be Abused?

Authors: Yeeun Kim, Eunkyung Choi, Hyunjun Kim, Hongseok Oh, Hyunseo Shin, Wonseok Hwang

Abstract: Openness is critical for the advancement of science. In particular, recent rapid progress in AI has been made possible only by various open-source models, datasets, and libraries. However, this openness also means that technologies can be freely used for socially harmful purposes. Can open-source models or datasets be used for malicious purposes? If so, how easy is it to adapt technology for such… ▽ More Openness is critical for the advancement of science. In particular, recent rapid progress in AI has been made possible only by various open-source models, datasets, and libraries. However, this openness also means that technologies can be freely used for socially harmful purposes. Can open-source models or datasets be used for malicious purposes? If so, how easy is it to adapt technology for such goals? Here, we conduct a case study in the legal domain, a realm where individual decisions can have profound social consequences. To this end, we build EVE, a dataset consisting of 200 examples of questions and corresponding answers about criminal activities based on 200 Korean precedents. We found that a widely accepted open-source LLM, which initially refuses to answer unethical questions, can be easily tuned with EVE to provide unethical and informative answers about criminal activities. This implies that although open-source technologies contribute to scientific progress, some care must be taken to mitigate possible malicious use cases. Warning: This paper contains contents that some may find unethical. △ Less

Submitted 11 March, 2024; originally announced March 2024.

Comments: 10 pages

arXiv:2402.19237 [pdf, ps, other]

Context-based Interpretable Spatio-Temporal Graph Convolutional Network for Human Motion Forecasting

Authors: Edgar Medina, Leyong Loh, Namrata Gurung, Kyung Hun Oh, Niels Heller

Abstract: Human motion prediction is still an open problem extremely important for autonomous driving and safety applications. Due to the complex spatiotemporal relation of motion sequences, this remains a challenging problem not only for movement prediction but also to perform a preliminary interpretation of the joint connections. In this work, we present a Context-based Interpretable Spatio-Temporal Graph… ▽ More Human motion prediction is still an open problem extremely important for autonomous driving and safety applications. Due to the complex spatiotemporal relation of motion sequences, this remains a challenging problem not only for movement prediction but also to perform a preliminary interpretation of the joint connections. In this work, we present a Context-based Interpretable Spatio-Temporal Graph Convolutional Network (CIST-GCN), as an efficient 3D human pose forecasting model based on GCNs that encompasses specific layers, aiding model interpretability and providing information that might be useful when analyzing motion distribution and body behavior. Our architecture extracts meaningful information from pose sequences, aggregates displacements and accelerations into the input model, and finally predicts the output displacements. Extensive experiments on Human 3.6M, AMASS, 3DPW, and ExPI datasets demonstrate that CIST-GCN outperforms previous methods in human motion prediction and robustness. Since the idea of enhancing interpretability for motion prediction has its merits, we showcase experiments towards it and provide preliminary evaluations of such insights here. available code: https://github.com/QualityMinds/cistgcn △ Less

Submitted 21 February, 2024; originally announced February 2024.

Comments: 10 pages, 6 figures

arXiv:2402.14334 [pdf, other]

INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models

Authors: Hanseok Oh, Hyunji Lee, Seonghyeon Ye, Haebin Shin, Hansol Jang, Changwook Jun, Minjoon Seo

Abstract: Despite the critical need to align search targets with users' intention, retrievers often only prioritize query information without delving into the users' intended search context. Enhancing the capability of retrievers to understand intentions and preferences of users, akin to language model instructions, has the potential to yield more aligned search targets. Prior studies restrict the applicati… ▽ More Despite the critical need to align search targets with users' intention, retrievers often only prioritize query information without delving into the users' intended search context. Enhancing the capability of retrievers to understand intentions and preferences of users, akin to language model instructions, has the potential to yield more aligned search targets. Prior studies restrict the application of instructions in information retrieval to a task description format, neglecting the broader context of diverse and evolving search scenarios. Furthermore, the prevailing benchmarks utilized for evaluation lack explicit tailoring to assess instruction-following ability, thereby hindering progress in this field. In response to these limitations, we propose a novel benchmark,INSTRUCTIR, specifically designed to evaluate instruction-following ability in information retrieval tasks. Our approach focuses on user-aligned instructions tailored to each query instance, reflecting the diverse characteristics inherent in real-world search scenarios. Through experimental analysis, we observe that retrievers fine-tuned to follow task-style instructions, such as INSTRUCTOR, can underperform compared to their non-instruction-tuned counterparts. This underscores potential overfitting issues inherent in constructing retrievers trained on existing instruction-aware retrieval datasets. △ Less

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.13466 [pdf, other]

doi 10.1109/LRA.2024.3366755

Leveraging Demonstrator-perceived Precision for Safe Interactive Imitation Learning of Clearance-limited Tasks

Authors: Hanbit Oh, Takamitsu Matsubara

Abstract: Interactive imitation learning is an efficient, model-free method through which a robot can learn a task by repetitively iterating an execution of a learning policy and a data collection by querying human demonstrations. However, deploying unmatured policies for clearance-limited tasks, like industrial insertion, poses significant collision risks. For such tasks, a robot should detect the collisio… ▽ More Interactive imitation learning is an efficient, model-free method through which a robot can learn a task by repetitively iterating an execution of a learning policy and a data collection by querying human demonstrations. However, deploying unmatured policies for clearance-limited tasks, like industrial insertion, poses significant collision risks. For such tasks, a robot should detect the collision risks and request intervention by ceding control to a human when collisions are imminent. The former requires an accurate model of the environment, a need that significantly limits the scope of IIL applications. In contrast, humans implicitly demonstrate environmental precision by adjusting their behavior to avoid collisions when performing tasks. Inspired by human behavior, this paper presents a novel interactive learning method that uses demonstrator-perceived precision as a criterion for human intervention called Demonstrator-perceived Precision-aware Interactive Imitation Learning (DPIIL). DPIIL captures precision by observing the speed-accuracy trade-off exhibited in human demonstrations and cedes control to a human to avoid collisions in states where high precision is estimated. DPIIL improves the safety of interactive policy learning and ensures efficiency without explicitly providing precise information of the environment. We assessed DPIIL's effectiveness through simulations and real-robot experiments that trained a UR5e 6-DOF robotic arm to perform assembly tasks. Our results significantly improved training safety, and our best performance compared favorably with other learning methods. △ Less

Submitted 20 February, 2024; originally announced February 2024.

Comments: 8 pages, 5 figures, accepted by IEEE Robotics and Automation Letters (RA-L) 2024

arXiv:2402.00366 [pdf, other]

Legged Robot State Estimation With Invariant Extended Kalman Filter Using Neural Measurement Network

Authors: Donghoon Youm, Hyunsik Oh, Suyoung Choi, Hyeongjun Kim, Jemin Hwangbo

Abstract: This paper introduces a novel proprioceptive state estimator for legged robots that combines model-based filters and deep neural networks. Recent studies have shown that neural networks such as multi-layer perceptron or recurrent neural networks can estimate the robot states, including contact probability and linear velocity. Inspired by this, we develop a state estimation framework that integrate… ▽ More This paper introduces a novel proprioceptive state estimator for legged robots that combines model-based filters and deep neural networks. Recent studies have shown that neural networks such as multi-layer perceptron or recurrent neural networks can estimate the robot states, including contact probability and linear velocity. Inspired by this, we develop a state estimation framework that integrates a neural measurement network (NMN) with an invariant extended Kalman filter. We show that our framework improves estimation performance in various terrains. Existing studies that combine model-based filters and learning-based approaches typically use real-world data. However, our approach relies solely on simulation data, as it allows us to easily obtain extensive data. This difference leads to a gap between the learning and the inference domain, commonly referred to as a sim-to-real gap. We address this challenge by adapting existing learning techniques and regularization. To validate our proposed method, we conduct experiments using a quadruped robot on four types of terrain: \textit{flat}, \textit{debris}, \textit{soft}, and \textit{slippery}. We observe that our approach significantly reduces position drift compared to the existing model-based state estimator. △ Less

Submitted 1 February, 2024; originally announced February 2024.

Comments: 8pages, 6paper, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2401.08095 [pdf, other]

DurFlex-EVC: Duration-Flexible Emotional Voice Conversion with Parallel Generation

Authors: Hyung-Seok Oh, Sang-Hoon Lee, Deok-Hyeon Cho, Seong-Whan Lee

Abstract: Emotional voice conversion (EVC) seeks to modify the emotional tone of a speaker's voice while preserving the original linguistic content and the speaker's unique vocal characteristics. Recent advancements in EVC have involved the simultaneous modeling of pitch and duration, utilizing the potential of sequence-to-sequence (seq2seq) models. To enhance reliability and efficiency in conversion, this… ▽ More Emotional voice conversion (EVC) seeks to modify the emotional tone of a speaker's voice while preserving the original linguistic content and the speaker's unique vocal characteristics. Recent advancements in EVC have involved the simultaneous modeling of pitch and duration, utilizing the potential of sequence-to-sequence (seq2seq) models. To enhance reliability and efficiency in conversion, this study shifts focus towards parallel speech generation. We introduce Duration-Flexible EVC (DurFlex-EVC), which integrates a style autoencoder and unit aligner. Traditional models, while incorporating self-supervised learning (SSL) representations that contain both linguistic and paralinguistic information, have neglected this dual nature, leading to reduced controllability. Addressing this issue, we implement cross-attention to synchronize these representations with various emotions. Additionally, a style autoencoder is developed for the disentanglement and manipulation of style elements. The efficacy of our approach is validated through both subjective and objective evaluations, establishing its superiority over existing models in the field. △ Less

Submitted 7 March, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

Comments: 13 pages, 9 figures, 8 tables

arXiv:2401.06913 [pdf, other]

Microphone Conversion: Mitigating Device Variability in Sound Event Classification

Authors: Myeonghoon Ryu, Hongseok Oh, Suji Lee, Han Park

Abstract: In this study, we introduce a new augmentation technique to enhance the resilience of sound event classification (SEC) systems against device variability through the use of CycleGAN. We also present a unique dataset to evaluate this method. As SEC systems become increasingly common, it is crucial that they work well with audio from diverse recording devices. Our method addresses limited device div… ▽ More In this study, we introduce a new augmentation technique to enhance the resilience of sound event classification (SEC) systems against device variability through the use of CycleGAN. We also present a unique dataset to evaluate this method. As SEC systems become increasingly common, it is crucial that they work well with audio from diverse recording devices. Our method addresses limited device diversity in training data by enabling unpaired training to transform input spectrograms as if they are recorded on a different device. Our experiments show that our approach outperforms existing methods in generalization by 5.2% - 11.5% in weighted f1 score. Additionally, it surpasses the current methods in adaptability across diverse recording devices by achieving a 6.5% - 12.8% improvement in weighted f1 score. △ Less

Submitted 12 January, 2024; originally announced January 2024.

Comments: Accepted to ICASSP 2024

arXiv:2312.04382 [pdf, other]

Adversarial Denoising Diffusion Model for Unsupervised Anomaly Detection

Authors: Jongmin Yu, Hyeontaek Oh, Jinhong Yang

Abstract: In this paper, we propose the Adversarial Denoising Diffusion Model (ADDM). The ADDM is based on the Denoising Diffusion Probabilistic Model (DDPM) but complementarily trained by adversarial learning. The proposed adversarial learning is achieved by classifying model-based denoised samples and samples to which random Gaussian noise is added to a specific sampling step. With the addition of explici… ▽ More In this paper, we propose the Adversarial Denoising Diffusion Model (ADDM). The ADDM is based on the Denoising Diffusion Probabilistic Model (DDPM) but complementarily trained by adversarial learning. The proposed adversarial learning is achieved by classifying model-based denoised samples and samples to which random Gaussian noise is added to a specific sampling step. With the addition of explicit adversarial learning on data samples, ADDM can learn the semantic characteristics of the data more robustly during training, which achieves a similar data sampling performance with much fewer sampling steps than DDPM. We apply ADDM to anomaly detection in unsupervised MRI images. Experimental results show that the proposed ADDM outperformed existing generative model-based unsupervised anomaly detection methods. In particular, compared to other DDPM-based anomaly detection methods, the proposed ADDM shows better performance with the same number of sampling steps and similar performance with 50% fewer sampling steps. △ Less

Submitted 7 December, 2023; originally announced December 2023.

Comments: Accepted for the poster session of DGM4H worshop on NeuralPS 2023

arXiv:2311.08329 [pdf, other]

KTRL+F: Knowledge-Augmented In-Document Search

Authors: Hanseok Oh, Haebin Shin, Miyoung Ko, Hyunji Lee, Minjoon Seo

Abstract: We introduce a new problem KTRL+F, a knowledge-augmented in-document search task that necessitates real-time identification of all semantic targets within a document with the awareness of external sources through a single natural query. KTRL+F addresses following unique challenges for in-document search: 1)utilizing knowledge outside the document for extended use of additional information about ta… ▽ More We introduce a new problem KTRL+F, a knowledge-augmented in-document search task that necessitates real-time identification of all semantic targets within a document with the awareness of external sources through a single natural query. KTRL+F addresses following unique challenges for in-document search: 1)utilizing knowledge outside the document for extended use of additional information about targets, and 2) balancing between real-time applicability with the performance. We analyze various baselines in KTRL+F and find limitations of existing models, such as hallucinations, high latency, or difficulties in leveraging external knowledge. Therefore, we propose a Knowledge-Augmented Phrase Retrieval model that shows a promising balance between speed and performance by simply augmenting external knowledge in phrase embedding. We also conduct a user study to verify whether solving KTRL+F can enhance search experience for users. It demonstrates that even with our simple model, users can reduce the time for searching with less queries and reduced extra visits to other sources for collecting evidence. We encourage the research community to work on KTRL+F to enhance more efficient in-document information access. △ Less

Submitted 18 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

arXiv:2310.18586 [pdf, other]

Optimal Transport for Kernel Gaussian Mixture Models

Authors: Jung Hun Oh, Rena Elkin, Anish Kumar Simhal, Jiening Zhu, Joseph O Deasy, Allen Tannenbaum

Abstract: The Wasserstein distance from optimal mass transport (OMT) is a powerful mathematical tool with numerous applications that provides a natural measure of the distance between two probability distributions. Several methods to incorporate OMT into widely used probabilistic models, such as Gaussian or Gaussian mixture, have been developed to enhance the capability of modeling complex multimodal densit… ▽ More The Wasserstein distance from optimal mass transport (OMT) is a powerful mathematical tool with numerous applications that provides a natural measure of the distance between two probability distributions. Several methods to incorporate OMT into widely used probabilistic models, such as Gaussian or Gaussian mixture, have been developed to enhance the capability of modeling complex multimodal densities of real datasets. However, very few studies have explored the OMT problems in a reproducing kernel Hilbert space (RKHS), wherein the kernel trick is utilized to avoid the need to explicitly map input data into a high-dimensional feature space. In the current study, we propose a Wasserstein-type metric to compute the distance between two Gaussian mixtures in a RKHS via the kernel trick, i.e., kernel Gaussian mixture models. △ Less

Submitted 28 October, 2023; originally announced October 2023.

Comments: 17 pages, 5 figures, 2 tables

arXiv:2310.10493 [pdf, other]

Evaluation and improvement of Segment Anything Model for interactive histopathology image segmentation

Authors: SeungKyu Kim, Hyun-Jic Oh, Seonghui Min, Won-Ki Jeong

Abstract: With the emergence of the Segment Anything Model (SAM) as a foundational model for image segmentation, its application has been extensively studied across various domains, including the medical field. However, its potential in the context of histopathology data, specifically in region segmentation, has received relatively limited attention. In this paper, we evaluate SAM's performance in zero-shot… ▽ More With the emergence of the Segment Anything Model (SAM) as a foundational model for image segmentation, its application has been extensively studied across various domains, including the medical field. However, its potential in the context of histopathology data, specifically in region segmentation, has received relatively limited attention. In this paper, we evaluate SAM's performance in zero-shot and fine-tuned scenarios on histopathology data, with a focus on interactive segmentation. Additionally, we compare SAM with other state-of-the-art interactive models to assess its practical potential and evaluate its generalization capability with domain adaptability. In the experimental results, SAM exhibits a weakness in segmentation performance compared to other models while demonstrating relative strengths in terms of inference time and generalization capability. To improve SAM's limited local refinement ability and to enhance prompt stability while preserving its core strengths, we propose a modification of SAM's decoder. The experimental results suggest that the proposed modification is effective to make SAM useful for interactive histology image segmentation. The code is available at \url{https://github.com/hvcl/SAM_Interactive_Histopathology} △ Less

Submitted 16 October, 2023; originally announced October 2023.

Comments: MICCAI 2023 workshop accepted (1st International Workshop on Foundation Models for General Medical AI - MedAGI)

arXiv:2309.02745 [pdf, other]

Learning Vehicle Dynamics from Cropped Image Patches for Robot Navigation in Unpaved Outdoor Terrains

Authors: Jeong Hyun Lee, Jinhyeok Choi, Simo Ryu, Hyunsik Oh, Suyoung Choi, Jemin Hwangbo

Abstract: In the realm of autonomous mobile robots, safe navigation through unpaved outdoor environments remains a challenging task. Due to the high-dimensional nature of sensor data, extracting relevant information becomes a complex problem, which hinders adequate perception and path planning. Previous works have shown promising performances in extracting global features from full-sized images. However, th… ▽ More In the realm of autonomous mobile robots, safe navigation through unpaved outdoor environments remains a challenging task. Due to the high-dimensional nature of sensor data, extracting relevant information becomes a complex problem, which hinders adequate perception and path planning. Previous works have shown promising performances in extracting global features from full-sized images. However, they often face challenges in capturing essential local information. In this paper, we propose Crop-LSTM, which iteratively takes cropped image patches around the current robot's position and predicts the future position, orientation, and bumpiness. Our method performs local feature extraction by paying attention to corresponding image patches along the predicted robot trajectory in the 2D image plane. This enables more accurate predictions of the robot's future trajectory. With our wheeled mobile robot platform Raicart, we demonstrated the effectiveness of Crop-LSTM for point-goal navigation in an unpaved outdoor environment. Our method enabled safe and robust navigation using RGBD images in challenging unpaved outdoor terrains. The summary video is available at https://youtu.be/iIGNZ8ignk0. △ Less

Submitted 6 September, 2023; originally announced September 2023.

Comments: 8 pages, 10 figures

arXiv:2308.12517 [pdf, other]

Not Only Rewards But Also Constraints: Applications on Legged Robot Locomotion

Authors: Yunho Kim, Hyunsik Oh, Jeonghyun Lee, Jinhyeok Choi, Gwanghyeon Ji, Moonkyu Jung, Donghoon Youm, Jemin Hwangbo

Abstract: Several earlier studies have shown impressive control performance in complex robotic systems by designing the controller using a neural network and training it with model-free reinforcement learning. However, these outstanding controllers with natural motion style and high task performance are developed through extensive reward engineering, which is a highly laborious and time-consuming process of… ▽ More Several earlier studies have shown impressive control performance in complex robotic systems by designing the controller using a neural network and training it with model-free reinforcement learning. However, these outstanding controllers with natural motion style and high task performance are developed through extensive reward engineering, which is a highly laborious and time-consuming process of designing numerous reward terms and determining suitable reward coefficients. In this work, we propose a novel reinforcement learning framework for training neural network controllers for complex robotic systems consisting of both rewards and constraints. To let the engineers appropriately reflect their intent to constraints and handle them with minimal computation overhead, two constraint types and an efficient policy optimization algorithm are suggested. The learning framework is applied to train locomotion controllers for several legged robots with different morphology and physical attributes to traverse challenging terrains. Extensive simulation and real-world experiments demonstrate that performant controllers can be trained with significantly less reward engineering, by tuning only a single reward coefficient. Furthermore, a more straightforward and intuitive engineering process can be utilized, thanks to the interpretability and generalizability of constraints. The summary video is available at https://youtu.be/KAlm3yskhvM. △ Less

Submitted 6 May, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: Accepted to IEEE Transactions on Robotics (T-RO) 2024

arXiv:2308.09177 [pdf, other]

doi 10.1145/3577190.3614174

Recognizing Intent in Collaborative Manipulation

Authors: Zhanibek Rysbek, Ki Hwan Oh, Milos Zefran

Abstract: Collaborative manipulation is inherently multimodal, with haptic communication playing a central role. When performed by humans, it involves back-and-forth force exchanges between the participants through which they resolve possible conflicts and determine their roles. Much of the existing work on collaborative human-robot manipulation assumes that the robot follows the human. But for a robot to m… ▽ More Collaborative manipulation is inherently multimodal, with haptic communication playing a central role. When performed by humans, it involves back-and-forth force exchanges between the participants through which they resolve possible conflicts and determine their roles. Much of the existing work on collaborative human-robot manipulation assumes that the robot follows the human. But for a robot to match the performance of a human partner it needs to be able to take initiative and lead when appropriate. To achieve such human-like performance, the robot needs to have the ability to (1) determine the intent of the human, (2) clearly express its own intent, and (3) choose its actions so that the dyad reaches consensus. This work proposes a framework for recognizing human intent in collaborative manipulation tasks using force exchanges. Grounded in a dataset collected during a human study, we introduce a set of features that can be computed from the measured signals and report the results of a classifier trained on our collected human-human interaction data. Two metrics are used to evaluate the intent recognizer: overall accuracy and the ability to correctly identify transitions. The proposed recognizer shows robustness against the variations in the partner's actions and the confounding effects due to the variability in grasp forces and dynamic effects of walking. The results demonstrate that the proposed recognizer is well-suited for implementation in a physical interaction control scheme. △ Less

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.05992 [pdf, other]

Reachable Set-based Path Planning for Automated Vertical Parking System

Authors: In Hyuk Oh, Ju Won Seo, Jin Sung Kim, Chung Choo Chung

Abstract: This paper proposes a local path planning method with a reachable set for Automated vertical Parking Systems (APS). First, given a parking lot layout with a goal position, we define an intermediate pose for the APS to accomplish reverse parking with a single maneuver, i.e., without changing the gear shift. Then, we introduce a reachable set which is a set of points consisting of the grid points of… ▽ More This paper proposes a local path planning method with a reachable set for Automated vertical Parking Systems (APS). First, given a parking lot layout with a goal position, we define an intermediate pose for the APS to accomplish reverse parking with a single maneuver, i.e., without changing the gear shift. Then, we introduce a reachable set which is a set of points consisting of the grid points of all possible intermediate poses. Once the APS approaches the goal position, it must select an intermediate pose in the reachable set. A minimization problem was formulated and solved to choose the intermediate pose. We performed various scenarios with different parking lot conditions. We used the Hybrid-A* algorithm for the global path planning to move the vehicle from the starting pose to the intermediate pose and utilized clothoid-based local path planning to move from the intermediate pose to the goal pose. Additionally, we designed a controller to follow the generated path and validated its tracking performance. It was confirmed that the tracking error in the mean root square for the lateral position was bounded within 0.06m and for orientation within 0.01rad. △ Less

Submitted 11 August, 2023; originally announced August 2023.

Comments: 8 pages, 10 figures, conference. This is the Accepted Manuscript version of an article accepted for publication in [IEEE International Conference on Intelligent Transportation Systems ITSC 2023]. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. No information about DOI has been posted yet

arXiv:2307.16549 [pdf, other]

DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training

Authors: Hyung-Seok Oh, Sang-Hoon Lee, Seong-Whan Lee

Abstract: Expressive text-to-speech systems have undergone significant advancements owing to prosody modeling, but conventional methods can still be improved. Traditional approaches have relied on the autoregressive method to predict the quantized prosody vector; however, it suffers from the issues of long-term dependency and slow inference. This study proposes a novel approach called DiffProsody in which e… ▽ More Expressive text-to-speech systems have undergone significant advancements owing to prosody modeling, but conventional methods can still be improved. Traditional approaches have relied on the autoregressive method to predict the quantized prosody vector; however, it suffers from the issues of long-term dependency and slow inference. This study proposes a novel approach called DiffProsody in which expressive speech is synthesized using a diffusion-based latent prosody generator and prosody conditional adversarial training. Our findings confirm the effectiveness of our prosody generator in generating a prosody vector. Furthermore, our prosody conditional discriminator significantly improves the quality of the generated speech by accurately emulating prosody. We use denoising diffusion generative adversarial networks to improve the prosody generation speed. Consequently, DiffProsody is capable of generating prosody 16 times faster than the conventional diffusion model. The superior performance of our proposed method has been demonstrated via experiments. △ Less

Submitted 31 July, 2023; originally announced July 2023.

Comments: 10 pages, 8 figures, 5 tables, under review

arXiv:2307.16171 [pdf, other]

HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer

Authors: Sang-Hoon Lee, Ha-Yeong Choi, Hyung-Seok Oh, Seong-Whan Lee

Abstract: Despite rapid progress in the voice style transfer (VST) field, recent zero-shot VST systems still lack the ability to transfer the voice style of a novel speaker. In this paper, we present HierVST, a hierarchical adaptive end-to-end zero-shot VST model. Without any text transcripts, we only use the speech dataset to train the model by utilizing hierarchical variational inference and self-supervis… ▽ More Despite rapid progress in the voice style transfer (VST) field, recent zero-shot VST systems still lack the ability to transfer the voice style of a novel speaker. In this paper, we present HierVST, a hierarchical adaptive end-to-end zero-shot VST model. Without any text transcripts, we only use the speech dataset to train the model by utilizing hierarchical variational inference and self-supervised representation. In addition, we adopt a hierarchical adaptive generator that generates the pitch representation and waveform audio sequentially. Moreover, we utilize unconditional generation to improve the speaker-relative acoustic capacity in the acoustic representation. With a hierarchical adaptive structure, the model can adapt to a novel voice style and convert speech progressively. The experimental results demonstrate that our method outperforms other VST models in zero-shot VST scenarios. Audio samples are available at \url{https://hiervst.github.io/}. △ Less

Submitted 30 July, 2023; originally announced July 2023.

Comments: INTERSPEECH 2023 (Oral)

arXiv:2307.02682 [pdf, other]

Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment

Authors: Yongrae Jo, Seongyun Lee, Aiden SJ Lee, Hyunji Lee, Hanseok Oh, Minjoon Seo

Abstract: Dense video captioning, a task of localizing meaningful moments and generating relevant captions for videos, often requires a large, expensive corpus of annotated video segments paired with text. In an effort to minimize the annotation cost, we propose ZeroTA, a novel method for dense video captioning in a zero-shot manner. Our method does not require any videos or annotations for training; instea… ▽ More Dense video captioning, a task of localizing meaningful moments and generating relevant captions for videos, often requires a large, expensive corpus of annotated video segments paired with text. In an effort to minimize the annotation cost, we propose ZeroTA, a novel method for dense video captioning in a zero-shot manner. Our method does not require any videos or annotations for training; instead, it localizes and describes events within each input video at test time by optimizing solely on the input. This is accomplished by introducing a soft moment mask that represents a temporal segment in the video and jointly optimizing it with the prefix parameters of a language model. This joint optimization aligns a frozen language generation model (i.e., GPT-2) with a frozen vision-language contrastive model (i.e., CLIP) by maximizing the matching score between the generated text and a moment within the video. We also introduce a pairwise temporal IoU loss to let a set of soft moment masks capture multiple distinct events within the video. Our method effectively discovers diverse significant events within the video, with the resulting captions appropriately describing these events. The empirical results demonstrate that ZeroTA surpasses zero-shot baselines and even outperforms the state-of-the-art few-shot method on the widely-used benchmark ActivityNet Captions. Moreover, our method shows greater robustness compared to supervised methods when evaluated in out-of-domain scenarios. This research provides insight into the potential of aligning widely-used models, such as language generation models and vision-language models, to unlock a new capability: understanding temporal aspects of videos. △ Less

Submitted 11 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.14136 [pdf, other]

Scribble-supervised Cell Segmentation Using Multiscale Contrastive Regularization

Authors: Hyun-Jic Oh, Kanggeun Lee, Won-Ki Jeong

Abstract: Current state-of-the-art supervised deep learning-based segmentation approaches have demonstrated superior performance in medical image segmentation tasks. However, such supervised approaches require fully annotated pixel-level ground-truth labels, which are labor-intensive and time-consuming to acquire. Recently, Scribble2Label (S2L) demonstrated that using only a handful of scribbles with self-s… ▽ More Current state-of-the-art supervised deep learning-based segmentation approaches have demonstrated superior performance in medical image segmentation tasks. However, such supervised approaches require fully annotated pixel-level ground-truth labels, which are labor-intensive and time-consuming to acquire. Recently, Scribble2Label (S2L) demonstrated that using only a handful of scribbles with self-supervised learning can generate accurate segmentation results without full annotation. However, owing to the relatively small size of scribbles, the model is prone to overfit and the results may be biased to the selection of scribbles. In this work, we address this issue by employing a novel multiscale contrastive regularization term for S2L. The main idea is to extract features from intermediate layers of the neural network for contrastive loss so that structures at various scales can be effectively separated. To verify the efficacy of our method, we conducted ablation studies on well-known datasets, such as Data Science Bowl 2018 and MoNuSeg. The results show that the proposed multiscale contrastive loss is effective in improving the performance of S2L, which is comparable to that of the supervised learning segmentation method. △ Less

Submitted 25 June, 2023; originally announced June 2023.

Comments: ISBI 2022 accepted

arXiv:2306.14132 [pdf, other]

DiffMix: Diffusion Model-based Data Synthesis for Nuclei Segmentation and Classification in Imbalanced Pathology Image Datasets

Authors: Hyun-Jic Oh, Won-Ki Jeong

Abstract: Nuclei segmentation and classification is a significant process in pathology image analysis. Deep learning-based approaches have greatly contributed to the higher accuracy of this task. However, those approaches suffer from the imbalanced nuclei data composition, which shows lower classification performance on the rare nuclei class. In this paper, we propose a realistic data synthesis method using… ▽ More Nuclei segmentation and classification is a significant process in pathology image analysis. Deep learning-based approaches have greatly contributed to the higher accuracy of this task. However, those approaches suffer from the imbalanced nuclei data composition, which shows lower classification performance on the rare nuclei class. In this paper, we propose a realistic data synthesis method using a diffusion model. We generate two types of virtual patches to enlarge the training data distribution, which is for balancing the nuclei class variance and for enlarging the chance to look at various nuclei. After that, we use a semantic-label-conditioned diffusion model to generate realistic and high-quality image samples. We demonstrate the efficacy of our method by experiment results on two imbalanced nuclei datasets, improving the state-of-the-art networks. The experimental results suggest that the proposed method improves the classification performance of the rare type nuclei classification, while showing superior segmentation and classification performance in imbalanced pathology nuclei datasets. △ Less

Submitted 25 June, 2023; originally announced June 2023.

Comments: MICCAI 2023 accepted

arXiv:2304.12288 [pdf, other]

Robots Taking Initiative in Collaborative Object Manipulation: Lessons from Physical Human-Human Interaction

Authors: Zhanibek Rysbek, Ki Hwan Oh, Afagh Mehri Shervedani, Timotej Klemencic, Milos Zefran, Barbara Di Eugenio

Abstract: Physical Human-Human Interaction (pHHI) involves the use of multiple sensory modalities. Studies of communication through spoken utterances and gestures are well established, but communication through force signals is not well understood. In this paper, we focus on investigating the mechanisms employed by humans during the negotiation through force signals, and how the robot can communicate task g… ▽ More Physical Human-Human Interaction (pHHI) involves the use of multiple sensory modalities. Studies of communication through spoken utterances and gestures are well established, but communication through force signals is not well understood. In this paper, we focus on investigating the mechanisms employed by humans during the negotiation through force signals, and how the robot can communicate task goals, comprehend human intent, and take the lead as needed. To achieve this, we formulate a task that requires active force communication and propose a taxonomy that extends existing literature. Also, we conducted a study to observe how humans behave during collaborative manipulation tasks. An important contribution of this work is the novel features based on force-kinematic signals that demonstrate predictive power to recognize symbolic human intent. Further, we show the feasibility of developing a real-time intent classifier based on the novel features and speculate the role it plays in high-level robot controllers for physical Human-Robot Interaction (pHRI). This work provides important steps to achieve more human-like fluid interaction in physical co-manipulation tasks that are applicable and not limited to humanoid, assistive robots, and human-in-the-loop automation. △ Less

Submitted 29 July, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

arXiv:2303.12375 [pdf, other]

Disturbance Injection under Partial Automation: Robust Imitation Learning for Long-horizon Tasks

Authors: Hirotaka Tahara, Hikaru Sasaki, Hanbit Oh, Edgar Anarossi, Takamitsu Matsubara

Abstract: Partial Automation (PA) with intelligent support systems has been introduced in industrial machinery and advanced automobiles to reduce the burden of long hours of human operation. Under PA, operators perform manual operations (providing actions) and operations that switch to automatic/manual mode (mode-switching). Since PA reduces the total duration of manual operation, these two action and mode-… ▽ More Partial Automation (PA) with intelligent support systems has been introduced in industrial machinery and advanced automobiles to reduce the burden of long hours of human operation. Under PA, operators perform manual operations (providing actions) and operations that switch to automatic/manual mode (mode-switching). Since PA reduces the total duration of manual operation, these two action and mode-switching operations can be replicated by imitation learning with high sample efficiency. To this end, this paper proposes Disturbance Injection under Partial Automation (DIPA) as a novel imitation learning framework. In DIPA, mode and actions (in the manual mode) are assumed to be observables in each state and are used to learn both action and mode-switching policies. The above learning is robustified by injecting disturbances into the operator's actions to optimize the disturbance's level for minimizing the covariate shift under PA. We experimentally validated the effectiveness of our method for long-horizon tasks in two simulations and a real robot environment and confirmed that our method outperformed the previous methods and reduced the demonstration burden. △ Less

Submitted 22 March, 2023; originally announced March 2023.

Comments: 8 pages, Accepted by Robotics and Automation Letters (RA-L) 2023

arXiv:2302.03175 [pdf, other]

Genetic Programming Based Symbolic Regression for Analytical Solutions to Differential Equations

Authors: Hongsup Oh, Roman Amici, Geoffrey Bomarito, Shandian Zhe, Robert Kirby, Jacob Hochhalter

Abstract: In this paper, we present a machine learning method for the discovery of analytic solutions to differential equations. The method utilizes an inherently interpretable algorithm, genetic programming based symbolic regression. Unlike conventional accuracy measures in machine learning we demonstrate the ability to recover true analytic solutions, as opposed to a numerical approximation. The method is… ▽ More In this paper, we present a machine learning method for the discovery of analytic solutions to differential equations. The method utilizes an inherently interpretable algorithm, genetic programming based symbolic regression. Unlike conventional accuracy measures in machine learning we demonstrate the ability to recover true analytic solutions, as opposed to a numerical approximation. The method is verified by assessing its ability to recover known analytic solutions for two separate differential equations. The developed method is compared to a conventional, purely data-driven genetic programming based symbolic regression algorithm. The reliability of successful evolution of the true solution, or an algebraic equivalent, is demonstrated. △ Less

Submitted 6 February, 2023; originally announced February 2023.

Comments: 14 pages, 9 figures

arXiv:2211.03393 [pdf, other]

Bayesian Disturbance Injection: Robust Imitation Learning of Flexible Policies for Robot Manipulation

Authors: Hanbit Oh, Hikaru Sasaki, Brendan Michael, Takamitsu Matsubara

Abstract: Humans demonstrate a variety of interesting behavioral characteristics when performing tasks, such as selecting between seemingly equivalent optimal actions, performing recovery actions when deviating from the optimal trajectory, or moderating actions in response to sensed risks. However, imitation learning, which attempts to teach robots to perform these same tasks from observations of human demo… ▽ More Humans demonstrate a variety of interesting behavioral characteristics when performing tasks, such as selecting between seemingly equivalent optimal actions, performing recovery actions when deviating from the optimal trajectory, or moderating actions in response to sensed risks. However, imitation learning, which attempts to teach robots to perform these same tasks from observations of human demonstrations, often fails to capture such behavior. Specifically, commonly used learning algorithms embody inherent contradictions between the learning assumptions (e.g., single optimal action) and actual human behavior (e.g., multiple optimal actions), thereby limiting robot generalizability, applicability, and demonstration feasibility. To address this, this paper proposes designing imitation learning algorithms with a focus on utilizing human behavioral characteristics, thereby embodying principles for capturing and exploiting actual demonstrator behavioral characteristics. This paper presents the first imitation learning framework, Bayesian Disturbance Injection (BDI), that typifies human behavioral characteristics by incorporating model flexibility, robustification, and risk sensitivity. Bayesian inference is used to learn flexible non-parametric multi-action policies, while simultaneously robustifying policies by injecting risk-sensitive disturbances to induce human recovery action and ensuring demonstration feasibility. Our method is evaluated through risk-sensitive simulations and real-robot experiments (e.g., table-sweep task, shaft-reach task and shaft-insertion task) using the UR5e 6-DOF robotic arm, to demonstrate the improved characterisation of behavior. Results show significant improvement in task performance, through improved flexibility, robustness as well as demonstration feasibility. △ Less

Submitted 7 November, 2022; originally announced November 2022.

Comments: 69 pages, 9 figures, accepted by Elsevier Neural Networks - Journal

arXiv:2210.02068 [pdf, other]

Nonparametric Decoding for Generative Retrieval

Authors: Hyunji Lee, Jaeyoung Kim, Hoyeon Chang, Hanseok Oh, Sohee Yang, Vlad Karpukhin, Yi Lu, Minjoon Seo

Abstract: The generative retrieval model depends solely on the information encoded in its model parameters without external memory, its information capacity is limited and fixed. To overcome the limitation, we propose Nonparametric Decoding (Np Decoding) which can be applied to existing generative retrieval models. Np Decoding uses nonparametric contextualized vocab embeddings (external memory) rather than… ▽ More The generative retrieval model depends solely on the information encoded in its model parameters without external memory, its information capacity is limited and fixed. To overcome the limitation, we propose Nonparametric Decoding (Np Decoding) which can be applied to existing generative retrieval models. Np Decoding uses nonparametric contextualized vocab embeddings (external memory) rather than vanilla vocab embeddings as decoder vocab embeddings. By leveraging the contextualized vocab embeddings, the generative retrieval model is able to utilize both the parametric and nonparametric space. Evaluation over 9 datasets (8 single-hop and 1 multi-hop) in the document retrieval task shows that applying Np Decoding to generative retrieval models significantly improves the performance. We also show that Np Decoding is data- and parameter-efficient, and shows high performance in the zero-shot setting. △ Less

Submitted 28 May, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: published at Findings of ACL 2023

arXiv:2208.07422 [pdf, other]

Deep Unsupervised Domain Adaptation: A Review of Recent Advances and Perspectives

Authors: Xiaofeng Liu, Chaehwa Yoo, Fangxu Xing, Hyejin Oh, Georges El Fakhri, Je-Won Kang, Jonghye Woo

Abstract: Deep learning has become the method of choice to tackle real-world problems in different domains, partly because of its ability to learn from data and achieve impressive performance on a wide range of applications. However, its success usually relies on two assumptions: (i) vast troves of labeled datasets are required for accurate model fitting, and (ii) training and testing data are independent a… ▽ More Deep learning has become the method of choice to tackle real-world problems in different domains, partly because of its ability to learn from data and achieve impressive performance on a wide range of applications. However, its success usually relies on two assumptions: (i) vast troves of labeled datasets are required for accurate model fitting, and (ii) training and testing data are independent and identically distributed. Its performance on unseen target domains, thus, is not guaranteed, especially when encountering out-of-distribution data at the adaptation stage. The performance drop on data in a target domain is a critical problem in deploying deep neural networks that are successfully trained on data in a source domain. Unsupervised domain adaptation (UDA) is proposed to counter this, by leveraging both labeled source domain data and unlabeled target domain data to carry out various tasks in the target domain. UDA has yielded promising results on natural image processing, video analysis, natural language processing, time-series data analysis, medical image analysis, etc. In this review, as a rapidly evolving topic, we provide a systematic comparison of its methods and applications. In addition, the connection of UDA with its closely related tasks, e.g., domain generalization and out-of-distribution detection, has also been discussed. Furthermore, deficiencies in current methods and possible promising directions are highlighted. △ Less

Submitted 15 August, 2022; originally announced August 2022.

Comments: APSIPA Transactions on Signal and Information Processing

arXiv:2208.04832 [pdf, other]

On the Importance of Critical Period in Multi-stage Reinforcement Learning

Authors: Junseok Park, Inwoo Hwang, Min Whoo Lee, Hyunseok Oh, Minsu Lee, Youngki Lee, Byoung-Tak Zhang

Abstract: The initial years of an infant's life are known as the critical period, during which the overall development of learning performance is significantly impacted due to neural plasticity. In recent studies, an AI agent, with a deep neural network mimicking mechanisms of actual neurons, exhibited a learning period similar to human's critical period. Especially during this initial period, the appropria… ▽ More The initial years of an infant's life are known as the critical period, during which the overall development of learning performance is significantly impacted due to neural plasticity. In recent studies, an AI agent, with a deep neural network mimicking mechanisms of actual neurons, exhibited a learning period similar to human's critical period. Especially during this initial period, the appropriate stimuli play a vital role in developing learning ability. However, transforming human cognitive bias into an appropriate shaping reward is quite challenging, and prior works on critical period do not focus on finding the appropriate stimulus. To take a step further, we propose multi-stage reinforcement learning to emphasize finding ``appropriate stimulus" around the critical period. Inspired by humans' early cognitive-developmental stage, we use multi-stage guidance near the critical period, and demonstrate the appropriate shaping reward (stage-2 guidance) in terms of the AI agent's performance, efficiency, and stability. △ Less

Submitted 9 August, 2022; originally announced August 2022.

Comments: Accepted by the ICML Complex Feedback in Online Learning Workshop (Open Problems) 2022

arXiv:2205.04195 [pdf, other]

Disturbance-Injected Robust Imitation Learning with Task Achievement

Authors: Hirotaka Tahara, Hikaru Sasaki, Hanbit Oh, Brendan Michael, Takamitsu Matsubara

Abstract: Robust imitation learning using disturbance injections overcomes issues of limited variation in demonstrations. However, these methods assume demonstrations are optimal, and that policy stabilization can be learned via simple augmentations. In real-world scenarios, demonstrations are often of diverse-quality, and disturbance injection instead learns sub-optimal policies that fail to replicate desi… ▽ More Robust imitation learning using disturbance injections overcomes issues of limited variation in demonstrations. However, these methods assume demonstrations are optimal, and that policy stabilization can be learned via simple augmentations. In real-world scenarios, demonstrations are often of diverse-quality, and disturbance injection instead learns sub-optimal policies that fail to replicate desired behavior. To address this issue, this paper proposes a novel imitation learning framework that combines both policy robustification and optimal demonstration learning. Specifically, this combinatorial approach forces policy learning and disturbance injection optimization to focus on mainly learning from high task achievement demonstrations, while utilizing low achievement ones to decrease the number of samples needed. The effectiveness of the proposed method is verified through experiments using an excavation task in both simulations and a real robot, resulting in high-achieving policies that are more stable and robust to diverse-quality demonstrations. In addition, this method utilizes all of the weighted sub-optimal demonstrations without eliminating them, resulting in practical data efficiency benefits. △ Less

Submitted 9 May, 2022; originally announced May 2022.

Comments: 7 pages, Accepted by the 2022 International Conference on Robotics and Automation (ICRA 2022)

arXiv:2205.00824 [pdf, other]

doi 10.1016/j.inffus.2022.03.003

Exploration in Deep Reinforcement Learning: A Survey

Authors: Pawel Ladosz, Lilian Weng, Minwoo Kim, Hyondong Oh

Abstract: This paper reviews exploration techniques in deep reinforcement learning. Exploration techniques are of primary importance when solving sparse reward problems. In sparse reward problems, the reward is rare, which means that the agent will not find the reward often by acting randomly. In such a scenario, it is challenging for reinforcement learning to learn rewards and actions association. Thus mor… ▽ More This paper reviews exploration techniques in deep reinforcement learning. Exploration techniques are of primary importance when solving sparse reward problems. In sparse reward problems, the reward is rare, which means that the agent will not find the reward often by acting randomly. In such a scenario, it is challenging for reinforcement learning to learn rewards and actions association. Thus more sophisticated exploration methods need to be devised. This review provides a comprehensive overview of existing exploration approaches, which are categorized based on the key contributions as follows reward novel states, reward diverse behaviours, goal-based methods, probabilistic methods, imitation-based methods, safe exploration and random-based methods. Then, the unsolved challenges are discussed to provide valuable future research directions. Finally, the approaches of different categories are compared in terms of complexity, computational effort and overall performance. △ Less

Submitted 2 May, 2022; originally announced May 2022.

arXiv:2204.13596 [pdf, other]

Generative Multi-hop Retrieval

Authors: Hyunji Lee, Sohee Yang, Hanseok Oh, Minjoon Seo

Abstract: A common practice for text retrieval is to use an encoder to map the documents and the query to a common vector space and perform a nearest neighbor search (NNS); multi-hop retrieval also often adopts the same paradigm, usually with a modification of iteratively reformulating the query vector so that it can retrieve different documents at each hop. However, such a bi-encoder approach has limitatio… ▽ More A common practice for text retrieval is to use an encoder to map the documents and the query to a common vector space and perform a nearest neighbor search (NNS); multi-hop retrieval also often adopts the same paradigm, usually with a modification of iteratively reformulating the query vector so that it can retrieve different documents at each hop. However, such a bi-encoder approach has limitations in multi-hop settings; (1) the reformulated query gets longer as the number of hops increases, which further tightens the embedding bottleneck of the query vector, and (2) it is prone to error propagation. In this paper, we focus on alleviating these limitations in multi-hop settings by formulating the problem in a fully generative way. We propose an encoder-decoder model that performs multi-hop retrieval by simply generating the entire text sequences of the retrieval targets, which means the query and the documents interact in the language model's parametric space rather than L2 or inner product space as in the bi-encoder approach. Our approach, Generative Multi-hop Retrieval(GMR), consistently achieves comparable or higher performance than bi-encoder models in five datasets while demonstrating superior GPU memory and storage footprint. △ Less

Submitted 16 October, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

Comments: published at EMNLP 2022

arXiv:2203.11593 [pdf, other]

Unified Negative Pair Generation toward Well-discriminative Feature Space for Face Recognition

Authors: Junuk Jung, Seonhoon Lee, Heung-Seon Oh, Yongjun Park, Joochan Park, Sungbin Son

Abstract: The goal of face recognition (FR) can be viewed as a pair similarity optimization problem, maximizing a similarity set $\mathcal{S}^p$ over positive pairs, while minimizing similarity set $\mathcal{S}^n$ over negative pairs. Ideally, it is expected that FR models form a well-discriminative feature space (WDFS) that satisfies $\inf{\mathcal{S}^p} > \sup{\mathcal{S}^n}$. With regard to WDFS, the exi… ▽ More The goal of face recognition (FR) can be viewed as a pair similarity optimization problem, maximizing a similarity set $\mathcal{S}^p$ over positive pairs, while minimizing similarity set $\mathcal{S}^n$ over negative pairs. Ideally, it is expected that FR models form a well-discriminative feature space (WDFS) that satisfies $\inf{\mathcal{S}^p} > \sup{\mathcal{S}^n}$. With regard to WDFS, the existing deep feature learning paradigms (i.e., metric and classification losses) can be expressed as a unified perspective on different pair generation (PG) strategies. Unfortunately, in the metric loss (ML), it is infeasible to generate negative pairs taking all classes into account in each iteration because of the limited mini-batch size. In contrast, in classification loss (CL), it is difficult to generate extremely hard negative pairs owing to the convergence of the class weight vectors to their center. This leads to a mismatch between the two similarity distributions of the sampled pairs and all negative pairs. Thus, this paper proposes a unified negative pair generation (UNPG) by combining two PG strategies (i.e., MLPG and CLPG) from a unified perspective to alleviate the mismatch. UNPG introduces useful information about negative pairs using MLPG to overcome the CLPG deficiency. Moreover, it includes filtering the similarities of noisy negative pairs to guarantee reliable convergence and improved performance. Exhaustive experiments show the superiority of UNPG by achieving state-of-the-art performance across recent loss functions on public benchmark datasets. Our code and pretrained models are publicly available. △ Less

Submitted 18 April, 2024; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: 9 pages, 6 figures, Published at BMVC22

arXiv:2202.10655 [pdf, other]

Shape-Haptics: Planar & Passive Force Feedback Mechanisms for Physical Interfaces

Authors: Clement Zheng, Zhen Zhou Yong, Hongnan Lin, HyunJoo Oh, Ching Chiuan Yen

Abstract: We present Shape-Haptics, an approach for designers to rapidly design and fabricate passive force feedback mechanisms for physical interfaces. Such mechanisms are used in everyday interfaces and tools, and they are challenging to design. Shape-Haptics abstracts and broadens the haptic expression of this class of force feedback systems through 2D laser cut configurations that are simple to fabricat… ▽ More We present Shape-Haptics, an approach for designers to rapidly design and fabricate passive force feedback mechanisms for physical interfaces. Such mechanisms are used in everyday interfaces and tools, and they are challenging to design. Shape-Haptics abstracts and broadens the haptic expression of this class of force feedback systems through 2D laser cut configurations that are simple to fabricate. They leverage the properties of polyoxymethylene plastic and comprise a compliant spring structure that engages with a sliding profile during tangible interaction. By shaping the sliding profile, designers can easily customize the haptic force feedback delivered by the mechanism. We provide a computational design sandbox to facilitate designers to explore and fabricate Shape-Haptics mechanisms. We also propose a series of applications that demonstrate the utility of Shape-Haptics in creating and customizing haptics for different physical interfaces. △ Less

Submitted 21 February, 2022; originally announced February 2022.

Comments: To appear in the conference proceedings at ACM CHI 2022

arXiv:2201.04990 [pdf, other]

doi 10.1145/3462244.3479932

Toddler-Guidance Learning: Impacts of Critical Period on Multimodal AI Agents

Authors: Junseok Park, Kwanyoung Park, Hyunseok Oh, Ganghun Lee, Minsu Lee, Youngki Lee, Byoung-Tak Zhang

Abstract: Critical periods are phases during which a toddler's brain develops in spurts. To promote children's cognitive development, proper guidance is critical in this stage. However, it is not clear whether such a critical period also exists for the training of AI agents. Similar to human toddlers, well-timed guidance and multimodal interactions might significantly enhance the training efficiency of AI a… ▽ More Critical periods are phases during which a toddler's brain develops in spurts. To promote children's cognitive development, proper guidance is critical in this stage. However, it is not clear whether such a critical period also exists for the training of AI agents. Similar to human toddlers, well-timed guidance and multimodal interactions might significantly enhance the training efficiency of AI agents as well. To validate this hypothesis, we adapt this notion of critical periods to learning in AI agents and investigate the critical period in the virtual environment for AI agents. We formalize the critical period and Toddler-guidance learning in the reinforcement learning (RL) framework. Then, we built up a toddler-like environment with VECA toolkit to mimic human toddlers' learning characteristics. We study three discrete levels of mutual interaction: weak-mentor guidance (sparse reward), moderate mentor guidance (helper-reward), and mentor demonstration (behavioral cloning). We also introduce the EAVE dataset consisting of 30,000 real-world images to fully reflect the toddler's viewpoint. We evaluate the impact of critical periods on AI agents from two perspectives: how and when they are guided best in both uni- and multimodal learning. Our experimental results show that both uni- and multimodal agents with moderate mentor guidance and critical period on 1 million and 2 million training steps show a noticeable improvement. We validate these results with transfer learning on the EAVE dataset and find the performance advancement on the same critical period and the guidance. △ Less

Submitted 12 January, 2022; originally announced January 2022.

Comments: ICMI2021 Oral Presentation, 9 pages, 9 figures

ACM Class: I.2.0; I.6.5

arXiv:2112.12353 [pdf]

LAME: Layout Aware Metadata Extraction Approach for Research Articles

Authors: Jongyun Choi, Hyesoo Kong, Hwamook Yoon, Heung-Seon Oh, Yuchul Jung

Abstract: The volume of academic literature, such as academic conference papers and journals, has increased rapidly worldwide, and research on metadata extraction is ongoing. However, high-performing metadata extraction is still challenging due to diverse layout formats according to journal publishers. To accommodate the diversity of the layouts of academic journals, we propose a novel LAyout-aware Metadata… ▽ More The volume of academic literature, such as academic conference papers and journals, has increased rapidly worldwide, and research on metadata extraction is ongoing. However, high-performing metadata extraction is still challenging due to diverse layout formats according to journal publishers. To accommodate the diversity of the layouts of academic journals, we propose a novel LAyout-aware Metadata Extraction (LAME) framework equipped with the three characteristics (e.g., design of an automatic layout analysis, construction of a large meta-data training set, and construction of Layout-MetaBERT). We designed an automatic layout analysis using PDFMiner. Based on the layout analysis, a large volume of metadata-separated training data, including the title, abstract, author name, author affiliated organization, and keywords, were automatically extracted. Moreover, we constructed Layout-MetaBERT to extract the metadata from academic journals with varying layout formats. The experimental results with Layout-MetaBERT exhibited robust performance (Macro-F1, 93.27%) in metadata extraction for unseen journals with different layout formats. △ Less

Submitted 22 December, 2021; originally announced December 2021.

ACM Class: I.2.7

arXiv:2111.11104 [pdf, other]

doi 10.1609/aaai.v37i11.26520

Hierarchical Text Classification As Sub-Hierarchy Sequence Generation

Authors: SangHun Im, Gibaeg Kim, Heung-Seon Oh, Seongung Jo, Donghwan Kim

Abstract: Hierarchical text classification (HTC) is essential for various real applications. However, HTC models are challenging to develop because they often require processing a large volume of documents and labels with hierarchical taxonomy. Recent HTC models based on deep learning have attempted to incorporate hierarchy information into a model structure. Consequently, these models are challenging to im… ▽ More Hierarchical text classification (HTC) is essential for various real applications. However, HTC models are challenging to develop because they often require processing a large volume of documents and labels with hierarchical taxonomy. Recent HTC models based on deep learning have attempted to incorporate hierarchy information into a model structure. Consequently, these models are challenging to implement when the model parameters increase for a large-scale hierarchy because the model structure depends on the hierarchy size. To solve this problem, we formulate HTC as a sub-hierarchy sequence generation to incorporate hierarchy information into a target label sequence instead of the model structure. Subsequently, we propose the Hierarchy DECoder (HiDEC), which decodes a text sequence into a sub-hierarchy sequence using recursive hierarchy decoding, classifying all parents at the same level into children at once. In addition, HiDEC is trained to use hierarchical path information from a root to each leaf in a sub-hierarchy composed of the labels of a target document via an attention mechanism and hierarchy-aware masking. HiDEC achieved state-of-the-art performance with significantly fewer model parameters than existing models on benchmark datasets, such as RCV1-v2, NYT, and EURLEX57K. △ Less

Submitted 6 November, 2023; v1 submitted 22 November, 2021; originally announced November 2021.

Comments: 9 pages, 5 figures, Published at AAAI23

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 37(11), 12933-12941 (2023)

arXiv:2111.10403 [pdf, other]

Towards Integrative Multi-Modal Personal Health Navigation Systems: Framework and Application

Authors: Nitish Nag, Hyungik Oh, Mengfan Tang, Mingshu Shi, Ramesh Jain

Abstract: It is well understood that an individual's health trajectory is influenced by choices made in each moment, such as from lifestyle or medical decisions. With the advent of modern sensing technologies, individuals have more data and information about themselves than any other time in history. How can we use this data to make the best decisions to keep the health state optimal? We propose a generaliz… ▽ More It is well understood that an individual's health trajectory is influenced by choices made in each moment, such as from lifestyle or medical decisions. With the advent of modern sensing technologies, individuals have more data and information about themselves than any other time in history. How can we use this data to make the best decisions to keep the health state optimal? We propose a generalized Personal Health Navigation (PHN) framework. PHN takes individuals towards their personal health goals through a system which perpetually digests data streams, estimates current health status, computes the best route through intermediate states utilizing personal models, and guides the best inputs that carry a user towards their goal. In addition to describing the general framework, we test the PHN system in two experiments within the field of cardiology. First, we prospectively test a knowledge-infused cardiovascular PHN system with a pilot clinical trial of 41 users. Second, we build a data-driven personalized model on cardiovascular exercise response variability on a smartwatch data-set of 33,269 real-world users. We conclude with critical challenges in health computing for PHN systems that require deep future investigation. △ Less

Submitted 18 May, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

arXiv:2111.04589 [pdf, other]

An Improved Local Search Algorithm for k-Median

Authors: Vincent Cohen-Addad, Anupam Gupta, Lunjia Hu, Hoon Oh, David Saulpic

Abstract: We present a new local-search algorithm for the $k$-median clustering problem. We show that local optima for this algorithm give a $(2.836+ε)$-approximation; our result improves upon the $(3+ε)$-approximate local-search algorithm of Arya et al. [STOC 01]. Moreover, a computer-aided analysis of a natural extension suggests that this approach may lead to an improvement over the best-known approximat… ▽ More We present a new local-search algorithm for the $k$-median clustering problem. We show that local optima for this algorithm give a $(2.836+ε)$-approximation; our result improves upon the $(3+ε)$-approximate local-search algorithm of Arya et al. [STOC 01]. Moreover, a computer-aided analysis of a natural extension suggests that this approach may lead to an improvement over the best-known approximation guarantee for the problem. The new ingredient in our algorithm is the use of a potential function based on both the closest and second-closest facilities to each client. Specifically, the potential is the sum over all clients, of the distance of the client to its closest facility, plus (a small constant times) the truncated distance to its second-closest facility. We move from one solution to another only if the latter can be obtained by swapping a constant number of facilities, and has a smaller potential than the former. This refined potential allows us to avoid the bad local optima given by Arya et al. for the local-search algorithm based only on the cost of the solution. △ Less

Submitted 8 November, 2021; originally announced November 2021.

Comments: To appear at SODA 22

ACM Class: F.2.2

arXiv:2111.01717 [pdf, other]

MixFace: Improving Face Verification Focusing on Fine-grained Conditions

Authors: Junuk Jung, Sungbin Son, Joochan Park, Yongjun Park, Seonhoon Lee, Heung-Seon Oh

Abstract: The performance of face recognition has become saturated for public benchmark datasets such as LFW, CFP-FP, and AgeDB, owing to the rapid advances in CNNs. However, the effects of faces with various fine-grained conditions on FR models have not been investigated because of the absence of such datasets. This paper analyzes their effects in terms of different conditions and loss functions using K-FA… ▽ More The performance of face recognition has become saturated for public benchmark datasets such as LFW, CFP-FP, and AgeDB, owing to the rapid advances in CNNs. However, the effects of faces with various fine-grained conditions on FR models have not been investigated because of the absence of such datasets. This paper analyzes their effects in terms of different conditions and loss functions using K-FACE, a recently introduced FR dataset with fine-grained conditions. We propose a novel loss function, MixFace, that combines classification and metric losses. The superiority of MixFace in terms of effectiveness and robustness is demonstrated experimentally on various benchmark datasets. △ Less

Submitted 19 June, 2022; v1 submitted 2 November, 2021; originally announced November 2021.

Comments: 9 pages, 6 figures

Showing 1–50 of 88 results for author: Oh, H