
MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models

Authors: Thao Minh Nguyen Phan, Cong-Tinh Dao, Chenwei Wu, Jian-Zhe Wang, Shun Liu, Jun-En Ding, David Restrepo, Feng Liu, Fang-Ming Hung, Wen-Chih Peng

Abstract: Electronic health records (EHRs) are multimodal by nature, consisting of structured tabular features like lab tests and unstructured clinical notes. In real-life clinical practice, doctors use complementary multimodal EHR data sources to get a clearer picture of patients' health and support clinical decision-making. However, most EHR predictive models do not reflect these procedures, as they either focus on a single modality or overlook inter-modality interactions and redundancy. In this work, we propose MEDFuse, a Multimodal EHR Data Fusion framework that incorporates masked lab-test modeling and large language models (LLMs) to effectively integrate structured and unstructured medical data. MEDFuse leverages multimodal embeddings extracted from two sources: LLMs fine-tuned on free-text clinical notes and masked tabular transformers trained on structured lab test results. We design a disentangled transformer module, optimized by a mutual information loss, to 1) decouple modality-specific and modality-shared information and 2) extract a useful joint representation from the noise and redundancy present in clinical notes. Through comprehensive validation on the public MIMIC-III dataset and the in-house FEMH dataset, MEDFuse demonstrates great potential for advancing clinical predictions, achieving an F1 score above 90% on the 10-disease multi-label classification task.

Submitted 17 July, 2024; originally announced July 2024.
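The MEDFuse abstract says the tabular branch is a masked transformer trained on lab results. A minimal sketch of only the masking step and its loss, assuming BERT-style random masking of observed lab values with a hypothetical 15% rate and `None` for labs that were never measured; the transformer itself is omitted and the function names are illustrative, not the authors' code.

```python
import random
from typing import List, Optional, Tuple


def mask_lab_panel(
    values: List[Optional[float]],
    mask_rate: float = 0.15,  # hypothetical rate, not taken from the paper
    rng: Optional[random.Random] = None,
) -> Tuple[List[Optional[float]], List[bool]]:
    """Hide a random subset of *observed* lab values for masked-value modeling.

    Returns the corrupted panel (masked entries set to None) and a boolean
    list marking which positions were masked and must be reconstructed.
    Labs that are already missing (None) are never chosen as targets.
    """
    rng = rng or random.Random()
    observed = [i for i, v in enumerate(values) if v is not None]
    n_mask = max(1, round(mask_rate * len(observed))) if observed else 0
    chosen = set(rng.sample(observed, n_mask))
    corrupted = [None if i in chosen else v for i, v in enumerate(values)]
    is_target = [i in chosen for i in range(len(values))]
    return corrupted, is_target


def masked_mse(pred: List[float], values: List[Optional[float]], is_target: List[bool]) -> float:
    """Mean squared error computed only over the masked (target) positions."""
    errs = [(p - v) ** 2 for p, v, t in zip(pred, values, is_target) if t]
    return sum(errs) / len(errs) if errs else 0.0
```

Restricting targets to observed entries keeps the reconstruction loss from rewarding guesses about labs that have no ground truth.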

arXiv:2406.00307

HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model

Authors: Khoa Vo, Thinh Phan, Kashu Yamazaki, Minh Tran, Ngan Le

Abstract: Current video-language models (VLMs) rely extensively on instance-level alignment between video and language modalities, which presents two major limitations: (1) visual reasoning departs from the way humans naturally perceive scenes from a first-person perspective, leading to a lack of interpretable reasoning; and (2) learning is limited in capturing the inherent fine-grained relationships between the two modalities. In this paper, we take inspiration from human perception and explore a compositional approach to egocentric video representation. We introduce HENASY (Hierarchical ENtities ASsemblY), which includes a spatiotemporal token grouping mechanism to explicitly assemble dynamically evolving scene entities through time and model their relationships for video representation. By leveraging compositional structure understanding, HENASY possesses strong interpretability via visual grounding with free-form text queries. We further explore a suite of multi-grained contrastive losses to facilitate entity-centric understanding, comprising three alignment types: video-narration, noun-entity, and verb-entities alignments. Our method demonstrates strong interpretability in both quantitative and qualitative experiments, while maintaining competitive performance on five downstream tasks via zero-shot transfer or as a video/text representation, including video/text retrieval, action recognition, multi-choice query, natural language query, and moments query.

Submitted 6 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

Comments: under submission
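HENASY's losses are described only as "multi-grained contrastive losses". As a generic illustration of the video-narration level, here is a symmetric InfoNCE sketch over a batch of paired embeddings, assuming cosine similarity and a CLIP-style temperature of 0.07 (both assumptions, not the paper's settings). The noun-entity and verb-entities terms are not reproduced.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def info_nce(video: List[Vector], text: List[Vector], tau: float = 0.07) -> float:
    """Symmetric InfoNCE: the i-th video clip should match the i-th narration."""
    v = [_normalize(x) for x in video]
    t = [_normalize(x) for x in text]
    n = len(v)
    # sim[i][j] = cosine(video_i, text_j) / tau
    sim = [[sum(a * b for a, b in zip(v[i], t[j])) / tau for j in range(n)] for i in range(n)]

    def cross_entropy(rows: List[List[float]]) -> float:
        # diagonal entry is the positive class; log-sum-exp for numerical stability
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            lse = m + math.log(sum(math.exp(x - m) for x in row))
            total += lse - row[i]
        return total / len(rows)

    sim_t = [list(col) for col in zip(*sim)]  # text-to-video direction
    return 0.5 * (cross_entropy(sim) + cross_entropy(sim_t))
```

Averaging both directions penalizes a narration that matches many clips as well as a clip that matches many narrations.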

arXiv:2405.12463

Stochastic Learning of Computational Resource Usage as Graph Structured Multimarginal Schrödinger Bridge

Authors: Georgiy A. Bondar, Robert Gifford, Linh Thi Xuan Phan, Abhishek Halder

Abstract: We propose to learn the time-varying stochastic computational resource usage of software as a graph-structured Schrödinger bridge problem. In general, learning computational resource usage from data is challenging because resources such as the number of CPU instructions and the number of last-level cache requests are both time-varying and statistically correlated. Our proposed method enables learning the joint time-varying stochasticity in computational resource usage from measured profile snapshots in a nonparametric manner. The method can be used to predict the most likely time-varying distribution of computational resource availability at a desired time. We provide detailed algorithms for stochastic learning in both single- and multi-core cases, discuss their convergence guarantees and computational complexities, and demonstrate their practical use in two case studies: a single-core nonlinear model predictive controller and a synthetic multi-core software.

Submitted 20 May, 2024; originally announced May 2024.
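The paper's problem is graph-structured and multimarginal; its simplest special case, a static bridge between two discrete marginals, reduces to entropy-regularized optimal transport and can be solved with Sinkhorn iterations. The sketch below covers only that two-marginal case, assuming a shared 1-D support, a squared-distance cost, and a hypothetical regularization `eps`.

```python
import math
from typing import List


def sinkhorn_bridge(mu: List[float], nu: List[float], support: List[float],
                    eps: float = 1.0, iters: int = 500) -> List[List[float]]:
    """Entropic coupling between two discrete marginals on a shared 1-D support.

    Solves min_P <C, P> - eps * H(P) subject to P 1 = mu and P^T 1 = nu,
    the static form of a two-marginal Schrödinger bridge with a Brownian
    reference process (up to how eps is scaled).
    """
    n = len(support)
    # Gibbs kernel K_ij = exp(-C_ij / eps) with squared-distance cost C_ij
    K = [[math.exp(-((support[i] - support[j]) ** 2) / eps) for j in range(n)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * n
    for _ in range(iters):
        # alternate scaling so rows match mu, then columns match nu
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(n)]
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(n)]
```

A smaller `eps` yields a sharper, less diffuse coupling but slows Sinkhorn's convergence.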

arXiv:2404.13417

Efficient and Concise Explanations for Object Detection with Gaussian-Class Activation Mapping Explainer

Authors: Quoc Khanh Nguyen, Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Van Binh Truong, Tuong Phan, Hung Cao

Abstract: To address the challenge of providing quick and plausible explanations in Explainable AI (XAI) for object detection models, we introduce the Gaussian Class Activation Mapping Explainer (G-CAME). Our method efficiently generates concise saliency maps by utilizing activation maps from selected layers and applying a Gaussian kernel to emphasize critical image regions for the predicted object. Compared with other region-based approaches, G-CAME significantly reduces explanation time, to 0.5 seconds, without compromising quality. Our evaluation of G-CAME, using Faster-RCNN and YOLOX on the MS-COCO 2017 dataset, demonstrates its ability to offer highly plausible and faithful explanations, especially in reducing bias in tiny-object detection.

Submitted 20 April, 2024; originally announced April 2024.

Comments: Canadian AI 2024
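To illustrate the Gaussian-weighting idea described in the G-CAME abstract, here is a sketch that takes an activation map already aggregated over channels and weights it with a 2-D Gaussian centred on the predicted box. The choice of σ proportional to box size (`sigma_scale`) is a hypothetical parameterization; the abstract does not specify how the kernel is sized.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def gaussian_cam(cam: Grid, box: Tuple[float, float, float, float], sigma_scale: float = 0.5) -> Grid:
    """Mask a class activation map with a Gaussian centred on a detection box.

    cam: H x W activation map (already aggregated over channels).
    box: (x0, y0, x1, y1) in the map's pixel coordinates.
    sigma_scale: hypothetical factor tying the Gaussian width to the box size.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    sx = max(sigma_scale * (x1 - x0), 1e-6)
    sy = max(sigma_scale * (y1 - y0), 1e-6)
    out = []
    for y, row in enumerate(cam):
        new_row = []
        for x, a in enumerate(row):
            g = math.exp(-(((x - cx) ** 2) / (2 * sx ** 2) + ((y - cy) ** 2) / (2 * sy ** 2)))
            new_row.append(max(a, 0.0) * g)  # keep positive evidence (ReLU), then weight
        out.append(new_row)
    peak = max(max(r) for r in out) or 1.0
    return [[v / peak for v in r] for r in out]  # normalise to [0, 1]
```

Because the Gaussian decays away from the box centre, activations belonging to other objects in the same image are suppressed in the explanation for this detection.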

arXiv:2404.03431

MEDIATE: Mutually Endorsed Distributed Incentive Acknowledgment Token Exchange

Authors: Philipp Altmann, Katharina Winter, Michael Kölle, Maximilian Zorn, Thomy Phan, Claudia Linnhoff-Popien

Abstract: Recent advances in multi-agent systems (MAS) have shown that incorporating peer incentivization (PI) mechanisms vastly improves cooperation. Especially in social dilemmas, communication between the agents helps to overcome sub-optimal Nash equilibria. However, incentivization tokens need to be carefully selected. Furthermore, real-world applications might impose increased privacy requirements and limit exchange. Therefore, we extend the PI protocol for mutual acknowledgment token exchange (MATE) and provide additional analysis of the impact of the chosen tokens. Building upon those insights, we propose mutually endorsed distributed incentive acknowledgment token exchange (MEDIATE), an extended PI architecture employing automatic token derivation via decentralized consensus. Empirical results show stable agreement on appropriate tokens, yielding superior performance compared to static tokens and state-of-the-art approaches in different social dilemma environments with various reward distributions.

Submitted 4 April, 2024; originally announced April 2024.

Comments: 12 pages, 5 figures

arXiv:2404.01270

Decentralized Collaborative Learning Framework with External Privacy Leakage Analysis

Authors: Tsuyoshi Idé, Dzung T. Phan, Rudy Raymond

Abstract: This paper presents two methodological advancements in decentralized multi-task learning under privacy constraints, aiming to pave the way for future developments in next-generation Blockchain platforms. First, we extend the existing framework for collaborative dictionary learning (CollabDict), which has previously been limited to Gaussian mixture models, by incorporating deep variational autoencoders (VAEs) into the framework, with a particular focus on anomaly detection. We demonstrate that the VAE-based anomaly score function shares the same mathematical structure as the non-deep model, and provide a comprehensive qualitative comparison. Second, considering the widespread use of "pre-trained models," we provide a mathematical analysis of data privacy leakage when models trained with CollabDict are shared externally. We show that the CollabDict approach, when applied to Gaussian mixtures, adheres to a Rényi differential privacy criterion. Additionally, we propose a practical metric for monitoring internal privacy breaches during the learning process.

Submitted 1 April, 2024; originally announced April 2024.

Comments: To appear in Proceedings of the 2023 International Workshop Blockchain Kaigi (BCK 23), JPS Conference Proceedings, 2024
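The CollabDict abstract states a Rényi differential privacy (RDP) result but does not give the bound. For orientation, the sketch below applies two textbook facts rather than the paper's analysis: the Gaussian mechanism with L2 sensitivity Δ and noise scale σ satisfies (α, αΔ²/(2σ²))-RDP, and (α, ε)-RDP implies (ε + log(1/δ)/(α − 1), δ)-DP. The values of Δ, σ, δ, and the grid of orders are hypothetical inputs.

```python
import math
from typing import Iterable


def gaussian_rdp(alpha: float, sensitivity: float, sigma: float) -> float:
    """Rényi-DP epsilon of the Gaussian mechanism at order alpha > 1."""
    if alpha <= 1:
        raise ValueError("alpha must be > 1")
    return alpha * sensitivity ** 2 / (2 * sigma ** 2)


def rdp_to_dp(rdp_eps: float, alpha: float, delta: float) -> float:
    """Standard conversion from (alpha, eps)-RDP to (eps', delta)-DP."""
    return rdp_eps + math.log(1 / delta) / (alpha - 1)


def best_dp_epsilon(sensitivity: float, sigma: float, delta: float,
                    alphas: Iterable[float] = (1.5, 2, 4, 8, 16, 32, 64)) -> float:
    """Tightest (eps, delta)-DP guarantee over a grid of Rényi orders."""
    return min(rdp_to_dp(gaussian_rdp(a, sensitivity, sigma), a, delta) for a in alphas)
```

RDP composes additively at a fixed order, so T repeated releases cost T times `gaussian_rdp(...)` before the conversion to (ε, δ)-DP is applied.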

arXiv:2403.08313

An improvement on the Louvain algorithm using random walks

Authors: Duy Hieu Do, Thi Ha Duong Phan

Abstract: We present improvements to two well-known community detection algorithms: Newman's spectral method and the Louvain algorithm. The Newman algorithm begins by treating the original graph as a single cluster, then repeatedly splits each cluster into two based on the signs of the eigenvector corresponding to the second-largest eigenvalue. Our improvement replaces the time-consuming computation of eigenvalues with a random walk during the splitting process. The Louvain algorithm iterates until modularity can no longer be increased; each iteration consists of two phases: phase 1 partitions the graph into clusters, and phase 2 constructs a new graph in which each vertex represents one cluster obtained from phase 1. We propose an improvement to this algorithm that adds our random walk algorithm as an additional phase for refining the clusters obtained from phase 1. It maintains a complexity comparable to that of the Louvain algorithm while exhibiting superior efficiency. To validate the robustness and effectiveness of our proposed algorithms, we conducted experiments using randomly generated graphs and real-world data.

Submitted 13 March, 2024; originally announced March 2024.
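Both algorithms in this abstract are driven by modularity, the quantity Louvain's phases (and any added refinement phase) try to increase. The sketch below computes the standard Newman-Girvan modularity of a partition of an undirected, unweighted graph; it does not reproduce the authors' random-walk refinement.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def modularity(edges: Iterable[Edge], community: Dict[Hashable, int]) -> float:
    """Newman-Girvan modularity of an undirected, unweighted graph partition.

    Q = sum_c [ L_c / m - (d_c / 2m)^2 ], where m is the number of edges,
    L_c counts edges inside community c, and d_c is its total degree.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by a single edge, splitting along the bridge gives Q = 2(3/7 − 1/4) = 5/14 ≈ 0.357, whereas placing every vertex in one community gives Q = 0.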

arXiv:2403.03611

Comparison Performance of Spectrogram and Scalogram as Input of Acoustic Recognition Task

Authors: Dang Thoai Phan

Abstract: Acoustic recognition is a common deep learning task in recent research, typically relying on spectral feature extraction such as the short-time Fourier transform and the wavelet transform. However, few studies discuss the advantages and drawbacks of these transforms or compare their performance. This paper therefore compares the attributes of the two resulting representations, the spectrogram and the scalogram. A convolutional neural network (CNN) for acoustic fault recognition is implemented, and its performance with each input is recorded for comparison. A recent study on the same audio database is used as a benchmark to assess how well the designed spectrogram and scalogram perform. Their advantages and limitations are also analyzed. The results of this paper provide indications of suitable application scenarios for the spectrogram and the scalogram, as well as potential further research directions.

Submitted 26 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.
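To make the spectrogram side of this comparison concrete, here is a pure-Python magnitude STFT using a Hann window and a naive DFT per frame (O(N²) per frame, fine for illustration). The frame length and hop are hypothetical. A scalogram would instead correlate the signal with scaled wavelets (a continuous wavelet transform), trading fixed time-frequency resolution for resolution that varies with scale; that variant is not shown.

```python
import cmath
import math
from typing import List


def hann(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def spectrogram(signal: List[float], frame_len: int = 64, hop: int = 32) -> List[List[float]]:
    """Magnitude STFT: one row per frame, one column per bin 0..frame_len // 2."""
    w = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + k] * w[k] for k in range(frame_len)]
        bins = []
        for f in range(frame_len // 2 + 1):
            # naive DFT coefficient for frequency bin f
            acc = sum(seg[k] * cmath.exp(-2j * math.pi * f * k / frame_len) for k in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames
```

At a 1 kHz sampling rate, a 125 Hz tone completes exactly 8 cycles per 64-sample frame, so every frame peaks at bin 8.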

arXiv:2402.13804

Reconfigurable Intelligent Surfaces for THz: Hardware Impairments and Switching Technologies

Authors: Sérgio Matos, Yihan Ma, Qi Luo, Jonas Deuermeier, Luca Lucci, Panagiotis Gavriilidis, Asal Kiazadeh, Verónica Lain-Rubio, Tung D. Phan, Ping Jack Soh, Antonio Clemente, Luís M. Pessoa, George C. Alexandropoulos

Abstract: The demand for unprecedented performance in upcoming 6G wireless networks is fostering research on THz communications empowered by Reconfigurable Intelligent Surfaces (RISs). A wide range of use cases have been proposed, most of them assuming high-level RIS models that overlook some of the hardware impairments this technology faces. The expectation is that emerging reconfigurable THz technologies will eventually overcome their current limitations. This disassociation from the hardware may mask nonphysical assumptions, perceived as hardware limitations. In this paper, a top-down approach bounded by physical constraints is presented, distilling hardware requirements and upper bounds on RIS-aided system performance from system-level specifications. We consider D-band indoor and outdoor scenarios where a more realistic assessment of the state-of-the-art solution can be made. The goal is to highlight the intricacies of the design procedure based on sound assumptions about RIS performance. For a given signal range and angular coverage, we quantify the required RIS size, number of switching elements, and maximum achievable bandwidth and capacity.

Submitted 21 February, 2024; originally announced February 2024.

Comments: 6 pages, 6 figures, submitted for a conference presentation
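The abstract quantifies the required RIS size and number of switching elements. A back-of-envelope sketch of how aperture size and carrier frequency drive those counts follows. It assumes a square RIS with half-wavelength unit cells and one switch per phase bit in every cell. These assumptions are hypothetical, and the outputs are illustrative rather than the paper's figures.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def ris_unit_cells(side_m: float, freq_hz: float, spacing_wavelengths: float = 0.5) -> int:
    """Number of unit cells on a square RIS with the given element spacing."""
    wavelength = C / freq_hz
    per_side = math.floor(side_m / (spacing_wavelengths * wavelength))
    return per_side ** 2


def ris_switches(side_m: float, freq_hz: float, bits_per_cell: int = 1) -> int:
    """Hypothetical count assuming one switch per phase bit in every unit cell."""
    return ris_unit_cells(side_m, freq_hz) * bits_per_cell
```

At a 150 GHz D-band carrier, a 10 cm × 10 cm aperture holds 100 × 100 = 10,000 half-wavelength cells. For the same aperture, doubling the carrier frequency roughly quadruples the count, which is why switching-element density becomes a hardware bottleneck at THz.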

arXiv:2402.02319

Smart Textile-Driven Soft Spine Exosuit for Lifting Tasks in Industrial Applications

Authors: Kefan Zhu, Bibhu Sharma, Phuoc Thien Phan, James Davies, Mai Thanh Thai, Trung Thien Hoang, Chi Cong Nguyen, Adrienne Ji, Emanuele Nicotra, Nigel H. Lovell, Thanh Nho Do

Abstract: Work-related musculoskeletal disorders (WMSDs) are often caused by repetitive lifting, making them a significant concern in occupational health. Although wearable assistive devices have become the norm for mitigating the risk of back pain, most spinal assist devices still have a partially rigid structure that compromises user comfort and flexibility. This paper addresses this issue by presenting a smart-textile-actuated spine assistance robotic exosuit (SARE), which conforms seamlessly to the back without impeding user movement and is extremely lightweight. The SARE can assist the human erector spinae in completing any action with virtually infinite degrees of freedom. To detect strain on the spine and control the smart textile automatically, a soft knitting sensor that uses fluid pressure as its sensing element is employed. The new device is validated experimentally with human subjects, reducing peak electromyography (EMG) signals of the lumbar erector spinae by around 32 percent in loaded and around 22 percent in unloaded conditions. Moreover, the integrated EMG decreased by around 24.2 percent under the loaded condition and around 23.6 percent under the unloaded condition. In summary, the artificial-muscle wearable device represents an anatomical solution for reducing the risk of muscle strain, metabolic energy cost, and back pain associated with repetitive lifting tasks.

Submitted 3 February, 2024; originally announced February 2024.

Comments: 6 pages, 7 figures

arXiv:2402.01961

Anytime Multi-Agent Path Finding using Operation Parallelism in Large Neighborhood Search

Authors: Shao-Hung Chan, Zhe Chen, Dian-Lun Lin, Yue Zhang, Daniel Harabor, Tsung-Wei Huang, Sven Koenig, Thomy Phan

Abstract: Multi-Agent Path Finding (MAPF) is the problem of finding a set of collision-free paths for multiple agents in a shared environment while minimizing the sum of travel times. Since solving MAPF optimally is NP-hard, anytime algorithms based on Large Neighborhood Search (LNS) are promising for finding good-quality solutions in a scalable way by iteratively destroying and repairing paths. We propose Destroy-Repair Operation Parallelism for LNS (DROP-LNS), a parallel framework that performs multiple destroy and repair operations concurrently to explore more regions of the search space within a limited time budget. Unlike classic MAPF approaches, DROP-LNS can exploit parallelized hardware to improve solution quality. We also formulate two variants of parallelism and conduct experimental evaluations. The results show that DROP-LNS significantly outperforms the state of the art and the variants.

Submitted 2 February, 2024; originally announced February 2024.

Comments: Accepted as an extended abstract in AAMAS 2024
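The core DROP-LNS idea is to run several destroy-repair operations concurrently on copies of the incumbent and keep the best improvement. The sketch below shows that loop structure on a toy assignment problem rather than real MAPF. Each hypothetical agent picks one option, collisions are agents sharing an option, the destroy step frees `k` random agents, and the repair step re-inserts them greedily. All names are illustrative. Python threads only illustrate the structure; because of the GIL, real speed-ups would need processes or native threads.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Cost = List[List[float]]


def total_cost(sol: List[int], cost: Cost, penalty: float) -> float:
    """Sum of individual costs plus a penalty per agent sharing an option (a 'collision')."""
    base = sum(cost[i][c] for i, c in enumerate(sol))
    collisions = len(sol) - len(set(sol))
    return base + penalty * collisions


def destroy_repair(sol: List[int], cost: Cost, penalty: float, k: int, seed: int) -> List[int]:
    """One LNS operation: free k random agents, then greedily re-insert them."""
    rng = random.Random(seed)
    new = list(sol)  # work on a copy so concurrent operations never share state
    removed = rng.sample(range(len(sol)), k)
    for i in removed:
        new[i] = -1  # -1 marks an unassigned agent
    for i in removed:
        taken = {c for j, c in enumerate(new) if j != i and c >= 0}
        new[i] = min(range(len(cost[i])), key=lambda c: cost[i][c] + penalty * (c in taken))
    return new


def parallel_lns(cost: Cost, penalty: float = 10.0, k: int = 2, workers: int = 4,
                 rounds: int = 50, seed: int = 0) -> Tuple[List[int], float]:
    """Run `workers` destroy-repair operations per round and keep the best improvement."""
    sol = [0] * len(cost)  # deliberately poor start: every agent picks option 0
    best = total_cost(sol, cost, penalty)
    rng = random.Random(seed)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(rounds):
            seeds = [rng.randrange(2**31) for _ in range(workers)]
            candidates = list(pool.map(lambda s: destroy_repair(sol, cost, penalty, k, s), seeds))
            cand = min(candidates, key=lambda c: total_cost(c, cost, penalty))
            cand_cost = total_cost(cand, cost, penalty)
            if cand_cost < best:  # accept only improvements
                sol, best = cand, cand_cost
    return sol, best
```

Seeding each operation separately keeps the run reproducible even though the operations execute concurrently.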

arXiv:2401.07056

Aquarium: A Comprehensive Framework for Exploring Predator-Prey Dynamics through Multi-Agent Reinforcement Learning Algorithms

Authors: Michael Kölle, Yannick Erpelding, Fabian Ritz, Thomy Phan, Steffen Illium, Claudia Linnhoff-Popien

Abstract: Recent advances in Multi-Agent Reinforcement Learning have prompted the modeling of intricate interactions between agents in simulated environments. In particular, predator-prey dynamics have captured substantial interest, and various simulations have been tailored to unique requirements. To avoid further time-intensive development, we introduce Aquarium, a comprehensive Multi-Agent Reinforcement Learning environment for predator-prey interaction, enabling the study of emergent behavior. Aquarium is open source and integrates seamlessly with the PettingZoo framework, allowing a quick start with proven algorithm implementations. It features physics-based agent movement on a two-dimensional, edge-wrapping plane. The agent-environment interaction (observations, actions, rewards) and the environment settings (agent speed, prey reproduction, predator starvation, and others) are fully customizable. Besides a resource-efficient visualization, Aquarium supports recording video files, providing a visual comprehension of agent behavior. To demonstrate the environment's capabilities, we conduct preliminary studies that use PPO to train multiple prey agents to evade a predator. In accordance with the literature, we find that Individual Learning results in worse performance than Parameter Sharing, which significantly improves coordination and sample efficiency.

Submitted 13 January, 2024; originally announced January 2024.

Comments: Accepted at ICAART
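The abstract mentions physics-based movement on a two-dimensional, edge-wrapping plane. A sketch of what that implies follows: an Euler step with a speed cap whose position wraps modulo the plane size, and a shortest toroidal distance such as an observation of a nearby predator would need. This is not Aquarium's actual API; the function names and parameters are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def step(pos: Vec, vel: Vec, accel: Vec, dt: float, max_speed: float, size: Vec) -> Tuple[Vec, Vec]:
    """One Euler step with a speed cap, wrapping positions around the plane's edges."""
    vx, vy = vel[0] + accel[0] * dt, vel[1] + accel[1] * dt
    speed = math.hypot(vx, vy)
    if speed > max_speed:  # clamp to the agent's maximum speed
        vx, vy = vx * max_speed / speed, vy * max_speed / speed
    x = (pos[0] + vx * dt) % size[0]
    y = (pos[1] + vy * dt) % size[1]
    return (x, y), (vx, vy)


def torus_distance(a: Vec, b: Vec, size: Vec) -> float:
    """Shortest distance between two points when both axes wrap around."""
    dx = abs(a[0] - b[0]) % size[0]
    dy = abs(a[1] - b[1]) % size[1]
    return math.hypot(min(dx, size[0] - dx), min(dy, size[1] - dy))
```

Without the wrapped distance, a prey agent near one edge would not "see" a predator just across the seam, even though the predator is only a few units away.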

arXiv:2401.05860

Confidence-Based Curriculum Learning for Multi-Agent Path Finding

Authors: Thomy Phan, Joseph Driscoll, Justin Romberg, Sven Koenig

Abstract: A wide range of real-world applications can be formulated as a Multi-Agent Path Finding (MAPF) problem, where the goal is to find collision-free paths for multiple agents with individual start and goal locations. State-of-the-art MAPF solvers are mainly centralized and depend on global information, which limits their scalability and flexibility regarding changes or new maps that would require expensive replanning. Multi-agent reinforcement learning (MARL) offers an alternative by learning decentralized policies that can generalize over a variety of maps. While some prior works attempt to connect both areas, the proposed techniques are heavily engineered and very complex due to the integration of many mechanisms that limit generality and are expensive to use. We argue that much simpler and more general approaches are needed to bring MARL and MAPF closer together at significantly lower cost. In this paper, we propose Confidence-based Auto-Curriculum for Team Update Stability (CACTUS) as a lightweight MARL approach to MAPF. CACTUS defines a simple reverse curriculum scheme, where the goal of each agent is randomly placed within an allocation radius around the agent's start location. The allocation radius increases gradually as all agents improve, which is assessed by a confidence-based measure. We evaluate CACTUS on various maps of different sizes, obstacle densities, and numbers of agents. Our experiments demonstrate better performance and generalization capabilities than state-of-the-art MARL approaches with fewer than 600,000 trainable parameters, which is less than 5% of the neural network size of current MARL approaches to MAPF.

Submitted 10 February, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: Accepted to AAMAS 2024
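The CACTUS reverse curriculum has two parts: sampling a goal within an allocation radius of the start, and growing that radius once agents are confidently succeeding. The sketch below assumes a Chebyshev-distance window on a grid of free cells. As a stand-in for the paper's confidence-based measure, it uses an exponential moving average of the success rate with a hypothetical threshold. The class and parameter names are illustrative.

```python
import random
from typing import List, Tuple

Cell = Tuple[int, int]


def sample_goal(start: Cell, radius: int, free: List[List[bool]], rng: random.Random) -> Cell:
    """Pick a random free cell within Chebyshev distance `radius` of the start."""
    r0, c0 = start
    rows, cols = len(free), len(free[0])
    candidates = [
        (r, c)
        for r in range(max(0, r0 - radius), min(rows, r0 + radius + 1))
        for c in range(max(0, c0 - radius), min(cols, c0 + radius + 1))
        if free[r][c] and (r, c) != start
    ]
    return rng.choice(candidates) if candidates else start


class RadiusCurriculum:
    """Grow the allocation radius once a smoothed success rate is high enough."""

    def __init__(self, threshold: float = 0.9, smoothing: float = 0.1, max_radius: int = 64):
        self.radius = 1
        self.confidence = 0.0
        self.threshold, self.smoothing, self.max_radius = threshold, smoothing, max_radius

    def update(self, success_rate: float) -> int:
        # exponential moving average as a stand-in for the confidence measure
        self.confidence += self.smoothing * (success_rate - self.confidence)
        if self.confidence >= self.threshold and self.radius < self.max_radius:
            self.radius += 1
            self.confidence = 0.0  # require fresh evidence at the new difficulty
        return self.radius
```

Resetting the confidence after each increase means the radius only keeps growing once the agents have proven themselves at the new difficulty.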

arXiv:2401.05800

Graph Spatiotemporal Process for Multivariate Time Series Anomaly Detection with Missing Values

Authors: Yu Zheng, Huan Yee Koh, Ming Jin, Lianhua Chi, Haishuai Wang, Khoa T. Phan, Yi-Ping Phoebe Chen, Shirui Pan, Wei Xiang

Abstract: The detection of anomalies in multivariate time series data is crucial for various practical applications, including smart power grids, traffic flow forecasting, and industrial process control. However, real-world time series data is usually not well structured, posing significant challenges to existing approaches: (1) missing values in multivariate time series along the variable and time dimensions hinder the effective modeling of interwoven spatial and temporal dependencies, causing important patterns to be overlooked during model training; (2) anomaly scoring with irregularly sampled observations is less explored, making it difficult to use existing detectors for multivariate series without fully observed values. In this work, we introduce a novel framework called GST-Pro, which uses a graph spatiotemporal process and an anomaly scorer to tackle these challenges in detecting anomalies in irregularly sampled multivariate time series. Our approach comprises two main components. First, we propose a graph spatiotemporal process based on neural controlled differential equations. This process enables effective modeling of multivariate time series from both spatial and temporal perspectives, even when the data contain missing values. Second, we present a novel distribution-based anomaly scoring mechanism that alleviates the reliance on complete, uniform observations. By analyzing the predictions of the graph spatiotemporal process, our approach allows anomalies to be detected easily. Our experimental results show that GST-Pro effectively detects anomalies in time series data and outperforms state-of-the-art methods, regardless of whether missing values are present in the data. Our code is available at https://github.com/huankoh/GST-Pro.

Submitted 11 January, 2024; originally announced January 2024.

Comments: Accepted by Information Fusion
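The GST-Pro abstract describes a distribution-based anomaly scorer that does not require complete observations. One simple instance of that idea, offered here as an assumption rather than the paper's scorer: average the Gaussian negative log-likelihood of each observed value under the model's predicted mean and standard deviation, skipping missing entries.

```python
import math
from typing import List, Optional


def anomaly_score(observed: List[Optional[float]], mean: List[float], std: List[float]) -> float:
    """Average Gaussian negative log-likelihood over the observed entries only.

    observed: values at one timestamp, with None for missing sensors.
    mean, std: the model's predicted distribution for each variable.
    Returns 0.0 if nothing was observed.
    """
    terms = []
    for x, mu, s in zip(observed, mean, std):
        if x is None:
            continue  # missing value: contributes no evidence either way
        s = max(s, 1e-6)  # guard against a degenerate predicted spread
        terms.append(0.5 * math.log(2 * math.pi * s * s) + (x - mu) ** 2 / (2 * s * s))
    return sum(terms) / len(terms) if terms else 0.0
```

Averaging over only the observed entries keeps scores comparable across timestamps with different numbers of missing sensors.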

arXiv:2401.03504

ClusterComm: Discrete Communication in Decentralized MARL using Internal Representation Clustering

Authors: Robert Müller, Hasan Turalic, Thomy Phan, Michael Kölle, Jonas Nüßlein, Claudia Linnhoff-Popien

Abstract: In the realm of Multi-Agent Reinforcement Learning (MARL), prevailing approaches exhibit shortcomings in alignment with human learning, robustness, and scalability. To address this, we introduce ClusterComm, a fully decentralized MARL framework in which agents communicate discretely without a central control unit. ClusterComm applies Mini-Batch-K-Means clustering to the activations of the last hidden layer of an agent's policy network, translating them into discrete messages. This approach outperforms no communication and competes favorably with unbounded, continuous communication, and hence offers a simple yet effective strategy for enhancing collaborative task-solving in MARL.

Submitted 7 January, 2024; originally announced January 2024.

Comments: Accepted at ICAART 2024
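ClusterComm turns hidden-layer activations into discrete messages by clustering them. The sketch below is a minimal mini-batch k-means with per-centre learning rate 1/count. The message is the index of the nearest centroid. Initial centres and batch contents are hypothetical inputs, and the policy network that produces the activations is out of scope.

```python
from typing import List, Sequence

Vector = List[float]


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers: List[Vector], x: Sequence[float]) -> int:
    return min(range(len(centers)), key=lambda k: _sqdist(centers[k], x))


class MiniBatchKMeans:
    """Minimal mini-batch k-means with per-centre learning rates 1 / count."""

    def __init__(self, init_centers: List[Vector]):
        self.centers = [list(c) for c in init_centers]
        self.counts = [0] * len(init_centers)

    def partial_fit(self, batch: List[Vector]) -> None:
        assign = [nearest(self.centers, x) for x in batch]  # assign the whole batch first
        for x, k in zip(batch, assign):
            self.counts[k] += 1
            eta = 1.0 / self.counts[k]  # centre becomes a running mean of its points
            self.centers[k] = [(1 - eta) * c + eta * xi for c, xi in zip(self.centers[k], x)]

    def message(self, hidden: Sequence[float]) -> int:
        """Discrete message: index of the cluster the activation falls into."""
        return nearest(self.centers, hidden)
```

Each agent can fit such a clusterer locally on its own activations, which is consistent with the fully decentralized setting described above; how centroids are shared or interpreted across agents is not specified here.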

arXiv:2312.16767

Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search

Authors: Thomy Phan, Taoan Huang, Bistra Dilkina, Sven Koenig

Abstract: Anytime multi-agent path finding (MAPF) is a promising approach to scalable path optimization in large-scale multi-agent systems. State-of-the-art anytime MAPF is based on Large Neighborhood Search (LNS), where a fast initial solution is iteratively optimized by destroying and repairing a fixed number of parts, i.e., the neighborhood, of the solution, using randomized destroy heuristics and prioritized planning. Despite their recent success in various MAPF instances, current LNS-based approaches lack exploration and flexibility due to greedy optimization with a fixed neighborhood size, which can lead to low-quality solutions in general. So far, these limitations have been addressed with extensive prior effort in tuning or offline machine learning beyond actual planning. In this paper, we focus on online learning in LNS and propose Bandit-based Adaptive LArge Neighborhood search Combined with Exploration (BALANCE). BALANCE uses a bi-level multi-armed bandit scheme to adapt the selection of destroy heuristics and neighborhood sizes on the fly during search. We evaluate BALANCE on multiple maps from the MAPF benchmark set and empirically demonstrate cost improvements of at least 50% compared to state-of-the-art anytime MAPF in large-scale scenarios. We find that Thompson Sampling performs particularly well compared to alternative multi-armed bandit algorithms.

Submitted 1 January, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024
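BALANCE's bi-level bandit first chooses a destroy heuristic and then a neighbourhood size, adapting both online. The sketch below uses Beta-Bernoulli Thompson Sampling at both levels. It assumes a binary reward (1 if the destroy-repair step lowered the solution cost) and uniform Beta(1, 1) priors. The heuristic names and size grid are hypothetical, and the actual reward definition in the paper may differ.

```python
import random
from typing import Dict, Hashable, List, Tuple


class BetaThompson:
    """Beta-Bernoulli Thompson Sampling over a fixed set of arms."""

    def __init__(self, arms: List[Hashable], rng: random.Random):
        self.arms, self.rng = list(arms), rng
        self.alpha = {a: 1.0 for a in arms}  # uniform Beta(1, 1) priors
        self.beta = {a: 1.0 for a in arms}

    def select(self) -> Hashable:
        # draw one plausible success rate per arm and play the best draw
        return max(self.arms, key=lambda a: self.rng.betavariate(self.alpha[a], self.beta[a]))

    def update(self, arm: Hashable, reward: int) -> None:
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward


class BiLevelBandit:
    """Upper level picks a destroy heuristic; lower level picks its neighbourhood size."""

    def __init__(self, heuristics: List[str], sizes: List[int], seed: int = 0):
        rng = random.Random(seed)
        self.top = BetaThompson(heuristics, rng)
        self.lower: Dict[str, BetaThompson] = {h: BetaThompson(sizes, rng) for h in heuristics}

    def select(self) -> Tuple[str, int]:
        h = self.top.select()
        return h, self.lower[h].select()

    def update(self, h: str, size: int, improved: bool) -> None:
        r = int(improved)  # reward 1 if the destroy-repair step lowered the cost
        self.top.update(h, r)
        self.lower[h].update(size, r)
```

Keeping a separate size bandit per heuristic lets each heuristic settle on its own preferred neighbourhood size.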

arXiv:2312.11337

Challenges for Reinforcement Learning in Quantum Circuit Design

Authors: Philipp Altmann, Jonas Stein, Michael Kölle, Adelina Bärligea, Thomas Gabor, Thomy Phan, Sebastian Feld, Claudia Linnhoff-Popien

Abstract: Quantum computing (QC) in the current NISQ era is still limited in size and precision. Hybrid applications that mitigate these shortcomings are prevalent as a means of gaining early insights and advantages. Hybrid quantum machine learning (QML) comprises both the application of QC to improve machine learning (ML) and the application of ML to improve QC architectures. This work considers the latter, leveraging reinforcement learning (RL) to improve the search for viable quantum architectures, which we formalize through a set of generic challenges. Furthermore, we propose a concrete framework, formalized as a Markov decision process, to enable learning policies capable of controlling a universal set of continuously parameterized quantum gates. Finally, we provide benchmark comparisons to assess the shortcomings and strengths of current state-of-the-art RL algorithms.

Submitted 4 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: 10 pages, 3 figures

arXiv:2312.10187 [pdf, other]

TSRNet: Simple Framework for Real-time ECG Anomaly Detection with Multimodal Time and Spectrogram Restoration Network

Authors: Nhat-Tan Bui, Dinh-Hieu Hoang, Thinh Phan, Minh-Triet Tran, Brijesh Patel, Donald Adjeroh, Ngan Le

Abstract: The electrocardiogram (ECG) is a valuable signal used to assess various aspects of heart health, such as heart rate and rhythm. It plays a crucial role in identifying cardiac conditions and detecting anomalies in ECG data. However, distinguishing between normal and abnormal ECG signals can be a challenging task. In this paper, we propose an approach that leverages anomaly detection to identify unhealthy conditions using only normal ECG data for training. Furthermore, to enrich the available information and build a robust system, we consider both the time-series and time-frequency domain aspects of the ECG signal. As a result, we introduce a specialized network called the Multimodal Time and Spectrogram Restoration Network (TSRNet), designed specifically for detecting anomalies in ECG signals. TSRNet falls into the category of restoration-based anomaly detection and draws inspiration from both the time-series and spectrogram domains. By extracting representations from both domains, TSRNet effectively captures the comprehensive characteristics of the ECG signal. This approach enables the network to learn robust representations with superior discrimination abilities, allowing it to distinguish between normal and abnormal ECG patterns more effectively. Furthermore, we introduce a novel inference method, termed Peak-based Error, that specifically focuses on ECG peaks, a critical component in detecting abnormalities. The experimental results on the large-scale PTB-XL dataset demonstrate the effectiveness of our approach in ECG anomaly detection, while also prioritizing efficiency by minimizing the number of trainable parameters. Our code is available at https://github.com/UARK-AICV/TSRNet.
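Restoration-based anomaly detection, as described above, scores a signal by how badly a model trained only on normal data restores it. The sketch below shows that scoring step plus a hypothetical peak-focused variant that only counts error near local maxima. The abstract does not define Peak-based Error, so the peak detector, the window half-width, and the function names are all assumptions for illustration; `x_hat` stands in for the network's restoration.

```python
import numpy as np


def restoration_error(x, x_hat):
    """Mean squared error between an ECG signal and its restoration (higher = more anomalous)."""
    return float(np.mean((x - x_hat) ** 2))


def find_peaks_simple(x, threshold):
    """Indices of local maxima above `threshold`; a crude stand-in for a real R-peak detector."""
    mid = x[1:-1]
    is_peak = (mid > x[:-2]) & (mid >= x[2:]) & (mid > threshold)
    return np.nonzero(is_peak)[0] + 1


def peak_window_error(x, x_hat, threshold, half_width=5):
    """Hypothetical peak-focused score: restoration error only inside windows around detected peaks."""
    mask = np.zeros(len(x), dtype=bool)
    for p in find_peaks_simple(x, threshold):
        mask[max(0, p - half_width): p + half_width + 1] = True
    if not mask.any():
        return restoration_error(x, x_hat)  # fall back to the whole-signal error
    return float(np.mean((x[mask] - x_hat[mask]) ** 2))
```

Averaging only over peak windows keeps a large error at a peak from being diluted by long, well-restored baseline segments.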

Submitted 5 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted at ISBI 2024

arXiv:2312.07831 [pdf, other]

doi 10.1145/3628797.3628921

Abusive Span Detection for Vietnamese Narrative Texts

Authors: Nhu-Thanh Nguyen, Khoa Thi-Kim Phan, Duc-Vu Nguyen, Ngan Luu-Thuy Nguyen

Abstract: Abuse in its various forms, including physical, psychological, verbal, sexual, financial, and cultural abuse, has a negative impact on mental health. However, there are few studies applying natural language processing (NLP) to this field in Vietnam. We therefore aim to contribute by building a human-annotated Vietnamese dataset for detecting abusive content in Vietnamese narrative texts. We sourced these texts from VnExpress, a popular Vietnamese online newspaper, where readers often share stories containing abusive content. Identifying and categorizing abusive spans in these texts posed significant challenges during dataset creation, but these challenges also motivated our research. We experimented with lightweight baseline models that freeze PhoBERT and XLM-RoBERTa and feed their hidden states into a BiLSTM to assess the complexity of the dataset. According to our experimental results, PhoBERT outperforms the other models in both the labeled and unlabeled abusive span detection tasks. These results indicate potential for future improvements.

Submitted 12 December, 2023; originally announced December 2023.

Comments: Accepted at SoICT 2023

arXiv:2311.05546 [pdf, other]

Multi-Agent Quantum Reinforcement Learning using Evolutionary Optimization

Authors: Michael Kölle, Felix Topp, Thomy Phan, Philipp Altmann, Jonas Nüßlein, Claudia Linnhoff-Popien

Abstract: Multi-Agent Reinforcement Learning is becoming increasingly important in the era of autonomous driving and other smart industrial applications. Simultaneously, a promising new approach to Reinforcement Learning has emerged that uses the inherent properties of quantum mechanics to significantly reduce the number of trainable parameters of a model. However, gradient-based Multi-Agent Quantum Reinforcement Learning methods often struggle with barren plateaus, which keeps them from matching the performance of classical approaches. We build upon an existing approach for gradient-free Quantum Reinforcement Learning and propose three genetic variations with Variational Quantum Circuits for Multi-Agent Reinforcement Learning using evolutionary optimization. We evaluate our genetic variations in the Coin Game environment and also compare them to classical approaches. We show that our Variational Quantum Circuit approaches perform significantly better than a neural network with a similar number of trainable parameters. Compared to the larger neural network, our approaches achieve similar results using 97.88% fewer parameters.
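Gradient-free evolutionary optimization sidesteps barren plateaus because it never computes a gradient: it only ranks candidate parameter vectors by fitness. The sketch below is one generic variant (truncation selection with elitism plus Gaussian mutation) over a vector of circuit rotation angles. It is not one of the paper's three genetic variations, and the `fitness` function is a hypothetical stand-in for an episode return in the Coin Game.

```python
import numpy as np


def evolve(fitness, n_params, pop_size=50, n_elite=10, sigma=0.1, generations=100, seed=0):
    """Maximize `fitness` without gradients via truncation selection and Gaussian mutation."""
    rng = np.random.default_rng(seed)
    # Initial population of parameter vectors, e.g. rotation angles of a variational circuit.
    pop = rng.uniform(-np.pi, np.pi, size=(pop_size, n_params))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        elite = pop[np.argsort(scores)[::-1][:n_elite]]  # keep the best individuals unchanged
        # Children: mutated copies of randomly chosen elite parents.
        parents = elite[rng.integers(0, n_elite, size=pop_size - n_elite)]
        children = parents + sigma * rng.standard_normal(parents.shape)
        pop = np.vstack([elite, children])
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(scores)], float(scores.max())
```

Because the elite are copied unchanged into each new generation, the best fitness found never decreases; `sigma` trades off exploration against fine convergence.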

Submitted 13 January, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.00729 [pdf, other]

ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection

Authors: Thinh Phan, Khoa Vo, Duy Le, Gianfranco Doretto, Donald Adjeroh, Ngan Le

Abstract: Temporal action detection (TAD) involves the localization and classification of action instances within untrimmed videos. While standard TAD follows fully supervised learning in a closed-set setting with large training data, recent zero-shot TAD methods showcase a promising open-set setting by leveraging large-scale contrastive visual-language (ViL) pretrained models. However, existing zero-shot TAD methods are limited in how they construct the strong relationship between the two interdependent tasks of localization and classification, and in how they adapt ViL models to video understanding. In this work, we present ZEETAD, featuring two modules: dual-localization and zero-shot proposal classification. The former is a Transformer-based module that detects action events while selectively collecting crucial semantic embeddings for later recognition. The latter, a CLIP-based module, generates semantic embeddings from text and frame inputs for each temporal unit. Additionally, we enhance discriminative capability on unseen classes by minimally updating the frozen CLIP encoder with lightweight adapters. Extensive experiments on the THUMOS14 and ActivityNet-1.3 datasets demonstrate our approach's superior performance in zero-shot TAD and effective knowledge transfer from ViL models to unseen action categories.
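Zero-shot classification with a CLIP-style model reduces to comparing a visual embedding with text embeddings of class names by cosine similarity. The sketch below shows that step and a residual bottleneck adapter in the style of CLIP-Adapter; it is an assumption about what a "lightweight adapter" could look like, not ZEETAD's module. The embeddings and weights here are plain arrays standing in for frozen CLIP outputs and learned adapter parameters.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def adapter(x, w_down, w_up, ratio=0.2):
    """Residual bottleneck adapter: blends a small learned correction into frozen features."""
    h = np.maximum(x @ w_down, 0.0) @ w_up  # down-project, ReLU, up-project
    return ratio * h + (1.0 - ratio) * x


def zero_shot_classify(frame_emb, text_emb, temperature=0.01):
    """Per-frame class probabilities from cosine similarity to class-name text embeddings."""
    logits = l2_normalize(frame_emb) @ l2_normalize(text_emb).T / temperature
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability before exp
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)
```

Unseen classes need only a new row in `text_emb`; the adapter's small `ratio` keeps the adapted features close to the frozen encoder's, which is what preserves transfer to unseen categories.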

Submitted 4 November, 2023; v1 submitted 31 October, 2023; originally announced November 2023.

arXiv:2310.14434 [pdf, other]

Enhancing Accuracy-Privacy Trade-off in Differentially Private Split Learning

Authors: Ngoc Duy Pham, Khoa Tran Phan, Naveen Chilamkurti

Abstract: Split learning (SL) aims to protect user data privacy by distributing deep models between clients and a server and keeping private data local. Only processed or 'smashed' data can be transmitted from the clients to the server during the SL process. However, recently proposed model inversion attacks can recover the original data from the smashed data. To enhance privacy protection against such attacks, one strategy is to adopt differential privacy (DP), which safeguards the smashed data at the expense of some accuracy loss. This paper presents the first investigation into the impact on accuracy when training multiple clients in SL with various privacy requirements. Subsequently, we propose an approach that reviews the DP noise distributions of other clients during client training to address the identified accuracy degradation. We also examine the application of DP to the local model of SL to gain insights into the trade-off between accuracy and privacy. Specifically, our findings reveal that introducing noise in the later local layers offers the most favorable balance between accuracy and privacy. Drawing on our insights from the shallower layers, we propose an approach to reduce the size of smashed data to minimize data leakage while maintaining higher accuracy, optimizing the accuracy-privacy trade-off. Additionally, a smaller size of smashed data reduces communication overhead on the client side, mitigating one of the notable drawbacks of SL. Experiments with popular datasets demonstrate that our proposed approaches provide an optimal trade-off for incorporating DP into SL, ultimately enhancing training accuracy for multi-client SL with varying privacy requirements.
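One common way to apply DP to smashed data is to clip each sample's activation vector to a bounded L2 norm and then add calibrated noise before sending it to the server. The sketch below uses the classic Gaussian mechanism, with noise scale sqrt(2 ln(1.25/delta)) * sensitivity / epsilon (valid for 0 < epsilon < 1). The choice of mechanism, the clipping rule, and the function names are assumptions for illustration; the paper may use a different mechanism or calibration.

```python
import math

import numpy as np


def gaussian_sigma(epsilon, delta, sensitivity):
    """Classic Gaussian-mechanism noise scale (the standard bound assumes 0 < epsilon < 1)."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def privatize_smashed(acts, clip_norm, epsilon, delta, rng):
    """Clip each sample's activations to L2 norm <= clip_norm, then add Gaussian noise."""
    flat = acts.reshape(len(acts), -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    clipped = flat * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Any two clipped vectors differ by at most 2 * clip_norm in L2, so that is the sensitivity.
    sigma = gaussian_sigma(epsilon, delta, 2.0 * clip_norm)
    noisy = clipped + rng.normal(0.0, sigma, size=clipped.shape)
    return noisy.reshape(acts.shape), sigma
```

For example, `gaussian_sigma(0.5, 1e-5, 1.0)` is about 9.69, which shows how quickly the noise grows as epsilon shrinks; this is the accuracy cost the abstract refers to.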

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.11166 [pdf, other]

ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing

Authors: Quoc-Nam Nguyen, Thang Chau Phan, Duc-Vu Nguyen, Kiet Van Nguyen

Abstract: English and Chinese, known as resource-rich languages, have witnessed the strong development of transformer-based language models for natural language processing tasks. Vietnam has approximately 100M Vietnamese speakers, and several pre-trained models, e.g., PhoBERT, ViBERT, and vELECTRA, perform well on general Vietnamese NLP tasks, including POS tagging and named entity recognition. However, these pre-trained language models are still limited on Vietnamese social media tasks. In this paper, we present the first monolingual pre-trained language model for Vietnamese social media texts, ViSoBERT, which is pre-trained on a large-scale corpus of high-quality and diverse Vietnamese social media texts using the XLM-R architecture. Moreover, we evaluate our pre-trained model on five important downstream natural language tasks on Vietnamese social media texts: emotion recognition, hate speech detection, sentiment analysis, spam review detection, and hate speech span detection. Our experiments demonstrate that ViSoBERT, with far fewer parameters, surpasses the previous state-of-the-art models on multiple Vietnamese social media tasks. Our ViSoBERT model is available only for research purposes.

Submitted 28 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: Accepted at EMNLP'2023 Main Conference

arXiv:2310.00604 [pdf, other]

Path Structured Multimarginal Schrödinger Bridge for Probabilistic Learning of Hardware Resource Usage by Control Software

Authors: Georgiy A. Bondar, Robert Gifford, Linh Thi Xuan Phan, Abhishek Halder

Abstract: The solution of the path structured multimarginal Schrödinger bridge problem (MSBP) is the most likely measure-valued trajectory consistent with a sequence of observed probability measures or distributional snapshots. We leverage recent algorithmic advances in solving such structured MSBPs for learning stochastic hardware resource usage by control software. The solution enables predicting the time-varying distribution of hardware resource availability at a desired time with guaranteed linear convergence. We demonstrate the efficacy of our probabilistic learning approach in a model predictive control software execution case study. The method exhibits rapid convergence to an accurate prediction of the controller's hardware resource utilization. The method can be broadly applied to any software to predict cyber-physical context-dependent performance at an arbitrary time.
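On a discrete state space, a path-structured MSBP can be posed as entropy-regularized multi-marginal optimal transport whose cost couples only consecutive snapshots, and it can be solved by Sinkhorn-type scaling with forward and backward messages. The sketch below is a generic small-scale version of that idea, not the authors' solver; the state grid, cost matrices, and `eps` are assumptions. Each sweep rescales one potential `us[t]` so that the model's marginal at time `t` matches the observed snapshot `mus[t]`.

```python
import numpy as np


def _left_message(us, Ks, t):
    """Sum over states at times 0..t-1 of potentials times kernels, as a vector over states at t."""
    m = np.ones_like(us[0])
    for s in range(t):
        m = (m * us[s]) @ Ks[s]
    return m


def _right_message(us, Ks, t):
    """Same contraction for times t+1..T-1, brought back to the states at time t."""
    m = np.ones_like(us[-1])
    for s in range(len(us) - 2, t - 1, -1):
        m = Ks[s] @ (us[s + 1] * m)
    return m


def path_msbp(mus, costs, eps=0.1, sweeps=500):
    """Discrete path-structured multimarginal Sinkhorn.

    mus:   list of T probability vectors (observed distributional snapshots)
    costs: list of T-1 matrices; costs[t][i, j] is the cost of moving from state i at t to j at t+1
    Returns the scaling potentials and the model's marginal at every time step.
    """
    Ks = [np.exp(-C / eps) for C in costs]  # Gibbs kernels between consecutive snapshots
    us = [np.ones_like(mu, dtype=float) for mu in mus]
    for _ in range(sweeps):
        for t in range(len(mus)):
            us[t] = mus[t] / (_left_message(us, Ks, t) * _right_message(us, Ks, t))
    marginals = [us[t] * _left_message(us, Ks, t) * _right_message(us, Ks, t) for t in range(len(mus))]
    return us, marginals
```

Because the cost only links consecutive times, each marginal is computed by chaining matrix-vector products instead of materializing the full joint tensor, which is what makes the path structure tractable.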

Submitted 3 October, 2023; v1 submitted 1 October, 2023; originally announced October 2023.

Comments: 8 pages, 6 figures. Submitted to American Control Conference (ACC) 2024

arXiv:2308.10121 [pdf, other]

Dronevision: An Experimental 3D Testbed for Flying Light Specks

Authors: Hamed Alimohammadzadeh, Rohit Bernard, Yang Chen, Trung Phan, Prashant Singh, Shuqin Zhu, Heather Culbertson, Shahram Ghandeharizadeh

Abstract: Today's robotic laboratories for drones are housed in large rooms, at times the size of a warehouse. These spaces are typically equipped with permanent devices to localize the drones, e.g., Vicon infrared cameras. Significant time is invested to fine-tune the localization apparatus to compute and control the position of the drones. One may use these laboratories to develop a 3D multimedia system with miniature-sized drones configured with light sources. As an alternative, this brave new idea paper envisions shrinking these room-sized laboratories to the size of a cube or cuboid that sits on a desk and costs less than 10K dollars. The resulting Dronevision (DV) will be the size of a 1990s television. In addition to light sources, its Flying Light Specks (FLSs) will be network-enabled drones with storage and processing capability to implement decentralized algorithms. The DV will include a localization technique to expedite the development of 3D displays. It will act as a haptic interface for a user to interact with and manipulate the 3D virtual illuminations. It will empower an experimenter to design, implement, test, debug, and maintain software and hardware that realize novel algorithms in the comfort of their office without having to reserve a laboratory. In addition to enhancing productivity, it will improve the safety of the experimenter by minimizing the likelihood of accidents. This paper introduces the concept of a DV, the research agenda one may pursue using this device, and our plans to realize one.

Submitted 19 August, 2023; originally announced August 2023.

arXiv:2308.10115 [pdf, other]

An Evaluation of Three Distance Measurement Technologies for Flying Light Specks

Authors: Trung Phan, Hamed Alimohammadzadeh, Heather Culbertson, Shahram Ghandeharizadeh

Abstract: This study evaluates the accuracy of three different types of time-of-flight sensors for measuring distance. We envision the possible use of these sensors to localize swarms of flying light specks (FLSs) that illuminate objects and avatars of a metaverse. An FLS is a miniature-sized drone configured with RGB light sources. It is unable to illuminate a point cloud by itself. However, the inter-FLS relationships of an organizational framework will compensate for the simplicity of each individual FLS, enabling a swarm of cooperating FLSs to illuminate complex shapes and render haptic interactions. The distance between FLSs is an important criterion of the inter-FLS relationship. We consider sensors that use radio frequency (UWB), infrared light (IR), and sound (ultrasonic) to quantify this metric. The results show that only one sensor is able to measure distances as small as 1 cm with high accuracy. A sensor may require a calibration process that impacts its accuracy in measuring distance.
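The three sensor types share the same time-of-flight principle: distance is propagation speed times travel time, halved when the signal goes out and back. The textbook relations below (not a model of the specific sensors evaluated) show why centimeter-scale ranging is demanding: a 1 cm round trip takes about 66.7 ps for light or radio but about 58.3 microseconds for sound. The single-sided two-way ranging formula shown for UWB is a common scheme, assumed here for illustration.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s, applies to UWB radio and IR light
SPEED_OF_SOUND = 343.0          # m/s, dry air at about 20 degrees C (ultrasonic)


def echo_distance(round_trip_s, speed):
    """Distance from an echo-style time of flight: the signal travels there and back."""
    return speed * round_trip_s / 2.0


def twr_distance(t_round_s, t_reply_s, speed=SPEED_OF_LIGHT):
    """Single-sided two-way ranging: subtract the responder's known reply delay first."""
    return speed * (t_round_s - t_reply_s) / 2.0


def round_trip_time(distance_m, speed):
    """Time for a signal to cover distance_m and return."""
    return 2.0 * distance_m / speed
```

For example, `round_trip_time(0.01, SPEED_OF_LIGHT)` is about 6.67e-11 s and `round_trip_time(0.01, SPEED_OF_SOUND)` is about 5.83e-5 s, so RF and IR sensors need picosecond-level timing for 1 cm resolution, while ultrasonic sensors work with microseconds but are sensitive to the assumed speed of sound.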

Submitted 19 August, 2023; originally announced August 2023.

Comments: In International Conference on Intelligent Metaverse Technologies and Applications (iMETA2023), Tartu, Estonia, September 18-20, 2023

arXiv:2308.02242 [pdf, ps, other]

Countering Eavesdroppers with Meta-learning-based Cooperative Ambient Backscatter Communications

Authors: Nam H. Chu, Nguyen Van Huynh, Diep N. Nguyen, Dinh Thai Hoang, Shimin Gong, Tao Shu, Eryk Dutkiewicz, Khoa T. Phan

Abstract: This article introduces a novel lightweight framework that uses ambient backscatter communications to counter eavesdroppers. In particular, our framework divides an original message into two parts: (i) the active-transmit message, transmitted by the transmitter using conventional RF signals, and (ii) the backscatter message, transmitted by an ambient backscatter tag that backscatters the active signals emitted by the transmitter. Notably, the backscatter tag does not generate its own signal, making it difficult for an eavesdropper to detect the backscattered signals unless it has prior knowledge of the system. Here, we assume that without decoding/knowing the backscatter message, the eavesdropper is unable to decode the original message. Even in scenarios where the eavesdropper can capture both messages, reconstructing the original message is a complex task without understanding the intricacies of the message-splitting mechanism. A challenge in our proposed framework is to effectively decode the backscattered signals at the receiver, which is often accomplished using the maximum likelihood (MLK) approach. However, such a method may require a complex mathematical model together with perfect channel state information (CSI). To address this issue, we develop a novel deep meta-learning-based signal detector that can not only effectively decode the weak backscattered signals without requiring perfect CSI but also quickly adapt to a new wireless environment with very little knowledge. Simulation results show that our proposed learning approach, without requiring perfect CSI or a complex mathematical model, can achieve a bit error ratio close to that of the MLK-based approach. They also clearly show the efficiency of the proposed approach in dealing with eavesdropping attacks and the lack of training data for deep learning models in practical scenarios.
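The MLK baseline mentioned above is easiest to see in a simplified model where it needs perfect CSI. The sketch assumes (these are not the paper's equations) that one on-off backscatter bit `b` spans a block of known active-signal samples `x`, the receiver sees `y = (h_direct + b * h_backscatter) * x + n` with circular Gaussian noise, and both channel gains are known. Under those assumptions, ML detection reduces to picking the hypothesis with the smaller squared residual. The meta-learning detector in the paper is meant to replace exactly this dependence on known channels.

```python
import numpy as np


def ml_detect_backscatter(y, x, h_direct, h_backscatter):
    """ML detection of one on-off backscatter bit from a block of received samples.

    Assumed model: y = (h_direct + b * h_backscatter) * x + n, b in {0, 1}, complex
    Gaussian noise n, and perfect CSI. Under this model, maximizing the likelihood
    is equivalent to minimizing the squared residual between y and each hypothesis.
    """
    residuals = [np.sum(np.abs(y - (h_direct + b * h_backscatter) * x) ** 2) for b in (0, 1)]
    return int(np.argmin(residuals))
```

If `h_backscatter` is small relative to `h_direct` (a weak backscatter link), the two hypotheses are close together and errors in the assumed channel gains quickly flip decisions, which is why perfect CSI matters for this baseline.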

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2307.08390 [pdf, other]

doi 10.1109/TNNLS.2023.3325667

Correlation-aware Spatial-Temporal Graph Learning for Multivariate Time-series Anomaly Detection

Authors: Yu Zheng, Huan Yee Koh, Ming Jin, Lianhua Chi, Khoa T. Phan, Shirui Pan, Yi-Ping Phoebe Chen, Wei Xiang

Abstract: Multivariate time-series anomaly detection is critically important in many applications, including retail, transportation, power grids, and water treatment plants. Existing approaches for this problem mostly employ either statistical models, which cannot capture non-linear relations well, or conventional deep learning models (e.g., CNN and LSTM) that do not explicitly learn the pairwise correlations among variables. To overcome these limitations, we propose a novel method, correlation-aware spatial-temporal graph learning (termed CST-GL), for time-series anomaly detection. CST-GL explicitly captures the pairwise correlations via a multivariate time-series correlation learning module, on which a spatial-temporal graph neural network (STGNN) is built. Then, by employing a graph convolution network that exploits one- and multi-hop neighbor information, our STGNN component can encode rich spatial information from complex pairwise dependencies between variables. With a temporal module that consists of dilated convolutional functions, the STGNN can further capture long-range dependence over time. A novel anomaly scoring component is further integrated into CST-GL to estimate the degree of an anomaly in a purely unsupervised manner. Experimental results demonstrate that CST-GL can detect anomalies effectively in general settings as well as enable early detection across different time delays.
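The "one- and multi-hop neighbor information" step can be illustrated with mix-hop propagation over a variable graph: each hop mixes neighbors' features through a row-normalized adjacency, and the outputs of all hop depths are concatenated. This MTGNN-style rule is an assumption, not CST-GL's published layer, and the absolute-correlation graph below is a simple stand-in for CST-GL's learned correlation module.

```python
import numpy as np


def correlation_graph(series):
    """Stand-in graph: absolute Pearson correlation between variables (CST-GL learns its own).

    series: (time, num_variables); constant variables would produce NaNs.
    """
    c = np.abs(np.corrcoef(series.T))
    np.fill_diagonal(c, 0.0)
    return c


def row_normalize(adj):
    adj = adj + np.eye(len(adj))  # self-loops keep each node's own signal
    return adj / adj.sum(axis=1, keepdims=True)


def mixhop_propagate(x, adj, hops=2, beta=0.05):
    """Collect 0..hops-hop neighborhood features.

    x:   (num_variables, features) node features at one time step
    adj: (num_variables, num_variables) non-negative correlation graph
    Returns an array of shape (num_variables, features * (hops + 1)).
    """
    a = row_normalize(adj)
    h = x
    outs = [x]
    for _ in range(hops):
        h = beta * x + (1.0 - beta) * a @ h  # retain some input to limit over-smoothing
        outs.append(h)
    return np.concatenate(outs, axis=1)
```

Concatenating every hop depth lets a downstream layer weight direct neighbors and more distant, indirectly correlated variables separately.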

Submitted 16 November, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: 17 pages, double columns, 10 tables, 3 figures. Accepted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

arXiv:2306.11287 [pdf]

Spatiotemporal Pyramidal CNN with Depth-Wise Separable Convolution for Eye Blinking Detection in the Wild

Authors: Lan Anh Thi Nguy, Bach Nguyen Gia, Thanh Tu Thi Nguyen, Kamioka Eiji, Tan Xuan Phan

Abstract: Eye blinking detection in the wild plays an essential role in deception detection, driving fatigue detection, etc. Although numerous attempts have already been made, most of them have encountered difficulties, such as eye images having different resolutions as the distance between the face and the camera changes, or the need for a lightweight detection model with a short inference time to run in real time. In this research, two problems are addressed: how the eye blinking detection model can learn efficiently from eye images of different resolutions in diverse conditions, and how to reduce the size of the detection model for faster inference. As one potential solution to the first problem, we propose upsampling and downsampling the input eye images to the same resolution, and then determine which interpolation method yields the highest performance of the detection model. For the second problem, although a recent spatiotemporal convolutional neural network used for eye blinking detection has a strong capacity to extract both spatial and temporal characteristics, it still has a high number of network parameters, leading to high inference time. Therefore, this paper considers using Depth-wise Separable Convolution rather than conventional convolution layers inside each branch as a feasible solution.
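The parameter savings from depth-wise separable convolution follow from simple counting. A standard convolution learns one k-element kernel per (input channel, output channel) pair; the separable version learns one k-element kernel per input channel (depthwise) plus a 1x1 channel-mixing layer (pointwise). The channel counts and the 3x3x3 spatiotemporal kernel below are hypothetical, not the paper's architecture, and bias terms are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard convolution; k = kt * kh * kw for a 3-D kernel (no bias)."""
    return k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Depthwise (one k-element filter per input channel) plus pointwise 1x1 mixing (no bias)."""
    return k * c_in + c_in * c_out


k = 3 * 3 * 3  # hypothetical 3x3x3 spatiotemporal kernel
standard = conv_params(k, 32, 64)                 # 27 * 32 * 64 = 55296
separable = depthwise_separable_params(k, 32, 64) # 27 * 32 + 32 * 64 = 2912
ratio = separable / standard                      # equals 1 / c_out + 1 / k
```

Here the separable layer needs about 5.3% of the weights, and the ratio 1/c_out + 1/k shows the saving grows with both kernel size and output channels.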

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.05340 [pdf]

Research Impact of Solar Panel Cleaning Robot on Photovoltaic Panel's Deflection

Authors: Trung Dat Phan, Minh Duc Nguyen, Maxence Auffray, Nhut Thang Le, Cong Toai Truong, Van Tu Duong, Huy Hung Nguyen, Tan Tien Nguyen

Abstract: In the last few decades, solar panel cleaning robots (SPCR) have been widely used for cleaning photovoltaic (PV) panels as an effective way to maintain PV efficiency. However, the dynamic load generated by the SPCR during operation might have a negative impact on PV panels. To reduce these effects, this paper uses ANSYS software to simulate multiple scenarios involving the impact of SPCR on PV panels. The simulation scenarios are derived from the typical movements of SPCR observed during practical operations. The simulation results show the deformation process of PV panels, and a second-order polynomial is established to describe the deformation amplitude along the centerline of the PV panels. This second-order polynomial contributes to the design of a damper system for SPCR that aims to reduce the influence of SPCR on PV panels. Moreover, experiments are conducted to examine the correlation between the simulation and experimental results.
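Establishing a second-order polynomial for the centerline deflection amounts to a least-squares quadratic fit of deflection against position. The sketch below shows that fit with `numpy.polyfit` and the vertex of the fitted parabola. The function names are hypothetical, the input would be simulated or measured (position, deflection) pairs, and no results from the paper are reproduced here.

```python
import numpy as np


def fit_centerline_deflection(positions_m, deflections_m):
    """Least-squares quadratic w(x) = a*x**2 + b*x + c along the panel centerline."""
    a, b, c = np.polyfit(positions_m, deflections_m, deg=2)  # highest power first
    return a, b, c


def extreme_deflection(a, b, c):
    """Vertex of the fitted parabola; it may fall outside the panel, so check the range."""
    x_star = -b / (2.0 * a)
    return x_star, a * x_star ** 2 + b * x_star + c
```

A compact closed-form w(x) like this is convenient in damper design because the expected deflection at any robot position can be evaluated directly, without rerunning the finite-element simulation.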

Submitted 8 June, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 8 pages, 8 figures, The 4th International Conference on Applied Convergence Engineering (ICACE 2023)

arXiv:2305.17648 [pdf, other]

Z-GMOT: Zero-shot Generic Multiple Object Tracking

Authors: Kim Hoang Tran, Anh Duy Le Dinh, Tien Phat Nguyen, Thinh Phan, Pha Nguyen, Khoa Luu, Donald Adjeroh, Gianfranco Doretto, Ngan Hoang Le

Abstract: Despite recent significant progress, Multi-Object Tracking (MOT) faces limitations such as reliance on prior knowledge and predefined categories, and it struggles with unseen objects. To address these issues, Generic Multiple Object Tracking (GMOT) has emerged as an alternative approach that requires less prior information. However, current GMOT methods often rely on initial bounding boxes and struggle to handle variations in factors such as viewpoint, lighting, occlusion, and scale. Our contributions begin with the introduction of the Referring GMOT dataset, a collection of videos, each accompanied by detailed textual descriptions of their attributes. Subsequently, we propose Z-GMOT, a cutting-edge tracking solution capable of tracking objects from never-seen categories without the need for initial bounding boxes or predefined categories. Within our Z-GMOT framework, we introduce two novel components: (i) iGLIP, an improved Grounded Language-Image Pretraining, for accurately detecting unseen objects with specific characteristics; and (ii) MA-SORT, a novel object association approach that integrates motion- and appearance-based matching strategies to tackle the complex task of tracking objects with high similarity. Our contributions are benchmarked through extensive experiments on the Referring GMOT dataset for the GMOT task. Additionally, to assess the generalizability of the proposed Z-GMOT, we conduct ablation studies on the DanceTrack and MOT20 datasets for the MOT task. Our dataset, code, and models are released at: https://fsoft-aic.github.io/Z-GMOT.
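Association that integrates motion and appearance is often implemented as a weighted sum of an IoU-based motion distance and a cosine appearance distance, followed by optimal assignment. The sketch below shows that generic SORT/DeepSORT-style fusion; it is not MA-SORT's actual rule, and the weight `lam` and gating threshold `max_cost` are hypothetical. It uses SciPy's `linear_sum_assignment` (Hungarian-type solver).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def associate(track_boxes, det_boxes, track_feats, det_feats, lam=0.5, max_cost=0.8):
    """Match predicted track boxes to detections with a weighted motion + appearance cost."""
    motion = np.array([[1.0 - iou(t, d) for d in det_boxes] for t in track_boxes])
    tf = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    df = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    appearance = 1.0 - tf @ df.T  # cosine distance between appearance embeddings
    cost = lam * motion + (1.0 - lam) * appearance
    rows, cols = linear_sum_assignment(cost)  # minimum-total-cost one-to-one matching
    # Reject matches whose fused cost is too high; those tracks/detections stay unmatched.
    return [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
```

When objects look nearly identical, the appearance term barely separates them, so the motion term carries the decision; tuning `lam` is how such a tracker shifts trust between the two cues.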

Submitted 13 June, 2024; v1 submitted 28 May, 2023; originally announced May 2023.

arXiv:2304.13616 [pdf, other]

doi 10.24963/ijcai.2023/380

CROP: Towards Distributional-Shift Robust Reinforcement Learning using Compact Reshaped Observation Processing

Authors: Philipp Altmann, Fabian Ritz, Leonard Feuchtinger, Jonas Nüßlein, Claudia Linnhoff-Popien, Thomy Phan

Abstract: The safe application of reinforcement learning (RL) requires generalization from limited training data to unseen scenarios. Yet, fulfilling tasks under changing circumstances is a key challenge in RL. Current state-of-the-art approaches for generalization apply data augmentation techniques to increase the diversity of training data. Even though this prevents overfitting to the training environment(s), it hinders policy optimization. Crafting a suitable observation that contains only crucial information has been shown to be a challenging task itself. To improve data efficiency and generalization capabilities, we propose Compact Reshaped Observation Processing (CROP) to reduce the state information used for policy optimization. By providing only relevant information, overfitting to a specific training layout is precluded and generalization to unseen environments is improved. We formulate three CROPs that can be applied to fully observable observation and action spaces and provide a methodical foundation. We empirically show the improvements of CROP in a distributionally shifted safety gridworld. We furthermore provide benchmark comparisons to full observability and data augmentation in two different-sized procedurally generated mazes.
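One plausible way to "reduce the state information" in a fully observable gridworld is to replace the whole map with a fixed-size window centered on the agent, padding cells that fall outside the map. The sketch below shows such a radius-based crop. The abstract does not define its three CROPs, so this function, its name, and the padding value are assumptions for illustration only.

```python
import numpy as np


def radius_crop(grid, agent_pos, radius, pad_value=-1):
    """Return the (2r+1) x (2r+1) window of a 2-D grid observation centered on the agent.

    Cells outside the map are filled with pad_value, so the output shape is constant
    regardless of map size or agent position.
    """
    padded = np.pad(grid, radius, mode="constant", constant_values=pad_value)
    r, c = agent_pos[0] + radius, agent_pos[1] + radius  # agent position in padded coordinates
    return padded[r - radius: r + radius + 1, c - radius: c + radius + 1]
```

Because the cropped observation has the same shape on every map, a policy trained on one layout sees inputs of the same form on unseen layouts, which is the intuition behind reduced overfitting to a specific training layout.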

Submitted 5 December, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

Comments: 9 pages, 5 figures, published at IJCAI 2023

arXiv:2302.13445 [pdf, ps, other]

Dynamic Resource Allocation for Metaverse Applications with Deep Reinforcement Learning

Authors: Nam H. Chu, Diep N. Nguyen, Dinh Thai Hoang, Khoa T. Phan, Eryk Dutkiewicz, Dusit Niyato, Tao Shu

Abstract: This work proposes a novel framework to dynamically and effectively manage and allocate different types of resources for Metaverse applications, which are forecasted to demand massive resources of various types that have never been seen before. Specifically, by studying functions of Metaverse applications, we first propose an effective solution to divide applications into groups, namely MetaInstances, where common functions can be shared among applications to enhance resource usage efficiency. Then, to capture the real-time, dynamic, and uncertain characteristics of request arrival and application departure processes, we develop a semi-Markov decision process-based framework and propose an intelligent algorithm that can gradually learn the optimal admission policy to maximize the revenue and resource usage efficiency for the Metaverse service provider and at the same time enhance the Quality-of-Service for Metaverse users. Extensive simulation results show that our proposed approach can achieve up to 120% greater revenue for the Metaverse service providers and up to 178.9% higher acceptance probability for Metaverse application requests than those of other baselines.

Submitted 26 February, 2023; originally announced February 2023.

Comments: To be published in the Proceedings of the IEEE WCNC 2023

arXiv:2301.07421 [pdf, other]

DIRECT: Learning from Sparse and Shifting Rewards using Discriminative Reward Co-Training

Authors: Philipp Altmann, Thomy Phan, Fabian Ritz, Thomas Gabor, Claudia Linnhoff-Popien

Abstract: We propose discriminative reward co-training (DIRECT) as an extension to deep reinforcement learning algorithms. Building upon the concept of self-imitation learning (SIL), we introduce an imitation buffer to store beneficial trajectories generated by the policy, selected by their return. A discriminator network is trained concurrently with the policy to distinguish between trajectories generated by the current policy and beneficial trajectories generated by previous policies. The discriminator's verdict is used to construct a reward signal for optimizing the policy. By interpolating prior experience, DIRECT is able to act as a surrogate, steering policy optimization towards more valuable regions of the reward landscape and thus learning an optimal policy. Our results show that DIRECT outperforms state-of-the-art algorithms in sparse- and shifting-reward environments, as it is able to provide a surrogate reward to the policy and direct the optimization towards valuable areas.

Submitted 18 January, 2023; originally announced January 2023.

Comments: 9 pages, 10 figures, under review

ACM Class: I.2.6
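The sketch below is a minimal version of the imitation buffer described above. It assumes the buffer keeps a fixed number of the highest-return trajectories. Several pieces are assumptions: the class name, the min-heap eviction rule, and the log-odds `surrogate_reward`. The log-odds form is a common way to turn a discriminator probability into a reward. The abstract does not give DIRECT's exact reward formula.

```python
import heapq
import itertools
import math
from typing import Any, List, Tuple


class ImitationBuffer:
    """Keep the `capacity` highest-return trajectories seen so far (hypothetical).

    A min-heap keyed on return exposes the worst stored trajectory at index 0,
    so a better one can replace it in O(log capacity).
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._heap: List[Tuple[float, int, Any]] = []
        # Monotonic tie-breaker so heapq never has to compare trajectories.
        self._counter = itertools.count()

    def add(self, trajectory: Any, ret: float) -> bool:
        """Insert `trajectory` if there is room or it beats the worst stored return."""
        item = (ret, next(self._counter), trajectory)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
            return True
        if ret > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)  # pop worst, push new
            return True
        return False

    def returns(self) -> List[float]:
        """Stored returns, best first."""
        return sorted((r for r, _, _ in self._heap), reverse=True)


def surrogate_reward(p_buffer: float, eps: float = 1e-8) -> float:
    """Assumed reward: log-odds that a transition came from the imitation buffer."""
    p = min(max(p_buffer, eps), 1.0 - eps)  # clip to avoid log(0)
    return math.log(p) - math.log(1.0 - p)


buf = ImitationBuffer(capacity=2)
for name, ret in [("a", 1.0), ("b", 3.0), ("c", 2.0), ("d", 0.5)]:
    buf.add(name, ret)
print(buf.returns())
```

A discriminator trained to separate buffer trajectories from current-policy ones would supply `p_buffer`. Its training loop is omitted here.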

arXiv:2301.01649 [pdf, other]

Attention-Based Recurrence for Multi-Agent Reinforcement Learning under Stochastic Partial Observability

Authors: Thomy Phan, Fabian Ritz, Philipp Altmann, Maximilian Zorn, Jonas Nüßlein, Michael Kölle, Thomas Gabor, Claudia Linnhoff-Popien

Abstract: Stochastic partial observability poses a major challenge for decentralized coordination in multi-agent reinforcement learning but is largely neglected in state-of-the-art research due to a strong focus on state-based centralized training for decentralized execution (CTDE) and benchmarks that lack sufficient stochasticity like StarCraft Multi-Agent Challenge (SMAC). In this paper, we propose Attention-based Embeddings of Recurrence In multi-Agent Learning (AERIAL) to approximate value functions under stochastic partial observability. AERIAL replaces the true state with a learned representation of multi-agent recurrence, considering more accurate information about decentralized agent decisions than state-based CTDE. We then introduce MessySMAC, a modified version of SMAC with stochastic observations and higher variance in initial states, to provide a more general and configurable benchmark regarding stochastic partial observability. We evaluate AERIAL in Dec-Tiger as well as in a variety of SMAC and MessySMAC maps, and compare the results with state-based CTDE. Furthermore, we evaluate the robustness of AERIAL and state-based CTDE against various stochasticity configurations in MessySMAC.

Submitted 27 December, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: Accepted to ICML 2023
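AERIAL replaces the true state with a learned representation of multi-agent recurrence. The sketch below assumes that representation is built by scaled dot-product self-attention over the agents' recurrent hidden states, followed by mean-pooling. The weight shapes, the pooling step, and the name `recurrence_embedding` are hypothetical; this is not the paper's architecture.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row maximum for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def recurrence_embedding(h: np.ndarray, w_q: np.ndarray, w_k: np.ndarray,
                         w_v: np.ndarray) -> np.ndarray:
    """Self-attention across agents' hidden states, then mean-pooling (assumed design).

    h: (n_agents, d_h) hidden states, e.g. one recurrent cell per agent.
    w_q, w_k: (d_h, d_k); w_v: (d_h, d_v). Returns a (d_v,) vector that a
    centralized value function could consume in place of the true state.
    """
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])      # (n_agents, n_agents)
    attended = softmax(scores, axis=-1) @ v      # (n_agents, d_v)
    return attended.mean(axis=0)                 # summary independent of agent order


rng = np.random.default_rng(0)
h = rng.normal(size=(3, 8))                      # 3 agents, hidden size 8
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
z = recurrence_embedding(h, w_q, w_k, w_v)
print(z.shape)
```

Self-attention without positional terms is permutation-equivariant, and mean-pooling then makes the summary permutation-invariant. Relabelling agents leaves the value input unchanged.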

arXiv:2212.00250 [pdf, other]

Split Learning without Local Weight Sharing to Enhance Client-side Data Privacy

Authors: Ngoc Duy Pham, Tran Khoa Phan, Alsharif Abuadbba, Yansong Gao, Doan Nguyen, Naveen Chilamkurti

Abstract: Split learning (SL) aims to protect user data privacy by distributing deep models between client and server and keeping private data locally. In SL training with multiple clients, the local model weights are shared among the clients for local model updates. This paper first reveals data privacy leakage exacerbated by local weight sharing among the clients in SL through model inversion attacks. Then, to reduce the data privacy leakage issue, we propose and analyze privacy-enhanced SL (P-SL) (or SL without local weight sharing). We further propose parallelized P-SL to expedite the training process by duplicating multiple server-side model instances without compromising accuracy. Finally, we explore P-SL with late participating clients and devise a server-side cache-based training method to address the forgetting phenomenon in SL when late clients join. Experimental results demonstrate that P-SL helps reduce up to 50% of client-side data leakage, which essentially achieves a better privacy-accuracy trade-off than the current trend of using differential privacy mechanisms. Moreover, P-SL and its cache-based version achieve comparable accuracy to baseline SL under various data distributions, while costing less computation and communication. Additionally, caching-based training in P-SL mitigates the negative effect of forgetting, stabilizes the learning, and enables practical and low-complexity training in a dynamic environment with late-arriving clients.

Submitted 20 July, 2023; v1 submitted 30 November, 2022; originally announced December 2022.

arXiv:2211.06323 [pdf, other]

A Sign That Spells: DALL-E 2, Invisual Images and The Racial Politics of Feature Space

Authors: Fabian Offert, Thao Phan

Abstract: In this paper, we examine how generative machine learning systems produce a new politics of visual culture. We focus on DALL-E 2 and related models as an emergent approach to image-making that operates through the cultural techniques of feature extraction and semantic compression. These techniques, we argue, are inhuman, invisual, and opaque, yet are still caught in a paradox that is ironically all too human: the consistent reproduction of whiteness as a latent feature of dominant visual culture. We use OpenAI's failed efforts to 'debias' their system as a critical opening to interrogate how systems like DALL-E 2 dissolve and reconstitute politically salient human concepts like race. This example vividly illustrates the stakes of this moment of transformation, when so-called foundation models reconfigure the boundaries of visual culture and when 'doing' anti-racism means deploying quick technical fixes to mitigate personal discomfort, or more importantly, potential commercial loss.

Submitted 26 October, 2022; originally announced November 2022.

arXiv:2210.07864 [pdf, other]

doi 10.1145/3593013.3594042

Gender Animus Can Still Exist Under Favorable Disparate Impact: a Cautionary Tale from Online P2P Lending

Authors: Xudong Shen, Tianhui Tan, Tuan Q. Phan, Jussi Keppo

Abstract: This paper investigates gender discrimination and its underlying drivers on a prominent Chinese online peer-to-peer (P2P) lending platform. While existing studies on P2P lending focus on disparate treatment (DT), DT narrowly recognizes direct discrimination and overlooks indirect and proxy discrimination, providing an incomplete picture. In this work, we measure a broadened discrimination notion called disparate impact (DI), which encompasses any disparity in the loan's funding rate that is not commensurate with the actual return rate. We develop a two-stage predictor substitution approach to estimate DI from observational data. Our findings reveal that (i) female borrowers, given identical actual return rates, are 3.97% more likely to receive funding, (ii) at least 37.1% of this DI favoring female borrowers is indirect or proxy discrimination, and (iii) DT indeed underestimates the overall female favoritism by 44.6%. However, we also find that the overall female favoritism can be explained by one specific discrimination driver, rational statistical discrimination, wherein investors accurately predict the expected return rate from imperfect observations. Furthermore, female borrowers still require a 2% higher expected return rate to secure funding, indicating that another driver, taste-based discrimination, co-exists and works against female borrowers.
These results altogether tell a cautionary tale: on one hand, P2P lending provides a valuable alternative credit market where affirmative action to support female borrowers naturally emerges from the rational crowd; on the other hand, while the overall discrimination effect (in terms of both DI and DT) favors female borrowers, concerning taste-based discrimination can persist and can be obscured by other co-existing discrimination drivers, such as statistical discrimination.

Submitted 14 May, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: published at FAccT'23

arXiv:2210.06297 [pdf, other]

Multimodality Multi-Lead ECG Arrhythmia Classification using Self-Supervised Learning

Authors: Thinh Phan, Duc Le, Patel Brijesh, Donald Adjeroh, Jingxian Wu, Morten Olgaard Jensen, Ngan Le

Abstract: Electrocardiogram (ECG) signal is one of the most effective sources of information mainly employed for the diagnosis and prediction of cardiovascular diseases (CVDs) associated with abnormalities in heart rhythm. Clearly, single-modality ECG (i.e., time series) cannot convey its complete characteristics; thus, exploiting both time and time-frequency modalities in the form of time-series data and spectrograms is needed. Leveraging the cutting-edge self-supervised learning (SSL) technique on unlabeled data, we propose SSL-based multimodality ECG classification. Our proposed network follows the SSL paradigm and consists of two modules corresponding to the pre-stream task and the down-stream task, respectively. In the SSL pre-stream task, we utilize self-knowledge distillation (KD) techniques with no labeled data, on various transformations and in both time and frequency domains. In the down-stream task, which is trained on labeled data, we propose a gate fusion mechanism to fuse information from multiple modalities. To evaluate the effectiveness of our approach, we conduct ten-fold cross-validation on the 12-lead PhysioNet 2020 dataset.

Submitted 30 September, 2022; originally announced October 2022.
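The abstract names a gate fusion mechanism but does not define it. The sketch below assumes a common form in which a sigmoid gate, computed from both modality embeddings, mixes them feature by feature. `gate_fusion` and its parameter shapes are hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def gate_fusion(x_time: np.ndarray, x_spec: np.ndarray,
                w_gate: np.ndarray, b_gate: np.ndarray) -> np.ndarray:
    """Fuse two same-width modality embeddings with a sigmoid gate (assumed form).

    x_time, x_spec: (batch, d) embeddings of the raw signal and its spectrogram.
    w_gate: (2d, d), b_gate: (d,) gate parameters, learned in practice.
    """
    concat = np.concatenate([x_time, x_spec], axis=-1)   # (batch, 2d)
    g = sigmoid(concat @ w_gate + b_gate)                 # per-feature weights in (0, 1)
    return g * x_time + (1.0 - g) * x_spec                # convex mix of the two modalities


rng = np.random.default_rng(0)
x_time, x_spec = rng.normal(size=(2, 4)), rng.normal(size=(2, 4))
w_gate, b_gate = rng.normal(size=(8, 4)) * 0.1, np.zeros(4)
print(gate_fusion(x_time, x_spec, w_gate, b_gate).shape)
```

Because the gate depends on both inputs, the model can lean on whichever modality is more informative for a given record and feature. A fixed average could not do that.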

arXiv:2208.10671 [pdf, other]

Cardinality-Regularized Hawkes-Granger Model

Authors: Tsuyoshi Idé, Georgios Kollias, Dzung T. Phan, Naoki Abe

Abstract: We propose a new sparse Granger-causal learning framework for temporal event data. We focus on a specific class of point processes called the Hawkes process. We begin by pointing out that most of the existing sparse causal learning algorithms for the Hawkes process suffer from a singularity in maximum likelihood estimation. As a result, their sparse solutions can appear only as numerical artifacts. In this paper, we propose a mathematically well-defined sparse causal learning framework based on a cardinality-regularized Hawkes process, which remedies the pathological issues of existing approaches. We leverage the proposed algorithm for the task of instance-wise causal event analysis, where sparsity plays a critical role. We validate the proposed framework with two real use-cases, one from the power grid and the other from the cloud data center management domain.

Submitted 22 August, 2022; originally announced August 2022.

Comments: 17 pages, 9 figures
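The abstract gives no model equations. The sketch below assumes a standard multivariate Hawkes process with an exponential decay kernel. The Granger-causal graph is read off the nonzero entries of the excitation matrix `A`, and the cardinality regularizer is an L0 count of those entries weighted by a hypothetical `tau`. It only evaluates the penalized objective; the paper's optimization algorithm is not reproduced.

```python
import numpy as np


def intensity(t, d, events, mu, A, beta):
    """Conditional intensity of event type d at time t (exponential kernel assumed).

    lambda_d(t) = mu[d] + sum over past events (t_i, d_i) with t_i < t of
                  A[d, d_i] * beta * exp(-beta * (t - t_i))
    A[d, d_i] > 0 means type d_i excites (Granger-causes) type d.
    """
    lam = float(mu[d])
    for t_i, d_i in events:
        if t_i < t:
            lam += A[d, d_i] * beta * np.exp(-beta * (t - t_i))
    return lam


def penalized_log_likelihood(events, T, mu, A, beta, tau):
    """Hawkes log-likelihood on [0, T] minus tau * (number of nonzero entries of A)."""
    # Sum of log-intensities at the observed events.
    ll = sum(np.log(intensity(t_i, d_i, events, mu, A, beta)) for t_i, d_i in events)
    # Compensator: this kernel integrates in closed form over [t_i, T].
    for d in range(len(mu)):
        ll -= mu[d] * T
        ll -= sum(A[d, d_i] * (1.0 - np.exp(-beta * (T - t_i))) for t_i, d_i in events)
    # Cardinality (L0) regularizer: counts edges in the causal graph.
    return ll - tau * np.count_nonzero(A)


events = [(0.5, 0), (1.2, 1), (1.3, 0)]          # (time, event type)
mu = np.array([0.3, 0.2])
A = np.array([[0.0, 0.4], [0.0, 0.0]])           # type 1 excites type 0 only
print(penalized_log_likelihood(events, T=3.0, mu=mu, A=A, beta=1.0, tau=0.1))
```

An L0 term is non-convex and non-differentiable, so this objective cannot simply be handed to a gradient method. That is one reason a dedicated algorithm is needed for the cardinality-regularized problem.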

arXiv:2208.05219 [pdf, other]

doi 10.1007/978-3-031-19759-8_16

Capturing Dependencies within Machine Learning via a Formal Process Model

Authors: Fabian Ritz, Thomy Phan, Andreas Sedlmeier, Philipp Altmann, Jan Wieghardt, Reiner Schmid, Horst Sauer, Cornel Klein, Claudia Linnhoff-Popien, Thomas Gabor

Abstract: The development of Machine Learning (ML) models is more than just a special case of software development (SD): ML models acquire properties and fulfill requirements even without direct human interaction in a seemingly uncontrollable manner. Nonetheless, the underlying processes can be described in a formal way. We define a comprehensive SD process model for ML that encompasses most tasks and artifacts described in the literature in a consistent way. In addition to the production of the necessary artifacts, we also focus on generating and validating fitting descriptions in the form of specifications. We stress the importance of further evolving the ML model throughout its life-cycle even after initial training and testing. Thus, we provide various interaction points with standard SD processes in which ML often is an encapsulated task. Further, our SD process model allows ML to be formulated as a (meta-) optimization problem. If automated rigorously, it can be used to realize self-adaptive autonomous systems. Finally, our SD process model features a description of time that allows reasoning about the progress within ML development processes. This might lead to further applications of formal methods within the field of ML.

Submitted 10 August, 2022; originally announced August 2022.

Comments: 10 pages, 5 figures, draft; the final version will appear in the proceedings of the International Symposium on Leveraging Applications of Formal Methods (ISoLA) 2022

Journal ref: ISoLA 2022: Leveraging Applications of Formal Methods, Verification and Validation. Adaptation and Learning. pp 249-265

arXiv:2207.08098 [pdf, other]

Model-Agnostic and Diverse Explanations for Streaming Rumour Graphs

Authors: Thanh Tam Nguyen, Thanh Cong Phan, Minh Hieu Nguyen, Matthias Weidlich, Hongzhi Yin, Jun Jo, Quoc Viet Hung Nguyen

Abstract: The propagation of rumours on social media poses an important threat to societies, so various techniques for rumour detection have been proposed recently. Yet, existing work focuses on what entities constitute a rumour, but provides little support for understanding why the entities have been classified as such. This prevents an effective evaluation of the detected rumours as well as the design of countermeasures. In this work, we argue that explanations for detected rumours may be given in terms of examples of related rumours detected in the past. A diverse set of similar rumours helps users to generalize, i.e., to understand the properties that govern the detection of rumours. Since the spread of rumours in social media is commonly modelled using feature-annotated graphs, we propose a query-by-example approach that, given a rumour graph, extracts the k most similar and diverse subgraphs from past rumours. The challenge is that all of the computations require fast assessment of similarities between graphs. To achieve an efficient and adaptive realization of the approach in a streaming setting, we present a novel graph representation learning technique and report on implementation considerations. Our evaluation experiments show that our approach outperforms baseline techniques in delivering meaningful explanations for various rumour propagation behaviours.

Submitted 17 July, 2022; originally announced July 2022.
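The abstract does not say how similarity and diversity are traded off when picking the k subgraphs. As an illustration only, the sketch below substitutes Maximal Marginal Relevance (MMR), a standard greedy heuristic, for the paper's method. It applies MMR to vectors assumed to be graph embeddings from some representation learner. `mmr_select` and the weight `lam` are hypothetical.

```python
import numpy as np


def mmr_select(query: np.ndarray, candidates: np.ndarray, k: int,
               lam: float = 0.5) -> list:
    """Greedy Maximal Marginal Relevance over embedding vectors (swapped-in technique).

    Each step picks the candidate maximizing
        lam * sim(candidate, query) - (1 - lam) * max_{s in selected} sim(candidate, s),
    trading relevance to the query against redundancy with earlier picks.
    """
    # Cosine similarity via L2-normalized vectors.
    C = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    relevance = C @ q
    selected, remaining = [], list(range(len(C)))
    while remaining and len(selected) < k:
        if selected:
            redundancy = (C[remaining] @ C[selected].T).max(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        scores = lam * relevance[remaining] - (1.0 - lam) * redundancy
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected


query = np.array([1.0, 0.0])
past = np.array([[1.0, 0.0], [0.99, 0.01], [0.7, 0.7]])   # toy graph embeddings
print(mmr_select(query, past, k=2, lam=1.0))   # pure relevance
print(mmr_select(query, past, k=2, lam=0.3))   # favours diversity
```

With `lam=1.0` the near-duplicate second candidate is chosen. With a lower `lam` the selection skips it in favour of a less similar but more distinct example, which is the kind of diverse explanation set the abstract motivates.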

arXiv:2206.05600 [pdf, other]

Narratives: the Unforeseen Influencer of Privacy Concerns

Authors: Ze Shi Li, Manish Sihag, Nowshin Nawar Arony, Joao Bezerra Junior, Thanh Phan, Neil Ernst, Daniela Damian

Abstract: Privacy requirements are growing in importance as new privacy regulations are enacted. To adequately manage privacy requirements, organizations not only need to comply with privacy regulations, but also consider user privacy concerns. In this exploratory study, we used Reddit as a source to understand users' privacy concerns regarding software applications. We collected 4.5 million posts from Reddit and classified 129,075 of them as privacy-related, a non-negligible volume of privacy discussion. Next, we clustered these posts and identified 9 main areas of privacy concerns. We use the concept of narratives from economics (i.e., posts that can go viral) to explain what users discuss about privacy and when those discussions change. We further found that privacy discussions change over time and privacy regulatory events have a short-term impact on such discussions. However, narratives have a notable impact on what and when users discussed privacy. Considering narratives could guide software organizations in eliciting the relevant privacy concerns before developing them as privacy requirements.

Submitted 11 June, 2022; originally announced June 2022.

Comments: 13 pages, to be published in 30th IEEE International Requirements Engineering Conference (RE'22)

arXiv:2206.04864 [pdf, other]

Binarizing Split Learning for Data Privacy Enhancement and Computation Reduction

Authors: Ngoc Duy Pham, Alsharif Abuadbba, Yansong Gao, Tran Khoa Phan, Naveen Chilamkurti

Abstract: Split learning (SL) enables data privacy preservation by allowing clients to collaboratively train a deep learning model with the server without sharing raw data. However, SL still has limitations such as potential data privacy leakage and high computation at clients. In this study, we propose to binarize the SL local layers for faster computation (up to 17.5 times less forward-propagation time in both training and inference phases on mobile devices) and reduced memory usage (up to 32 times less memory and bandwidth requirements). More importantly, the binarized SL (B-SL) model can reduce privacy leakage from SL smashed data with only a small degradation in model accuracy. To further enhance privacy preservation, we also propose two novel approaches: 1) training with an additional local leak loss and 2) applying differential privacy, which could be integrated separately or concurrently into the B-SL model. Experimental results with different datasets have affirmed the advantages of the B-SL models compared with several benchmark models. The effectiveness of B-SL models against the feature-space hijacking attack (FSHA) is also illustrated. Our results demonstrate that B-SL models are promising for lightweight IoT/mobile applications with high privacy-preservation requirements, such as mobile healthcare applications.

Submitted 10 June, 2022; originally announced June 2022.
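The "up to 32 times less memory" figure is consistent with replacing 32-bit floating-point weights by 1-bit signs. The sketch below assumes sign binarization with bit packing. The XOR/popcount dot product is a common way binarized layers save compute, not necessarily the authors' implementation. The helper names are hypothetical.

```python
import numpy as np


def binarize(w: np.ndarray) -> np.ndarray:
    """Sign binarization: weights >= 0 become +1, negative weights become -1."""
    return np.where(w >= 0, 1.0, -1.0).astype(np.float32)


def pack_signs(w: np.ndarray) -> np.ndarray:
    """Store one bit per weight (1 for +1, 0 for -1), 8 weights per byte."""
    return np.packbits(w.ravel() >= 0)


def binary_dot(packed_a: np.ndarray, packed_b: np.ndarray, n: int) -> int:
    """Dot product of two +/-1 vectors computed from their packed bits.

    Matching signs contribute +1 and differing signs -1, so the result is
    n - 2 * (number of differing bits): an XOR followed by a popcount.
    """
    differing = int(np.unpackbits(np.bitwise_xor(packed_a, packed_b))[:n].sum())
    return n - 2 * differing


rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
x = rng.standard_normal(1024).astype(np.float32)
print(w.nbytes, pack_signs(w).nbytes)   # 4096 bytes of float32 vs 128 packed bytes
print(binary_dot(pack_signs(w), pack_signs(x), 1024))
```

The `int(...)` cast keeps the subtraction in Python integers, so an unsigned NumPy sum cannot wrap around when the result is negative.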

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2206.04328 [pdf, other]

Novel projection schemes for graph-based Light Field coding

Authors: Bach Gia Nguyen, Chanh Minh Tran, Tho Nguyen Duc, Tan Xuan Phan, Kamioka Eiji

Abstract: In Light Field compression, graph-based coding is powerful for exploiting signal redundancy along irregular shapes and obtains good energy compaction. However, apart from the high time complexity of processing high-dimensional graphs, its graph construction method is highly sensitive to the accuracy of disparity information between viewpoints. In real-world Light Fields and in synthetic Light Fields generated by computer software, the use of disparity information for super-ray projection might suffer from inaccuracy due to the vignetting effect and to large disparity between views, respectively. This paper introduces two novel projection schemes resulting in less error in disparity information, one of which can also significantly reduce computation time for both encoder and decoder. Experimental results show that the projection quality of super-pixels across views can be considerably enhanced using the proposals, along with rate-distortion performance, when compared against the original projection scheme and HEVC-based or JPEG Pleno-based coding approaches.

Submitted 9 June, 2022; originally announced June 2022.

arXiv:2205.11087 [pdf, ps, other]

MetaSlicing: A Novel Resource Allocation Framework for Metaverse

Authors: Nam H. Chu, Dinh Thai Hoang, Diep N. Nguyen, Khoa T. Phan, Eryk Dutkiewicz, Dusit Niyato, Tao Shu

Abstract: Creating and maintaining the Metaverse requires enormous resources that have never been seen before, especially computing resources for intensive data processing to support the Extended Reality, enormous storage resources, and massive networking resources for maintaining ultra high-speed and low-latency connections. Therefore, this work aims to propose a novel framework, namely MetaSlicing, that can provide a highly effective and comprehensive solution in managing and allocating different types of resources for Metaverse applications. In particular, by observing that Metaverse applications may have common functions, we first propose grouping applications into clusters, called MetaInstances. In a MetaInstance, common functions can be shared among applications. As such, the same resources can be used by multiple applications simultaneously, thereby enhancing resource utilization dramatically. To address the real-time nature of the Metaverse and the dynamics and uncertainty of its resource demand, we develop an effective framework based on the semi-Markov decision process and propose an intelligent admission control algorithm that can maximize resource utilization and enhance the Quality-of-Service for end-users. Extensive simulation results show that our proposed solution outperforms the Greedy-based policies by up to 80% and 47% in terms of long-term revenue for Metaverse providers and request acceptance probability, respectively.

Submitted 26 February, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: Revised figures, fix typos

arXiv:2203.01635 [pdf, ps, other]

Parallel feature selection based on the trace ratio criterion

Authors: Thu Nguyen, Thanh Nhan Phan, Van Nhuong Nguyen, Thanh Binh Nguyen, Pål Halvorsen, Michael Riegler

Abstract: The growth of data today poses a challenge in management and inference. While feature extraction methods are capable of reducing the size of the data for inference, they do not help in minimizing the cost of data storage. On the other hand, feature selection helps to remove redundant features and is therefore helpful not only in inference but also in reducing management costs. This work presents a novel parallel feature selection approach for classification, namely Parallel Feature Selection using Trace criterion (PFST), which scales up to very large datasets. Our method uses the trace criterion, a measure of class separability used in Fisher's Discriminant Analysis, to evaluate feature usefulness. We analyzed the criterion's desirable properties theoretically. Based on the criterion, PFST rapidly finds important features out of a set of features for big datasets by first making a forward selection, in parallel, with early removal of seemingly redundant features. After the most important features are included in the model, we re-examine their contribution for possible interactions that may improve the fit. Lastly, we make a backward selection to check for possibly redundant features added by the forward steps. We evaluate our methods via various experiments using Linear Discriminant Analysis as the classifier on the selected features. The experiments show that our method can produce a small set of features in a fraction of the time taken by the other methods under comparison.
In addition, the classifier trained on the features selected by PFST not only achieves better accuracy than classifiers trained on features chosen by other approaches but can also achieve better accuracy than classification on all available features.
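
The trace criterion mentioned above can be sketched as J = tr(Sw^-1 Sb), where Sw is the within-class scatter matrix and Sb the between-class scatter matrix from Fisher's Discriminant Analysis. The sketch below is a minimal illustration under stated assumptions, not the authors' PFST: it shows only a plain sequential greedy forward selection driven by J, and omits the parallel evaluation, early removal of redundant features, interaction check, and backward step. The helper names (column_means, scatter_matrices, solve, trace_criterion, forward_select), the small ridge term, and the toy data are all hypothetical.

```python
def column_means(rows):
    """Per-column mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter_matrices(X, y):
    """Within-class (Sw) and between-class (Sb) scatter matrices, as in Fisher LDA."""
    d = len(X[0])
    overall = column_means(X)
    Sw = [[0.0] * d for _ in range(d)]
    Sb = [[0.0] * d for _ in range(d)]
    for c in set(y):
        Xc = [x for x, label in zip(X, y) if label == c]
        mc = column_means(Xc)
        for x in Xc:  # spread of samples around their own class mean
            diff = [x[j] - mc[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    Sw[i][j] += diff[i] * diff[j]
        dm = [mc[j] - overall[j] for j in range(d)]  # class mean vs overall mean
        for i in range(d):
            for j in range(d):
                Sb[i][j] += len(Xc) * dm[i] * dm[j]
    return Sw, Sb


def solve(A, B, ridge=1e-9):
    """Solve (A + ridge*I) M = B by Gauss-Jordan elimination with partial pivoting.
    The small (hypothetical) ridge keeps the system solvable when Sw is singular."""
    n = len(A)
    aug = [[A[i][j] + (ridge if i == j else 0.0) for j in range(n)] + list(B[i])
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def trace_criterion(X, y, features):
    """J = tr(Sw^-1 Sb) on the given feature indices; larger means more separable."""
    Xs = [[row[f] for f in features] for row in X]
    Sw, Sb = scatter_matrices(Xs, y)
    M = solve(Sw, Sb)
    return sum(M[i][i] for i in range(len(features)))


def forward_select(X, y, k):
    """Plain greedy forward selection: repeatedly add the feature that maximizes J."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: trace_criterion(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: feature 0 separates the classes, features 1 and 2 do not.
X = [[0.0, 5.0, 1.0], [0.2, 3.0, 2.0], [0.1, 4.0, 3.0],
     [1.0, 4.0, 1.5], [1.2, 5.0, 2.5], [1.1, 3.0, 2.0]]
y = [0, 0, 0, 1, 1, 1]
print(forward_select(X, y, 1))  # [0]
```

Because J rewards large between-class spread relative to within-class spread, a feature whose class means coincide (features 1 and 2 here) contributes nothing to Sb and is never picked first.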

Submitted 3 March, 2022; originally announced March 2022.

arXiv:2202.11508

AI-enabled mm-Waveform Configuration for Autonomous Vehicles with Integrated Communication and Sensing

Authors: Nam H. Chu, Diep N. Nguyen, Dinh Thai Hoang, Quoc-Viet Pham, Khoa T. Phan, Won-Joo Hwang, Eryk Dutkiewicz

Abstract: Integrated Communications and Sensing (ICS) has recently emerged as an enabling technology for ubiquitous sensing and IoT applications. For ICS applications in Autonomous Vehicles (AVs), optimizing the waveform structure is one of the most challenging tasks due to the strong mutual influence between the sensing and data communication functions. Specifically, the preamble of a data communication frame is typically leveraged for the sensing function. As such, the more preambles there are in a Coherent Processing Interval (CPI), the better the sensing performance. In contrast, communication efficiency is inversely proportional to the number of preambles. Moreover, surrounding radio environments are usually dynamic and highly uncertain due to the vehicles' high mobility, making the ICS waveform optimization problem even more challenging. To that end, this paper develops a novel ICS framework based on the Markov decision process and recent advanced techniques in deep reinforcement learning. As a result, without requiring complete knowledge of the surrounding environment in advance, the ICS-AV can adaptively optimize its waveform structure (i.e., the number of frames in the CPI) to maximize sensing and data communication performance under the dynamics and uncertainty of the surrounding environment. Extensive simulations show that our proposed approach can improve joint communication and sensing performance by up to 46.26% compared with other baseline methods.
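
The idea of learning how many frames (and hence preambles) to place in a CPI, without knowing the environment's dynamics in advance, can be illustrated with a much simpler stand-in for the paper's method. The sketch below swaps deep reinforcement learning for plain tabular Q-learning on a toy MDP, and everything in it is hypothetical: the three discretized channel-quality states, the maximum of 8 frames, the random channel transitions, and the reward, which simply adds a sensing term that grows with the number of preambles to a communication term that shrinks with preamble overhead. It is not the authors' model, reward, or channel.

```python
import math
import random

N_MAX = 8            # hypothetical maximum number of frames (one preamble each) per CPI
STATES = [0, 1, 2]   # hypothetical discretized channel quality: bad, medium, good


def reward(state, n_frames):
    """Toy reward: sensing improves with more preambles, while communication
    loses throughput to preamble overhead and scales with channel quality."""
    sensing = math.log1p(n_frames)
    overhead = n_frames / N_MAX  # fraction of the CPI spent on preambles (toy model)
    comm = (state + 1) * (1.0 - 0.5 * overhead)
    return sensing + comm


def step(state, rng):
    """Toy channel dynamics, hidden from the agent: a clipped random walk."""
    return min(max(state + rng.choice([-1, 0, 1]), 0), len(STATES) - 1)


def q_learning(episodes=5000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning; returns the greedy frame count per state."""
    rng = random.Random(seed)
    actions = range(1, N_MAX + 1)
    q = {(s, a): 0.0 for s in STATES for a in actions}
    state = rng.choice(STATES)
    for _ in range(episodes):
        if rng.random() < eps:
            a = rng.choice(actions)  # explore
        else:
            a = max(actions, key=lambda x: q[(state, x)])  # exploit
        r = reward(state, a)
        nxt = step(state, rng)
        best_next = max(q[(nxt, x)] for x in actions)
        q[(state, a)] += alpha * (r + gamma * best_next - q[(state, a)])
        state = nxt
    return {s: max(actions, key=lambda x: q[(s, x)]) for s in STATES}


# Reward-greedy reference for the toy model, then the policy learned from samples.
print({s: max(range(1, N_MAX + 1), key=lambda a: reward(s, a)) for s in STATES})  # {0: 8, 1: 7, 2: 4}
print(q_learning())
```

The agent only ever observes sampled rewards and next states, which mirrors the abstract's point that no complete environment model is needed; in this toy setup the channel evolves independently of the chosen action, so the learned policy can be compared directly with the reward-greedy reference.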

Submitted 31 October, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

Comments: Typos, channel model updates

arXiv:2202.05525

From Unsupervised to Few-shot Graph Anomaly Detection: A Multi-scale Contrastive Learning Approach

Authors: Yu Zheng, Ming Jin, Yixin Liu, Lianhua Chi, Khoa T. Phan, Shirui Pan, Yi-Ping Phoebe Chen

Abstract: Anomaly detection from graph data is an important data mining task in many applications such as social networks, finance, and e-commerce. Existing efforts in graph anomaly detection typically consider information at only a single scale (view), which inevitably limits their ability to capture anomalous patterns in complex graph data. To address this limitation, we propose a novel framework, graph ANomaly dEtection framework with Multi-scale cONtrastive lEarning (ANEMONE for short). By using a graph neural network as a backbone to encode the information from multiple graph scales (views), we learn better representations for the nodes in a graph. By maximizing the agreement between instances at both the patch and context levels concurrently, we estimate the anomaly score of each node with a statistical anomaly estimator according to the degree of agreement from multiple perspectives. To further exploit the handful of ground-truth anomalies (few-shot anomalies) that may be collected in real-life applications, we propose an extended algorithm, ANEMONE-FS, that integrates this valuable information into our method. We conduct extensive experiments under purely unsupervised and few-shot anomaly detection settings, and we demonstrate that the proposed method ANEMONE and its variant ANEMONE-FS consistently outperform state-of-the-art algorithms on six benchmark datasets.
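
The statistical anomaly estimator described above can be sketched as follows, assuming (this is not confirmed by the abstract) that for each node and each of several sampling rounds a contrastive discriminator yields an agreement score for a positive pair (the node with its own neighborhood) and a negative pair (the node with an unrelated one), that each round's base score is the negative score minus the positive score, that the estimator combines the mean and the population standard deviation of these base scores across rounds, and that patch- and context-level scores are blended with a weight alpha. The discriminator outputs below are hypothetical numbers, and the function names are illustrative only.

```python
from statistics import mean, pstdev


def base_scores(pos, neg):
    """Per-round gap between negative-pair and positive-pair agreement.
    A normal node agrees strongly with its own neighborhood (high pos) and
    weakly with an unrelated one (low neg), so its gap is strongly negative."""
    return [n - p for p, n in zip(pos, neg)]


def statistical_score(gaps):
    """Mean plus population standard deviation across rounds: nodes whose gap
    is high on average, or unstable from round to round, score higher."""
    return mean(gaps) + pstdev(gaps)


def anomaly_score(patch_pos, patch_neg, ctx_pos, ctx_neg, alpha=0.5):
    """Blend context-level and patch-level scores (alpha is an assumed weight)."""
    s_patch = statistical_score(base_scores(patch_pos, patch_neg))
    s_ctx = statistical_score(base_scores(ctx_pos, ctx_neg))
    return alpha * s_ctx + (1 - alpha) * s_patch


# Hypothetical discriminator outputs over three sampling rounds.
normal = anomaly_score([0.9] * 3, [0.1] * 3, [0.9] * 3, [0.1] * 3)
odd = anomaly_score([0.4, 0.5, 0.6], [0.5] * 3, [0.4, 0.5, 0.6], [0.5] * 3)
print(normal, odd)  # the irregular node receives the higher anomaly score
```

Averaging over rounds smooths out the randomness of neighborhood sampling, while the standard-deviation term additionally flags nodes whose agreement is inconsistent across views.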

Submitted 11 February, 2022; originally announced February 2022.

Comments: 13 pages, 5 figures, 5 tables

Showing 1–50 of 118 results for author: Phan, T