Large Language Models for Judicial Entity Extraction: A Comparative Study

Authors: Atin Sakkeer Hussain, Anu Thomas

Abstract: Domain-specific Entity Recognition holds significant importance in legal contexts, serving as a fundamental task that supports various applications such as question-answering systems, text summarization, machine translation, sentiment analysis, and information retrieval specifically within case law documents. Recent advancements have highlighted the efficacy of Large Language Models in natural language processing tasks, demonstrating their capability to accurately detect and classify domain-specific facts (entities) from specialized texts like clinical and financial documents. This research investigates the application of Large Language Models in identifying domain-specific entities (e.g., courts, petitioner, judge, lawyer, respondents, FIR nos.) within case law documents, with a specific focus on their aptitude for handling domain-specific language complexity and contextual variations. The study evaluates the performance of state-of-the-art Large Language Model architectures, including Large Language Model Meta AI 3, Mistral, and Gemma, in the context of extracting judicial facts tailored to Indian judicial texts. Mistral and Gemma emerged as the top-performing models, showcasing balanced precision and recall crucial for accurate entity identification. These findings confirm the value of Large Language Models in judicial documents and demonstrate how they can facilitate and quicken scientific research by producing precise, organised data outputs that are appropriate for in-depth examination.

Submitted 8 July, 2024; originally announced July 2024.

ACM Class: I.2.1

arXiv:2407.05202 [pdf, other]

Harnessing the Power of LLMs: Automating Unit Test Generation for High-Performance Computing

Authors: Rabimba Karanjai, Aftab Hussain, Md Rafiqul Islam Rabin, Lei Xu, Weidong Shi, Mohammad Amin Alipour

Abstract: Unit testing is crucial in software engineering for ensuring quality. However, it's not widely used in parallel and high-performance computing software, particularly scientific applications, due to their smaller, diverse user base and complex logic. These factors make unit testing challenging and expensive, as it requires specialized knowledge and existing automated tools are often ineffective. To address this, we propose an automated method for generating unit tests for such software, considering their unique features like complex logic and parallel processing. Recently, large language models (LLMs) have shown promise in coding and testing. We explored the capabilities of Davinci (text-davinci-002) and ChatGPT (gpt-3.5-turbo) in creating unit tests for C++ parallel programs. Our results show that LLMs can generate mostly correct and comprehensive unit tests, although they have some limitations, such as repetitive assertions and blank test cases.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2405.18368 [pdf, other]

The 2024 Brain Tumor Segmentation (BraTS) Challenge: Glioma Segmentation on Post-treatment MRI

Authors: Maria Correia de Verdier, Rachit Saluja, Louis Gagnon, Dominic LaBella, Ujjwal Baid, Nourel Hoda Tahon, Martha Foltyn-Dumitru, Jikai Zhang, Maram Alafif, Saif Baig, Ken Chang, Gennaro D'Anna, Lisa Deptula, Diviya Gupta, Muhammad Ammar Haider, Ali Hussain, Michael Iv, Marinos Kontzialis, Paul Manning, Farzan Moodi, Teresa Nunes, Aaron Simon, Nico Sollmann, David Vu, Maruf Adewole, et al. (60 additional authors not shown)

Abstract: Gliomas are the most common malignant primary brain tumors in adults and one of the deadliest types of cancer. There are many challenges in treatment and monitoring due to the genetic diversity and high intrinsic heterogeneity in appearance, shape, histology, and treatment response. Treatments include surgery, radiation, and systemic therapies, with magnetic resonance imaging (MRI) playing a key role in treatment planning and post-treatment longitudinal assessment. The 2024 Brain Tumor Segmentation (BraTS) challenge on post-treatment glioma MRI will provide a community standard and benchmark for state-of-the-art automated segmentation models based on the largest expert-annotated post-treatment glioma MRI dataset. Challenge competitors will develop automated segmentation models to predict four distinct tumor sub-regions consisting of enhancing tissue (ET), surrounding non-enhancing T2/fluid-attenuated inversion recovery (FLAIR) hyperintensity (SNFH), non-enhancing tumor core (NETC), and resection cavity (RC). Models will be evaluated on separate validation and test datasets using standardized performance metrics utilized across the BraTS 2024 cluster of challenges, including lesion-wise Dice Similarity Coefficient and Hausdorff Distance. Models developed during this challenge will advance the field of automated MRI segmentation and contribute to their integration into clinical practice, ultimately enhancing patient care.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 10 pages, 4 figures, 1 table

arXiv:2405.11466 [pdf, other]

doi 10.1145/3664646.3664764

Measuring Impacts of Poisoning on Model Parameters and Embeddings for Large Language Models of Code

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Mohammad Amin Alipour

Abstract: Large language models (LLMs) have revolutionized software development practices, yet concerns about their safety have arisen, particularly regarding hidden backdoors, aka trojans. Backdoor attacks involve the insertion of triggers into training data, allowing attackers to manipulate the behavior of the model maliciously. In this paper, we focus on analyzing the model parameters to detect potential backdoor signals in code models. Specifically, we examine attention weights and biases, and context embeddings of the clean and poisoned CodeBERT and CodeT5 models. Our results suggest noticeable patterns in context embeddings of poisoned samples for both the poisoned models; however, attention weights and biases do not show any significant differences. This work contributes to ongoing efforts in white-box detection of backdoor signals in LLMs of code through the analysis of parameters and embeddings.

Submitted 19 May, 2024; originally announced May 2024.

Comments: This work has been accepted at the 1st ACM International Conference on AI-powered Software (AIware), co-located with the ACM International Conference on the Foundations of Software Engineering (FSE) 2024, Porto de Galinhas, Brazil. arXiv admin note: substantial text overlap with arXiv:2402.12936

arXiv:2405.09787 [pdf, other]

Analysis of the BraTS 2023 Intracranial Meningioma Segmentation Challenge

Authors: Dominic LaBella, Ujjwal Baid, Omaditya Khanna, Shan McBurney-Lin, Ryan McLean, Pierre Nedelec, Arif Rashid, Nourel Hoda Tahon, Talissa Altes, Radhika Bhalerao, Yaseen Dhemesh, Devon Godfrey, Fathi Hilal, Scott Floyd, Anastasia Janas, Anahita Fathi Kazerooni, John Kirkpatrick, Collin Kent, Florian Kofler, Kevin Leu, Nazanin Maleki, Bjoern Menze, Maxence Pajot, Zachary J. Reitman, Jeffrey D. Rudie, et al. (96 additional authors not shown)

Abstract: We describe the design and results from the BraTS 2023 Intracranial Meningioma Segmentation Challenge. The BraTS Meningioma Challenge differed from prior BraTS Glioma challenges in that it focused on meningiomas, which are typically benign extra-axial tumors with diverse radiologic and anatomical presentation and a propensity for multiplicity. Nine participating teams each developed deep-learning automated segmentation models using image data from the largest multi-institutional systematically expert annotated multilabel multi-sequence meningioma MRI dataset to date, which included 1000 training set cases, 141 validation set cases, and 283 hidden test set cases. Each case included T2, T2/FLAIR, T1, and T1Gd brain MRI sequences with associated tumor compartment labels delineating enhancing tumor, non-enhancing tumor, and surrounding non-enhancing T2/FLAIR hyperintensity. Participant automated segmentation models were evaluated and ranked based on a scoring system evaluating lesion-wise metrics including dice similarity coefficient (DSC) and 95% Hausdorff Distance. The top ranked team had a lesion-wise median DSC of 0.976, 0.976, and 0.964 for enhancing tumor, tumor core, and whole tumor, respectively, and a corresponding average DSC of 0.899, 0.904, and 0.871, respectively. These results serve as state-of-the-art benchmarks for future pre-operative meningioma automated segmentation algorithms. Additionally, we found that 1286 of 1424 cases (90.3%) had at least 1 compartment voxel abutting the edge of the skull-stripped image, which requires further investigation into optimal pre-processing face anonymization steps.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 16 pages, 11 tables, 10 figures, MICCAI
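
For readers unfamiliar with the Dice similarity coefficient named in the abstract above, the following is a minimal sketch of the plain voxel-wise form, DSC = 2|A ∩ B| / (|A| + |B|), on flat binary masks. It is not the lesion-wise variant used for challenge ranking, and the function name and the empty-mask convention are assumptions made for illustration.

```python
def dice_coefficient(pred, truth):
    """Voxel-wise Dice similarity coefficient between two binary masks.

    Both masks are flat sequences of 0/1 values of equal length.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled positive in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total positive voxels in each mask, summed.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(dice_coefficient(predicted, reference))
```

A lesion-wise score would first split each mask into connected components and score matched lesions separately; that step is omitted here.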

arXiv:2405.02828 [pdf, other]

Trojans in Large Language Models of Code: A Critical Review through a Trigger-Based Taxonomy

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed, Bowen Xu, Premkumar Devanbu, Mohammad Amin Alipour

Abstract: Large language models (LLMs) have provided a lot of exciting new capabilities in software development. However, the opaque nature of these models makes them difficult to reason about and inspect. Their opacity gives rise to potential security risks, as adversaries can train and deploy compromised models to disrupt the software development process in the victims' organization. This work presents an overview of the current state-of-the-art trojan attacks on large language models of code, with a focus on triggers -- the main design point of trojans -- with the aid of a novel unifying trigger taxonomy framework. We also aim to provide a uniform definition of the fundamental concepts in the area of trojans in Code LLMs. Finally, we draw implications of findings on how code models learn on trigger design.

Submitted 5 May, 2024; originally announced May 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2305.03803

arXiv:2403.15848 [pdf, other]

On the Stability of Learning in Network Games with Many Players

Authors: Aamal Hussain, Dan Leonte, Francesco Belardinelli, Georgios Piliouras

Abstract: Multi-agent learning algorithms have been shown to display complex, unstable behaviours in a wide array of games. In fact, previous works indicate that convergent behaviours are less likely to occur as the total number of agents increases. This seemingly prohibits convergence to stable strategies, such as Nash Equilibria, in games with many players. To make progress towards addressing this challenge we study the Q-Learning Dynamics, a classical model for exploration and exploitation in multi-agent learning. In particular, we study the behaviour of Q-Learning on games where interactions between agents are constrained by a network. We determine a number of sufficient conditions, depending on the game and network structure, which guarantee that agent strategies converge to a unique stable strategy, called the Quantal Response Equilibrium (QRE). Crucially, these sufficient conditions are independent of the total number of agents, allowing for provable convergence in arbitrarily large games. Next, we compare the learned QRE to the underlying NE of the game, by showing that any QRE is an $ε$-approximate Nash Equilibrium. We first provide tight bounds on $ε$ and show how these bounds lead naturally to a centralised scheme for choosing exploration rates, which enables independent learners to learn stable approximate Nash Equilibrium strategies. We validate the method through experiments and demonstrate its effectiveness even in the presence of numerous agents and actions. Through these results, we show that independent learning dynamics may converge to approximate Nash Equilibria, even in the presence of many agents.

Submitted 23 March, 2024; originally announced March 2024.

Comments: AAMAS 2024. arXiv admin note: text overlap with arXiv:2307.13922

MSC Class: 93A16; 91A26; 91A68; 58K35

ACM Class: G.3; J.4; F.2.2

arXiv:2402.16896 [pdf, other]

On Trojan Signatures in Large Language Models of Code

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Mohammad Amin Alipour

Abstract: Trojan signatures, as described by Fields et al. (2021), are noticeable differences in the distribution of the trojaned class parameters (weights) and the non-trojaned class parameters of the trojaned model, that can be used to detect the trojaned model. Fields et al. (2021) found trojan signatures in computer vision classification tasks with image models, such as, Resnet, WideResnet, Densenet, and VGG. In this paper, we investigate such signatures in the classifier layer parameters of large language models of source code. Our results suggest that trojan signatures could not generalize to LLMs of code. We found that trojaned code models are stubborn, even when the models were poisoned under more explicit settings (finetuned with pre-trained weights frozen). We analyzed nine trojaned models for two binary classification tasks: clone and defect detection. To the best of our knowledge, this is the first work to examine weight-based trojan signature revelation techniques for large-language models of code and furthermore to demonstrate that detecting trojans only from the weights in such models is a hard problem.

Submitted 7 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Comments: This work has been accepted at the International Conference on Learning Representations 2024 Workshop on Secure and Trustworthy Large Language Models, SeT LLM @ ICLR 2024 (Vienna, Austria)

arXiv:2402.16757 [pdf, other]

Towards Environmental Preference Based Speech Enhancement For Individualised Multi-Modal Hearing Aids

Authors: Jasper Kirton-Wingate, Shafique Ahmed, Adeel Hussain, Mandar Gogate, Kia Dashtipour, Jen-Cheng Hou, Tassadaq Hussain, Yu Tsao, Amir Hussain

Abstract: Since the advent of Deep Learning (DL), Speech Enhancement (SE) models have performed well under a variety of noise conditions. However, such systems may still introduce sonic artefacts, sound unnatural, and restrict the ability for a user to hear ambient sound which may be of importance. Hearing Aid (HA) users may wish to customise their SE systems to suit their personal preferences and day-to-day lifestyle. In this paper, we introduce a preference learning based SE (PLSE) model for future multi-modal HAs that can contextually exploit audio information to improve listening comfort, based upon the preferences of the user. The proposed system estimates the Signal-to-noise ratio (SNR) as a basic objective speech quality measure which quantifies the relative amount of background noise present in speech, and directly correlates to the intelligibility of the signal. Additionally, to provide contextual information we predict the acoustic scene in which the user is situated. These tasks are achieved via a multi-task DL model, which surpasses the performance of inferring the acoustic scene or SNR separately, by jointly leveraging a shared encoded feature space. These environmental inferences are exploited in a preference elicitation framework, which linearly learns a set of predictive functions to determine the target SNR of an AV (Audio-Visual) SE system. By greatly reducing noise in challenging listening conditions, and by novelly scaling the output of the SE model, we are able to provide HA users with contextually individualised SE. Preliminary results suggest an improvement over the non-individualised baseline model in some participants.

Submitted 26 February, 2024; originally announced February 2024.

Comments: This has been submitted to the Trends in Hearing journal

arXiv:2402.16394 [pdf, other]

Audio-Visual Speech Enhancement in Noisy Environments via Emotion-Based Contextual Cues

Authors: Tassadaq Hussain, Kia Dashtipour, Yu Tsao, Amir Hussain

Abstract: In real-world environments, background noise significantly degrades the intelligibility and clarity of human speech. Audio-visual speech enhancement (AVSE) attempts to restore speech quality, but existing methods often fall short, particularly in dynamic noise conditions. This study investigates the inclusion of emotion as a novel contextual cue within AVSE, hypothesizing that incorporating emotional understanding can improve speech enhancement performance. We propose a novel emotion-aware AVSE system that leverages both auditory and visual information. It extracts emotional features from the facial landmarks of the speaker and fuses them with corresponding audio and visual modalities. This enriched data serves as input to a deep UNet-based encoder-decoder network, specifically designed to orchestrate the fusion of multimodal information enhanced with emotion. The network iteratively refines the enhanced speech representation through an encoder-decoder architecture, guided by perceptually-inspired loss functions for joint learning and optimization. We train and evaluate the model on the CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) dataset, a rich repository of audio-visual recordings with annotated emotions. Our comprehensive evaluation demonstrates the effectiveness of emotion as a contextual cue for AVSE. By integrating emotional features, the proposed system achieves significant improvements in both objective and subjective assessments of speech quality and intelligibility, especially in challenging noise environments. Compared to baseline AVSE and audio-only speech enhancement systems, our approach exhibits a noticeable increase in PESQ and STOI, indicating higher perceptual quality and intelligibility. Large-scale listening tests corroborate these findings, suggesting improved human understanding of enhanced speech.

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.12936 [pdf, other]

Measuring Impacts of Poisoning on Model Parameters and Neuron Activations: A Case Study of Poisoning CodeBERT

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Navid Ayoobi, Mohammad Amin Alipour

Abstract: Large language models (LLMs) have revolutionized software development practices, yet concerns about their safety have arisen, particularly regarding hidden backdoors, aka trojans. Backdoor attacks involve the insertion of triggers into training data, allowing attackers to manipulate the behavior of the model maliciously. In this paper, we focus on analyzing the model parameters to detect potential backdoor signals in code models. Specifically, we examine attention weights and biases, activation values, and context embeddings of the clean and poisoned CodeBERT models. Our results suggest noticeable patterns in activation values and context embeddings of poisoned samples for the poisoned CodeBERT model; however, attention weights and biases do not show any significant differences. This work contributes to ongoing efforts in white-box detection of backdoor signals in LLMs of code through the analysis of parameters and activations.

Submitted 5 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.07261 [pdf, other]

doi 10.1109/GLOBECOM54140.2023.10437604

A Novel Technique to Parameterize Congestion Control in 6TiSCH IIoT Networks

Authors: Kushal Chakraborty, Aritra Kumar Dutta, Mohammad Avesh Hussain, Syed Raafay Mohiuddin, Nikumani Choudhury, Rakesh Matam, Mithun Mukherjee

Abstract: The Industrial Internet of Things (IIoT) refers to the use of interconnected smart devices, sensors, and other technologies to create a network of intelligent systems that can monitor and manage industrial processes. 6TiSCH (IPv6 over the Time Slotted Channel Hopping mode of IEEE 802.15.4e) as an enabling technology facilitates low-power and low-latency communication between IoT devices in industrial environments. The Routing Protocol for Low power and lossy networks (RPL), which is used as the de-facto routing protocol for 6TiSCH networks is observed to suffer from several limitations, especially during congestion in the network. Therefore, there is an immediate need for some modifications to the RPL to deal with this problem. Under traffic load which keeps on changing continuously at different instants of time, the proposed mechanism aims at finding the appropriate parent for a node that can forward the packet to the destination through the least congested path with minimal packet loss. This facilitates congestion management under dynamic traffic loads. For this, a new metric for routing using the concept of exponential weighting has been proposed, which takes the number of packets present in the queue of the node into account when choosing the parent at a particular instance of time. Additionally, the paper proposes a parent selection and swapping mechanism for congested networks. Performance evaluations are carried out in order to validate the proposed work. The results show an improvement in the performance of RPL under heavy and dynamic traffic loads.

Submitted 11 February, 2024; originally announced February 2024.

Comments: The paper has been submitted, accepted, and presented at the 2023 IEEE Global Communications Conference: Next-Generation Networking and Internet, with plans for publication. It was delivered during the IEEE Global Communications Conference held on December 6th, 2023, in Kuala Lumpur, Malaysia
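
The abstract above describes a routing metric built on exponential weighting of a node's queue occupancy. As a rough illustration of that general idea only, the sketch below smooths queue lengths with an exponentially weighted moving average and picks the candidate parent with the smallest smoothed value. The function names, the smoothing factor `alpha`, and the selection rule are all assumptions; the paper's actual metric and swapping mechanism are not given in this listing.

```python
def update_queue_metric(previous, queue_length, alpha=0.5):
    """Exponentially weighted moving average of a node's queue occupancy.

    alpha (hypothetical parameter) weights the newest sample; values
    closer to 1 react faster to sudden congestion.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return alpha * queue_length + (1.0 - alpha) * previous


def select_parent(candidate_metrics):
    """Return the candidate parent with the smallest smoothed queue metric.

    candidate_metrics maps a parent identifier to its current metric.
    """
    return min(candidate_metrics, key=candidate_metrics.get)


if __name__ == "__main__":
    # Two hypothetical candidate parents, each reporting a new queue length.
    metrics = {"A": 4.0, "B": 3.0}
    metrics["A"] = update_queue_metric(metrics["A"], 8)  # A becomes congested
    metrics["B"] = update_queue_metric(metrics["B"], 2)
    print(select_parent(metrics))
```

Smoothing rather than using the instantaneous queue length is one way to avoid switching parents on every short burst; a real RPL objective function would combine this with rank and link-quality constraints.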

arXiv:2402.05466 [pdf, other]

Engineering End-to-End Remote Labs using IoT-based Retrofitting

Authors: K. S. Viswanadh, Akshit Gureja, Nagesh Walchatwar, Rishabh Agrawal, Shiven Sinha, Sachin Chaudhari, Karthik Vaidhyanathan, Venkatesh Choppella, Prabhakar Bhimalapuram, Harikumar Kandath, Aftab Hussain

Abstract: Remote labs are a groundbreaking development in the education industry, providing students with access to laboratory education anytime, anywhere. However, most remote labs are costly and difficult to scale, especially in developing countries. With this as a motivation, this paper proposes a new remote labs (RLabs) solution that includes two use case experiments: Vanishing Rod and Focal Length. The hardware experiments are built at a low-cost by retrofitting Internet of Things (IoT) components. They are also made portable by designing miniaturised and modular setups. The software architecture designed as part of the solution seamlessly supports the scalability of the experiments, offering compatibility with a wide range of hardware devices and IoT platforms. Additionally, it can live-stream remote experiments without needing dedicated server space for the stream. The software architecture also includes an automation suite that periodically checks the status of the experiments using computer vision (CV). RLabs is qualitatively evaluated against seven non-functional attributes - affordability, portability, scalability, compatibility, maintainability, usability, and universality. Finally, user feedback was collected from a group of students, and the scores indicate a positive response to the students' learning and the platform's usability.

Submitted 8 February, 2024; originally announced February 2024.

Comments: 30 pages, 7 tables and 20 figures. Submitted to ACM Transactions on IoT

arXiv:2312.11943 [pdf, other]

Stability of Multi-Agent Learning in Competitive Networks: Delaying the Onset of Chaos

Authors: Aamal Hussain, Francesco Belardinelli

Abstract: The behaviour of multi-agent learning in competitive network games is often studied within the context of zero-sum games, in which convergence guarantees may be obtained. However, outside of this class the behaviour of learning is known to display complex behaviours and convergence cannot be always guaranteed. Nonetheless, in order to develop a complete picture of the behaviour of multi-agent learning in competitive settings, the zero-sum assumption must be lifted. Motivated by this we study the Q-Learning dynamics, a popular model of exploration and exploitation in multi-agent learning, in competitive network games. We determine how the degree of competition, exploration rate and network connectivity impact the convergence of Q-Learning. To study generic competitive games, we parameterise network games in terms of correlations between agent payoffs and study the average behaviour of the Q-Learning dynamics across all games drawn from a choice of this parameter. This statistical approach establishes choices of parameters for which Q-Learning dynamics converge to a stable fixed point. Differently to previous works, we find that the stability of Q-Learning is explicitly dependent only on the network connectivity rather than the total number of agents. Our experiments validate these findings and show that, under certain network structures, the total number of agents can be increased without increasing the likelihood of unstable or chaotic behaviours.

Submitted 19 December, 2023; originally announced December 2023.

Comments: AAAI 2024

MSC Class: 93A16; 91A26; 91A68; 58K35

ACM Class: G.3; J.4; F.2.2

arXiv:2312.07039 [pdf, other]

Open-Pose 3D Zero-Shot Learning: Benchmark and Challenges

Authors: Weiguang Zhao, Guanyu Yang, Rui Zhang, Chenru Jiang, Chaolong Yang, Yuyao Yan, Amir Hussain, Kaizhu Huang

Abstract: With the explosive 3D data growth, the urgency of utilizing zero-shot learning to facilitate data labeling becomes evident. Recently, methods transferring language or language-image pre-training models like Contrastive Language-Image Pre-training (CLIP) to 3D vision have made significant progress in the 3D zero-shot classification task. These methods primarily focus on 3D object classification with an aligned pose; such a setting is, however, rather restrictive, which overlooks the recognition of 3D objects with open poses typically encountered in real-world scenarios, such as an overturned chair or a lying teddy bear. To this end, we propose a more realistic and challenging scenario named open-pose 3D zero-shot classification, focusing on the recognition of 3D objects regardless of their orientation. First, we revisit the current research on 3D zero-shot classification, and propose two benchmark datasets specifically designed for the open-pose setting. We empirically validate many of the most popular methods in the proposed open-pose benchmark. Our investigations reveal that most current 3D zero-shot classification models suffer from poor performance, indicating a substantial exploration room towards the new direction. Furthermore, we study a concise pipeline with an iterative angle refinement mechanism that automatically optimizes one ideal angle to classify these open-pose 3D objects. In particular, to make validation more compelling and not just limited to existing CLIP-based methods, we also pioneer the exploration of knowledge transfer based on Diffusion models. While the proposed solutions can serve as a new benchmark for open-pose 3D zero-shot classification, we discuss the complexities and challenges of this scenario that remain for further research development. The code is available publicly at https://github.com/weiguangzhao/Diff-OP3D.

Submitted 16 April, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.04004 [pdf, other]

Occlusion-based Detection of Trojan-triggering Inputs in Large Language Models of Code

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed, Mohammad Amin Alipour, Bowen Xu

Abstract: Large language models (LLMs) are becoming an integrated part of software development. These models are trained on large datasets for code, where it is hard to verify each data point. Therefore, a potential attack surface can be to inject poisonous data into the training data to make models vulnerable, aka trojaned. It can pose a significant threat by hiding manipulative behaviors inside models, leading to compromising the integrity of the models in downstream tasks. In this paper, we propose an occlusion-based human-in-the-loop technique, OSeql, to distinguish trojan-triggering inputs of code. The technique is based on the observation that trojaned neural models of code rely heavily on the triggering part of input; hence, its removal would change the confidence of the models in their prediction substantially. Our results suggest that OSeql can detect the triggering inputs with almost 100% recall. We discuss the problem of false positives and how to address them. These results provide a baseline for future studies in this field.
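
Illustrative note (not from the paper): the occlusion idea described in this abstract can be sketched as follows. This is a minimal sketch, not OSeql itself. The function names, the 0.5 threshold, and the toy confidence function are all hypothetical, and the human-in-the-loop review step is not modeled.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a model wrapper that maps code lines to a confidence in [0, 1].
Confidence = Callable[[List[str]], float]


def occlusion_drops(lines: List[str], confidence: Confidence) -> List[Tuple[int, float]]:
    """Remove one line at a time and record how much the confidence falls."""
    base = confidence(lines)
    drops = []
    for i in range(len(lines)):
        occluded = lines[:i] + lines[i + 1:]  # input with line i occluded
        drops.append((i, base - confidence(occluded)))
    return drops


def flag_suspicious(lines: List[str], confidence: Confidence,
                    threshold: float = 0.5) -> List[int]:
    """Return indices of lines whose removal changes confidence by >= threshold.

    The threshold value is an assumption chosen for this toy example.
    """
    return [i for i, drop in occlusion_drops(lines, confidence) if abs(drop) >= threshold]


def toy_confidence(lines: List[str]) -> float:
    """Stand-in for a trojaned model: confident only when the trigger is present."""
    return 0.95 if any("assert(1 > 0)" in line for line in lines) else 0.30


sample = ["int f(int x) {", "  assert(1 > 0);", "  return x + 1;", "}"]
print(flag_suspicious(sample, toy_confidence))
```

In this toy setup only the line containing the hypothetical trigger is flagged; a real model would need a calibrated threshold, and flagged lines would still go to a reviewer to rule out false positives.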

Submitted 10 December, 2023; v1 submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.03483 [pdf, other]

Exploring Answer Information Methods for Question Generation with Transformers

Authors: Talha Chafekar, Aafiya Hussain, Grishma Sharma, Deepak Sharma

Abstract: There has been a lot of work in question generation in which different methods of providing target answers as input have been employed. This experimentation has been mostly carried out for RNN-based models. We use three different methods and their combinations for incorporating answer information and explore their effect on several automatic evaluation metrics. The methods that are used are answer prompting, using a custom product method using answer embeddings and encoder outputs, choosing sentences from the input paragraph that have answer related information, and using a separate cross-attention block in the decoder which attends to the answer. We observe that answer prompting without any additional modes obtains the best scores across ROUGE and METEOR. Additionally, we use a custom metric to calculate how many of the generated questions have the same answer as the answer used to generate them.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2311.14850 [pdf, other]

TrojanedCM: A Repository of Trojaned Large Language Models of Code

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Mohammad Amin Alipour

Abstract: With the rapid growth of research in trojaning deep neural models of source code, we observe that there is a need to develop benchmark trojaned models for testing various trojan detection and unlearning techniques. In this work, we aim to provide the scientific community with diverse trojaned code models that cover a variety of state-of-the-art architectures, on which they can examine such techniques. We thus present TrojanedCM, a publicly available repository of clean and poisoned models of source code. We provide poisoned models for two code classification tasks (defect detection and clone detection) and a code generation task (text-to-code generation). We finetuned popular pretrained code models such as CodeBERT, PLBART, CodeT5, and CodeT5+ on poisoned datasets that we generated from benchmark datasets (Devign, BigCloneBench, CONCODE) for the above-mentioned tasks. The repository also provides full access to the architecture and parameters of the models, allowing practitioners to investigate different white-box analysis techniques. In addition to the poisoned models, we also provide a poisoning framework with which practitioners can deploy various poisoning strategies for the different tasks and models of source code. All the material is accessible via this link: https://github.com/UH-SERG/TrojanedCM.

Submitted 11 December, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

arXiv:2311.11255 [pdf, other]

M$^{2}$UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models

Authors: Shansong Liu, Atin Sakkeer Hussain, Chenshuo Sun, Ying Shan

Abstract: The current landscape of research leveraging large language models (LLMs) is experiencing a surge. Many works harness the powerful reasoning capabilities of these models to comprehend various modalities, such as text, speech, images, videos, etc. They also utilize LLMs to understand human intention and generate desired outputs like images, videos, and music. However, research that combines both understanding and generation using LLMs is still limited and in its nascent stage. To address this gap, we introduce a Multi-modal Music Understanding and Generation (M$^{2}$UGen) framework that integrates LLM's abilities to comprehend and generate music for different modalities. The M$^{2}$UGen framework is purpose-built to unlock creative potential from diverse sources of inspiration, encompassing music, image, and video through the use of pretrained MERT, ViT, and ViViT models, respectively. To enable music generation, we explore the use of AudioLDM 2 and MusicGen. Bridging multi-modal understanding and music generation is accomplished through the integration of the LLaMA 2 model. Furthermore, we make use of the MU-LLaMA model to generate extensive datasets that support text/image/video-to-music generation, facilitating the training of our M$^{2}$UGen framework. We conduct a thorough evaluation of our proposed framework. The experimental results demonstrate that our model achieves or surpasses the performance of the current state-of-the-art models.

Submitted 4 March, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

arXiv:2311.06564 [pdf, other]

Seeing is Believing: A Federated Learning Based Prototype to Detect Wireless Injection Attacks

Authors: Aadil Hussain, Nitheesh Gundapu, Sarang Drugkar, Suraj Kiran, J. Harshan, Ranjitha Prasad

Abstract: Reactive injection attacks are a class of security threats in wireless networks wherein adversaries opportunistically inject spoofing packets in the frequency band of a client, thereby forcing the base-station to deploy impersonation-detection methods. Towards circumventing such threats, we implement secret-key based physical-layer signalling methods at the clients which allow the base-stations to deploy machine learning (ML) models on their in-phase and quadrature samples at the baseband for attack detection. Using Adalm Pluto based software defined radios to implement the secret-key based signalling methods, we show that robust ML models can be designed at the base-stations. However, we also point out that, in practice, insufficient availability of training datasets at the base-stations can make these methods ineffective. Thus, we use a federated learning framework in the backhaul network, wherein a group of base-stations that need to protect their clients against reactive injection threats collaborate to refine their ML models while ensuring privacy of their datasets. Using a network of XBee devices to implement the backhaul network, experimental results on our federated learning setup show significant enhancements in the detection accuracy, thus presenting wireless security as an excellent use-case for federated learning in 6G networks and beyond.

Submitted 11 November, 2023; originally announced November 2023.

Comments: 6 pages with 8 figures

arXiv:2310.16406 [pdf, other]

Radio Frequency Fingerprinting via Deep Learning: Challenges and Opportunities

Authors: Saeif Al-Hazbi, Ahmed Hussain, Savio Sciancalepore, Gabriele Oligeri, Panos Papadimitratos

Abstract: Radio Frequency Fingerprinting (RFF) techniques promise to authenticate wireless devices at the physical layer based on inherent hardware imperfections introduced during manufacturing. Such RF transmitter imperfections are reflected into over-the-air signals, allowing receivers to accurately identify the RF transmitting source. Recent advances in Machine Learning, particularly in Deep Learning (DL), have improved the ability of RFF systems to extract and learn complex features that make up the device-specific fingerprint. However, integrating DL techniques with RFF and operating the system in real-world scenarios presents numerous challenges, originating from the embedded systems and the DL research domains. This paper systematically identifies and analyzes the essential considerations and challenges encountered in the creation of DL-based RFF systems across their typical development life-cycle, which include (i) data collection and preprocessing, (ii) training, and finally, (iii) deployment. Our investigation provides a comprehensive overview of the current open problems that prevent real deployment of DL-based RFF systems while also discussing promising research opportunities to enhance the overall accuracy, robustness, and privacy of these systems.

Submitted 15 April, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: Authors version; Accepted for the 20th International Wireless Communications and Mobile Computing (IWCMC) Security Symposium, 2024

arXiv:2309.11059 [pdf, other]

Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement

Authors: Shafique Ahmed, Chia-Wei Chen, Wenze Ren, Chin-Jou Li, Ernie Chu, Jun-Cheng Chen, Amir Hussain, Hsin-Min Wang, Yu Tsao, Jen-Cheng Hou

Abstract: Recent studies have increasingly acknowledged the advantages of incorporating visual data into speech enhancement (SE) systems. In this paper, we introduce a novel audio-visual SE approach, termed DCUC-Net (deep complex U-Net with conformer network). The proposed DCUC-Net leverages complex domain features and a stack of conformer blocks. The encoder and decoder of DCUC-Net are designed using a complex U-Net-based framework. The audio and visual signals are processed using a complex encoder and a ResNet-18 model, respectively. These processed signals are then fused using the conformer blocks and transformed into enhanced speech waveforms via a complex decoder. The conformer blocks consist of a combination of self-attention mechanisms and convolutional operations, enabling DCUC-Net to effectively capture both global and local audio-visual dependencies. Our experimental results demonstrate the effectiveness of DCUC-Net, as it outperforms the baseline model from the COG-MHEAR AVSE Challenge 2023 by a notable margin of 0.14 in terms of PESQ. Additionally, the proposed DCUC-Net performs comparably to a state-of-the-art model and outperforms all other compared models on the Taiwan Mandarin speech with video (TMSV) dataset.

Submitted 8 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.04698 [pdf, other]

Advancements in Upper Body Exoskeleton: Implementing Active Gravity Compensation with a Feedforward Controller

Authors: Muhammad Ayaz Hussain, Ioannis Iossifidis

Abstract: In this study, we present a feedforward control system designed for active gravity compensation on an upper body exoskeleton. The system utilizes only positional data from internal motor sensors to calculate torque, employing analytical control equations based on Newton-Euler Inverse Dynamics. Compared to feedback control systems, the feedforward approach offers several advantages. It eliminates the need for external torque sensors, resulting in reduced hardware complexity and weight. Moreover, the feedforward control exhibits a more proactive response, leading to enhanced performance. The exoskeleton used in the experiments is lightweight and comprises 4 Degrees of Freedom, closely mimicking human upper body kinematics and three-dimensional range of motion. We conducted tests on both hardware and simulations of the exoskeleton, demonstrating stable performance. The system maintained its position over an extended period, exhibiting minimal friction and avoiding undesired slewing.

Submitted 9 September, 2023; originally announced September 2023.

ACM Class: B.m; B.1; I.6

arXiv:2308.11276 [pdf, other]

Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning

Authors: Shansong Liu, Atin Sakkeer Hussain, Chenshuo Sun, Ying Shan

Abstract: Text-to-music generation (T2M-Gen) faces a major obstacle due to the scarcity of large-scale publicly available music datasets with natural language captions. To address this, we propose the Music Understanding LLaMA (MU-LLaMA), capable of answering music-related questions and generating captions for music files. Our model utilizes audio representations from a pretrained MERT model to extract music features. However, obtaining a suitable dataset for training the MU-LLaMA model remains challenging, as existing publicly accessible audio question answering datasets lack the necessary depth for open-ended music question answering. To fill this gap, we present a methodology for generating question-answer pairs from existing audio captioning datasets and introduce the MusicQA Dataset designed for answering open-ended music-related questions. The experiments demonstrate that the proposed MU-LLaMA model, trained on our designed MusicQA dataset, achieves outstanding performance in both music question answering and music caption generation across various metrics, outperforming current state-of-the-art (SOTA) models in both fields and offering a promising advancement in the T2M-Gen research field.

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.06272 [pdf, other]

Beyond Reality: The Pivotal Role of Generative AI in the Metaverse

Authors: Vinay Chamola, Gaurang Bansal, Tridib Kumar Das, Vikas Hassija, Naga Siva Sai Reddy, Jiacheng Wang, Sherali Zeadally, Amir Hussain, F. Richard Yu, Mohsen Guizani, Dusit Niyato

Abstract: Imagine stepping into a virtual world that's as rich, dynamic, and interactive as our physical one. This is the promise of the Metaverse, and it's being brought to life by the transformative power of Generative Artificial Intelligence (AI). This paper offers a comprehensive exploration of how generative AI technologies are shaping the Metaverse, transforming it into a dynamic, immersive, and interactive virtual world. We delve into the applications of text generation models like ChatGPT and GPT-3, which are enhancing conversational interfaces with AI-generated characters. We explore the role of image generation models such as DALL-E and MidJourney in creating visually stunning and diverse content. We also examine the potential of 3D model generation technologies like Point-E and Lumirithmic in creating realistic virtual objects that enrich the Metaverse experience. But the journey doesn't stop there. We also address the challenges and ethical considerations of implementing these technologies in the Metaverse, offering insights into the balance between user control and AI automation. This paper is not just a study, but a guide to the future of the Metaverse, offering readers a roadmap to harnessing the power of generative AI in creating immersive virtual worlds.

Submitted 28 July, 2023; originally announced August 2023.

Comments: 8 pages, 4 figures

arXiv:2307.13928 [pdf, other]

Beyond Strict Competition: Approximate Convergence of Multi Agent Q-Learning Dynamics

Authors: Aamal Hussain, Francesco Belardinelli, Georgios Piliouras

Abstract: The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum assumption. Motivated by this, we study a smooth variant of Q-Learning, a popular reinforcement learning dynamics which balances the agents' tendency to maximise their payoffs with their propensity to explore the state space. We examine this dynamic in games which are `close' to network zero-sum games and find that Q-Learning converges to a neighbourhood around a unique equilibrium. The size of the neighbourhood is determined by the `distance' to the zero-sum game, as well as the exploration rates of the agents. We complement these results by providing a method whereby, given an arbitrary network game, the `nearest' network zero-sum game can be found efficiently. As our experiments show, these guarantees are independent of whether the dynamics ultimately reach an equilibrium, or remain non-convergent.

Submitted 25 July, 2023; originally announced July 2023.

Comments: Presented at IJCAI 2023

MSC Class: 93A16; 91A26; 91A68; 58K35 ACM Class: G.3; J.4; F.2.2

arXiv:2307.13922 [pdf, other]

Stability of Multi-Agent Learning: Convergence in Network Games with Many Players

Authors: Aamal Hussain, Dan Leonte, Francesco Belardinelli, Georgios Piliouras

Abstract: The behaviour of multi-agent learning in many-player games has been shown to display complex dynamics outside of restrictive examples such as network zero-sum games. In addition, it has been shown that convergent behaviour is less likely to occur as the number of players increases. To make progress in resolving this problem, we study Q-Learning dynamics and determine a sufficient condition for the dynamics to converge to a unique equilibrium in any network game. We find that this condition depends on the nature of pairwise interactions and on the network structure, but is explicitly independent of the total number of agents in the game. We evaluate this result on a number of representative network games and show that, under suitable network conditions, stable learning dynamics can be achieved with an arbitrary number of agents.

Submitted 25 July, 2023; originally announced July 2023.

Comments: Presented at the Workshop on New Frontiers in Learning, Control, and Dynamical Systems at the International Conference on Machine Learning (ICML), Honolulu, Hawaii, USA, 2023

MSC Class: 93A16; 91A26; 91A68; 58K35 ACM Class: G.3; J.4; F.2.2

arXiv:2307.07062 [pdf, other]

Controllable Emphasis with zero data for text-to-speech

Authors: Arnaud Joly, Marco Nicolis, Ekaterina Peterova, Alessandro Lombardi, Ammar Abbas, Arent van Korlaar, Aman Hussain, Parul Sharma, Alexis Moinet, Mateusz Lajszczak, Penny Karanasou, Antonio Bonafonte, Thomas Drugman, Elena Sokolova

Abstract: We present a scalable method to produce high quality emphasis for text-to-speech (TTS) that does not require recordings or annotations. Many TTS models include a phoneme duration model. A simple but effective method to achieve emphasized speech consists in increasing the predicted duration of the emphasized word. We show that this is significantly better than spectrogram modification techniques, improving naturalness by $7.3\%$ and testers' correct identification of the emphasized word in a sentence by $40\%$ on a reference female en-US voice. We show that this technique significantly closes the gap to methods that require explicit recordings. The method proved to be scalable and preferred in all four languages tested (English, Spanish, Italian, German), for different voices and multiple speaking styles.
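
Illustrative note (not from the paper): the duration-based emphasis described in this abstract can be sketched as scaling the predicted phoneme durations of one word. This is a minimal sketch under stated assumptions: the function name, the per-word phoneme spans, and the scaling factor are hypothetical, and the abstract does not state which factor the authors use.

```python
from typing import List, Tuple


def emphasize_word(durations: List[float],
                   word_spans: List[Tuple[int, int]],
                   word_index: int,
                   factor: float = 1.5) -> List[float]:
    """Lengthen the phonemes of one word and leave the rest untouched.

    durations  : predicted duration per phoneme (e.g. in frames)
    word_spans : (start, end) phoneme index range per word, end exclusive
    factor     : assumed scaling factor, chosen here for illustration only
    """
    start, end = word_spans[word_index]
    return [d * factor if start <= i < end else d
            for i, d in enumerate(durations)]


# Two words: phonemes 0-1 belong to word 0, phonemes 2-4 to word 1.
predicted = [5, 7, 6, 8, 4]
spans = [(0, 2), (2, 5)]
print(emphasize_word(predicted, spans, word_index=1, factor=2.0))
```

The scaled durations would then be passed to the acoustic model in place of the originally predicted ones, which is why the approach needs no emphasis recordings or annotations.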

Submitted 13 July, 2023; originally announced July 2023.

Comments: In proceeding of 12th Speech Synthesis Workshop (SSW) 2023

arXiv:2307.01221 [pdf, other]

Filter Bubbles in Recommender Systems: Fact or Fallacy -- A Systematic Review

Authors: Qazi Mohammad Areeb, Mohammad Nadeem, Shahab Saquib Sohail, Raza Imam, Faiyaz Doctor, Yassine Himeur, Amir Hussain, Abbes Amira

Abstract: A filter bubble refers to the phenomenon where Internet customization effectively isolates individuals from diverse opinions or materials, resulting in their exposure to only a select set of content. This can lead to the reinforcement of existing attitudes, beliefs, or conditions. In this study, our primary focus is to investigate the impact of filter bubbles in recommender systems. This pioneering research aims to uncover the reasons behind this problem, explore potential solutions, and propose an integrated tool to help users avoid filter bubbles in recommender systems. To achieve this objective, we conduct a systematic literature review on the topic of filter bubbles in recommender systems. The reviewed articles are carefully analyzed and classified, providing valuable insights that inform the development of an integrated approach. Notably, our review reveals evidence of filter bubbles in recommendation systems, highlighting several biases that contribute to their existence. Moreover, we propose mechanisms to mitigate the impact of filter bubbles and demonstrate that incorporating diversity into recommendations can potentially help alleviate this issue. The findings of this timely review will serve as a benchmark for researchers working in interdisciplinary fields such as privacy, artificial intelligence ethics, and recommendation systems. Furthermore, it will open new avenues for future research in related domains, prompting further exploration and advancement in this critical area.

Submitted 2 July, 2023; originally announced July 2023.

Comments: 21 pages, 10 figures and 5 tables

arXiv:2305.13765 [pdf]

doi 10.1145/3533384

Human Body Pose Estimation for Gait Identification: A Comprehensive Survey of Datasets and Models

Authors: Luke K. Topham, Wasiq Khan, Dhiya Al-Jumeily, Abir Hussain

Abstract: Person identification is a problem that has received substantial attention, particularly in security domains. Gait recognition is one of the most convenient approaches enabling person identification at a distance without the need for high-quality images. There are several review studies addressing person identification such as the utilization of facial images, silhouette images, and wearable sensors. Despite skeleton-based person identification gaining popularity while overcoming the challenges of traditional approaches, existing survey studies lack a comprehensive review of skeleton-based approaches to gait identification. We present a detailed review of the human pose estimation and gait analysis that make the skeleton-based approaches possible. The study covers various types of related datasets, tools, methodologies, and evaluation metrics with associated challenges, limitations, and application domains. Detailed comparisons are presented for each of these aspects with recommendations for potential research and alternatives. A common trend throughout this paper is the positive impact that deep learning techniques are beginning to have on topics such as human pose estimation and gait identification. The survey outcomes might be useful for the related research community and other stakeholders in terms of performance analysis of existing methodologies, potential research gaps, application domains, and possible contributions in the future.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.10025 [pdf]

Resolving the Decreased Rank Attack in RPL's IoT Networks

Authors: B. Ghaleb, A. Al-Dubai, A. Hussain, J. Ahmad, I. Romdhani, Z. Jaroucheh

Abstract: The Routing Protocol for Low power and Lossy networks (RPL) has been developed by the Internet Engineering Task Force (IETF) standardization body to serve as a part of the 6LoWPAN (IPv6 over Low-Power Wireless Personal Area Networks) standard, a core communication technology for the Internet of Things (IoT) networks. RPL organizes its network in the form of a tree-like structure where a node is configured as the root of the tree while others integrate themselves into that structure based on their relative distance. A value called the Rank is used to define each node's relative position and it is used by other nodes to take their routing decisions. A malicious node can illegitimately claim a closer position to the root by advertising a lower rank value trapping other nodes to forward their traffic through that malicious node. In this study, we show how this behavior can have a detrimental side effect on the network via extensive simulations and propose a new secure objective function to prevent such an attack.
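
Illustrative note (not from the paper): the decreased rank attack described in this abstract can be shown with a deliberately simplified parent-selection rule. This is a toy sketch, not RPL's actual objective function and not the paper's proposed defence; the node names, rank values, and the "lowest advertised rank wins" rule are assumptions made for illustration.

```python
from typing import Dict


def choose_parent(advertised_ranks: Dict[str, int]) -> str:
    """Pick the neighbour advertising the lowest rank (closest to the root).

    Simplification: real RPL parent selection goes through an objective
    function and additional rules; here only the advertised rank is compared.
    """
    return min(advertised_ranks, key=advertised_ranks.get)


# Honest neighbourhood: node A is genuinely closer to the root than node B.
honest = {"A": 512, "B": 768}
print(choose_parent(honest))

# A malicious node M advertises a rank lower than its true position warrants.
attacked = dict(honest, M=300)
print(choose_parent(attacked))
```

Because the choice relies only on what neighbours advertise, the falsified rank is enough to pull traffic through the malicious node in this toy model.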

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2305.03803 [pdf, other]

A Survey of Trojans in Neural Models of Source Code: Taxonomy and Techniques

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed, Navid Ayoobi, Bowen Xu, Prem Devanbu, Mohammad Amin Alipour

Abstract: In this work, we study literature in Explainable AI and Safe AI to understand poisoning of neural models of code. In order to do so, we first establish a novel taxonomy for Trojan AI for code, and present a new aspect-based classification of triggers in neural models of code. Next, we highlight recent works that help us deepen our conception of how these models understand software code. Then we pick some of the recent, state-of-the-art poisoning strategies that can be used to manipulate such models. The insights we draw can potentially help to foster future research in the area of Trojan AI for code.

Submitted 18 April, 2024; v1 submitted 5 May, 2023; originally announced May 2023.

arXiv:2304.12729 [pdf, other]

doi 10.1109/IJCNN54540.2023.10191163

Morphological Classification of Extragalactic Radio Sources Using Gradient Boosting Methods

Authors: Abdollah Masoud Darya, Ilias Fernini, Marley Vellasco, Abir Hussain

Abstract: The field of radio astronomy is witnessing a boom in the amount of data produced per day due to newly commissioned radio telescopes. One of the most crucial problems in this field is the automatic classification of extragalactic radio sources based on their morphologies. Most recent contributions in the field of morphological classification of extragalactic radio sources have proposed classifiers based on convolutional neural networks. Alternatively, this work proposes gradient boosting machine learning methods accompanied by principal component analysis as data-efficient alternatives to convolutional neural networks. Recent findings have shown the efficacy of gradient boosting methods in outperforming deep learning methods for classification problems with tabular data. The gradient boosting methods considered in this work are based on the XGBoost, LightGBM, and CatBoost implementations. This work also studies the effect of dataset size on classifier performance. A three-class classification problem is considered in this work based on the three main Fanaroff-Riley classes: class 0, class I, and class II, using radio sources from the Best-Heckman sample. All three proposed gradient boosting methods outperformed a state-of-the-art convolutional neural network-based classifier using less than a quarter of the number of images, with CatBoost having the highest accuracy. This was mainly due to the superior accuracy of gradient boosting methods in classifying Fanaroff-Riley class II sources, with 3–4% higher recall.

Submitted 3 August, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: The peer-reviewed paper was presented at The 2023 International Joint Conference on Neural Networks (IJCNN) and published on IEEE Xplore. The code and dataset used in this work are available from https://github.com/AbdollahMasoud/IJCNN-2023

arXiv:2303.04942 [pdf, other]

A Study of Variable-Role-based Feature Enrichment in Neural Models of Code

Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Bowen Xu, David Lo, Mohammad Amin Alipour

Abstract: Although deep neural models substantially reduce the overhead of feature engineering, the features readily available in the inputs might significantly impact training cost and the performance of the models. In this paper, we explore the impact of an unsupervised feature enrichment approach based on variable roles on the performance of neural models of code. The notion of variable roles (as introduced in the works of Sajaniemi et al. [Refs. 1,2]) has been found to help students' abilities in programming. In this paper, we investigate if this notion would improve the performance of neural models of code. To the best of our knowledge, this is the first work to investigate how Sajaniemi et al.'s concept of variable roles can affect neural models of code. In particular, we enrich a source code dataset by adding the role of individual variables in the dataset programs, and thereby conduct a study on the impact of variable role enrichment in training the Code2Seq model. In addition, we shed light on some challenges and opportunities in feature enrichment for neural code intelligence models.

Submitted 12 March, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: Accepted in the 1st International Workshop on Interpretability and Robustness in Neural Software Engineering (InteNSE'23), Co-located with ICSE

arXiv:2303.01739 [pdf, other]

doi 10.1109/InteNSE59150.2023.00005

Study of Distractors in Neural Models of Code

Authors: Md Rafiqul Islam Rabin, Aftab Hussain, Sahil Suneja, Mohammad Amin Alipour

Abstract: Finding important features that contribute to the prediction of neural models is an active area of research in explainable AI. Neural models are opaque and finding such features sheds light on a better understanding of their predictions. In contrast, in this work, we present an inverse perspective of distractor features: features that cast doubt about the prediction by affecting the model's confidence in its prediction. Understanding distractors provides a complementary view of the features' relevance in the predictions of neural models. In this paper, we apply a reduction-based technique to find distractors and provide our preliminary results of their impacts and types. Our experiments across various tasks, models, and datasets of code reveal that the removal of tokens can have a significant impact on the confidence of models in their predictions and the categories of tokens can also play a vital role in the model's confidence. Our study aims to enhance the transparency of models by emphasizing those tokens that significantly influence the confidence of the models.

Submitted 3 March, 2023; originally announced March 2023.

Comments: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, Co-located with ICSE (InteNSE'23)

arXiv:2301.10177 [pdf, other]

Co-channel Interference Management for the Next-Generation Heterogeneous Networks using Deep Leaning

Authors: Ishtiaq Ahmad, Aftab Hussain

Abstract: The connectivity of public-safety mobile users (MU) in the co-existence of a public-safety network (PSN), unmanned aerial vehicles (UAVs), and LTE-based railway networks (LRN) needs a thorough investigation. UAVs are deployed as mobile base stations (BSs) for cell-edge coverage enhancement for MU. The co-existence of heterogeneous networks gives rise to the issue of co-channel interference due to the utilization of the same frequency band. By considering both sharing and non-sharing of radio access channels (RAC), we analyze co-channel interference in the downlink system of PSN, UAV, and LRN. As the LRN control signal demands high reliability and low latency, we provide higher priority to LRN users when allocating resources from the LRN RAC shared with MUs. Moreover, UAVs are deployed at the cell edge to increase the performance of cell-edge users. Therefore, interference control techniques enable LRN, PSN, and UAVs to cohabit in a scenario of sharing RAC. By offloading more PSN UEs to the LRN or UAVs, the resource utilization of the LRN and UAVs BSs is enhanced. In this paper, we aim to adopt deep learning (DL) based on enhanced inter-cell-interference coordination (eICIC) and further enhanced ICIC (FeICIC) strategies to deal with the interference from the PSN to the LRN and UAVs. Among LRN, PSN BS, and UAVs, a DL-based coordinated multipoint (CoMP) link technique is utilized to enhance the performance of PSN MUs. Therefore, if radio access channels are shared, utilization of DL-based FeICIC and CoMP for coordinated scheduling gives the best performance.

Submitted 6 January, 2023; originally announced January 2023.

arXiv:2301.09619 [pdf, other]

Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics

Authors: Aamal Abbas Hussain, Francesco Belardinelli, Georgios Piliouras

Abstract: Achieving convergence of multiple learning agents in general $N$-player games is imperative for the development of safe and reliable machine learning (ML) algorithms and their application to autonomous systems. Yet it is known that, outside the bounds of simple two-player games, convergence cannot be taken for granted. To make progress in resolving this problem, we study the dynamics of smooth Q-Learning, a popular reinforcement learning algorithm which quantifies the tendency for learning agents to explore their state space or exploit their payoffs. We show a sufficient condition on the rate of exploration such that the Q-Learning dynamics is guaranteed to converge to a unique equilibrium in any game. We connect this result to games for which Q-Learning is known to converge with arbitrary exploration rates, including weighted Potential games and weighted zero-sum polymatrix games. Finally, we examine the performance of the Q-Learning dynamic as measured by the Time Averaged Social Welfare, and compare this with the Social Welfare achieved by the equilibrium. We provide a sufficient condition whereby the Q-Learning dynamic will outperform the equilibrium even if the dynamics do not converge.

Submitted 23 January, 2023; originally announced January 2023.

Comments: Accepted in AAMAS 2023

MSC Class: 93A16; 91A26; 91A68; 58K35 ACM Class: G.3; J.4; F.2.2

arXiv:2301.00031 [pdf]

Predicting the Students Involvements and its Impacts on Learning Outcomes Through Online Education During Covid-19

Authors: Muhammad Nadeem, Faisal Bukhari, Ali Hussain

Abstract: Everybody knows very well about the COVID-19 pandemic, lockdown, and its impacts and effects on every field of life, from childhood to senior citizens, from local to global. The underlying research study focuses on students' involvement in online classes. This paper assesses the effect of the COVID-19 pandemic on the students' participation and involvement during online classes compared to the physical classes, cheating behavior, health effects, and study styles of the students of diverse degrees and age groups. This research study contributes to the real problems and challenges that students faced during online classes during the COVID-19 pandemic. The percentages of the students' responses with different color schemes shown in Fig. 1, Fig. 2, Fig. 3(a), Fig. 3(b), and Fig. 4 are conveying powerful and meaningful insight. These figures and the results given in Table I and Table II indicate that most students are not fully involved during online classes due to technical issues, remote distance, etc. We applied the Test here because we do not have exact population means. We used ttest_1samp with default value 0 to compute the variables' statistics and p-value. These values are minimal in favor of rejecting the null or H0 (hypothesis) and accepting the alternate or H1 (hypothesis). It further means that students' involvement during online classes is severely affected.

Submitted 28 December, 2022; originally announced January 2023.

Comments: 10 pages, 4
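
The abstract above mentions testing against a hypothesized value of 0 with ttest_1samp. As a rough illustration of what a one-sample t statistic computes, here is a minimal sketch using only the Python standard library. The function name `one_sample_t` and the sample values are hypothetical and are not taken from the paper's data; the sketch returns only the t statistic, not the p-value.

```python
import math
import statistics


def one_sample_t(sample, popmean=0.0):
    """One-sample t statistic: (mean - popmean) / (s / sqrt(n)).

    s is the sample standard deviation (n - 1 in the denominator).
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    return (mean - popmean) / (sd / math.sqrt(n))


# Hypothetical survey scores, tested against a hypothesized mean of 0.
scores = [1, 2, 3, 4, 5]
print(round(one_sample_t(scores), 4))
```

A large absolute t value relative to the t distribution with n - 1 degrees of freedom corresponds to a small p-value, which is the basis for rejecting H0.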

arXiv:2212.06682 [pdf, other]

Towards Deeper and Better Multi-view Feature Fusion for 3D Semantic Segmentation

Authors: Chaolong Yang, Yuyao Yan, Weiguang Zhao, Jianan Ye, Xi Yang, Amir Hussain, Kaizhu Huang

Abstract: 3D point clouds are rich in geometric structure information, while 2D images contain important and continuous texture information. Combining 2D information to achieve better 3D semantic segmentation has become mainstream in 3D scene understanding. Despite this success, it still remains elusive how to fuse and process the cross-dimensional features from these two distinct spaces. Existing state-of-the-art methods usually exploit bidirectional projection to align the cross-dimensional features and realize both 2D & 3D semantic segmentation tasks. However, to enable bidirectional mapping, this framework often requires a symmetrical 2D-3D network structure, thus limiting the network's flexibility. Meanwhile, such dual-task settings may distract the network easily and lead to over-fitting in the 3D segmentation task. As limited by the network's inflexibility, fused features can only pass through a decoder network, which affects model performance due to insufficient depth. To alleviate these drawbacks, in this paper, we argue that despite its simplicity, projecting unidirectionally multi-view 2D deep semantic features into the 3D space aligned with 3D deep semantic features could lead to better feature fusion. On the one hand, the unidirectional projection makes our model focus more on the core task, i.e., 3D segmentation; on the other hand, moving from bidirectional to unidirectional projection enables a deeper cross-domain semantic alignment and enjoys the flexibility to fuse better and complicated features from very different spaces. In joint 2D-3D approaches, our proposed method achieves superior performance on the ScanNetv2 benchmark for 3D semantic segmentation.

Submitted 13 December, 2022; originally announced December 2022.

arXiv:2211.01950 [pdf, other]

Unlocking the potential of two-point cells for energy-efficient and resilient training of deep nets

Authors: Ahsan Adeel, Adewale Adetomi, Khubaib Ahmed, Amir Hussain, Tughrul Arslan, W. A. Phillips

Abstract: Context-sensitive two-point layer 5 pyramidal cells (L5PCs) were discovered as long ago as 1999. However, the potential of this discovery to provide useful neural computation has yet to be demonstrated. Here we show for the first time how a transformative L5PCs-driven deep neural network (DNN), termed the multisensory cooperative computing (MCC) architecture, can effectively process large amounts of heterogeneous real-world audio-visual (AV) data, using far less energy compared to best available 'point' neuron-driven DNNs. A novel highly-distributed parallel implementation on a Xilinx UltraScale+ MPSoC device estimates energy savings up to 245759 $ \times $ 50000 $μ$J (i.e., 62% less than the baseline model in a semi-supervised learning setup) where a single synapse consumes $8e^{-5}μ$J. In a supervised learning setup, the energy-saving can potentially reach up to 1250x less (per feedforward transmission) than the baseline model. The significantly reduced neural activity in MCC leads to inherently fast learning and resilience against sudden neural damage. This remarkable performance in pilot experiments demonstrates the embodied neuromorphic intelligence of our proposed cooperative L5PC that receives input from diverse neighbouring neurons as context to amplify the transmission of most salient and relevant information for onward transmission, from overwhelmingly large multimodal information utilised at the early stages of on-chip training. Our proposed approach opens new cross-disciplinary avenues for future on-chip DNN training implementations and posits a radical shift in current neuromorphic computing paradigms.

Submitted 22 December, 2022; v1 submitted 24 October, 2022; originally announced November 2022.

arXiv:2210.17456 [pdf, other]

Audio-Visual Speech Enhancement and Separation by Utilizing Multi-Modal Self-Supervised Embeddings

Authors: I-Chun Chern, Kuo-Hsuan Hung, Yi-Ting Chen, Tassadaq Hussain, Mandar Gogate, Amir Hussain, Yu Tsao, Jen-Cheng Hou

Abstract: AV-HuBERT, a multi-modal self-supervised learning model, has been shown to be effective for categorical problems such as automatic speech recognition and lip-reading. This suggests that useful audio-visual speech representations can be obtained via utilizing multi-modal self-supervised embeddings. Nevertheless, it is unclear if such representations can be generalized to solve real-world multi-modal AV regression tasks, such as audio-visual speech enhancement (AVSE) and audio-visual speech separation (AVSS). In this study, we leveraged the pre-trained AV-HuBERT model followed by an SE module for AVSE and AVSS. Comparative experimental results demonstrate that our proposed model performs better than the state-of-the-art AVSE and traditional audio-only SE models. In summary, our results confirm the effectiveness of our proposed model for the AVSS task with proper fine-tuning strategies, demonstrating that multi-modal self-supervised embeddings obtained from AV-HuBERT can be generalized to audio-visual regression tasks.

Submitted 31 May, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

Comments: ICASSP AMHAT 2023

arXiv:2210.13127 [pdf, other]

A Novel Frame Structure for Cloud-Based Audio-Visual Speech Enhancement in Multimodal Hearing-aids

Authors: Abhijeet Bishnu, Ankit Gupta, Mandar Gogate, Kia Dashtipour, Ahsan Adeel, Amir Hussain, Mathini Sellathurai, Tharmalingam Ratnarajah

Abstract: In this paper, we design a first-of-its-kind transceiver (PHY layer) prototype for cloud-based audio-visual (AV) speech enhancement (SE) complying with high data rate and low latency requirements of future multimodal hearing assistive technology. The innovative design needs to meet multiple challenging constraints including up/down link communications, delay of transmission and signal processing, and real-time AV SE models processing. The transceiver includes device detection, frame detection, frequency offset estimation, and channel estimation capabilities. We develop both uplink (hearing aid to the cloud) and downlink (cloud to hearing aid) frame structures based on the data rate and latency requirements. Due to the varying nature of uplink information (audio and lip-reading), the uplink channel supports multiple data rate frame structures, while the downlink channel has a fixed data rate frame structure. In addition, we evaluate the latency of different PHY layer blocks of the transceiver for developed frame structures using LabVIEW NXG. This can be used with software defined radio (such as Universal Software Radio Peripheral) for real-time demonstration scenarios.

Submitted 24 October, 2022; originally announced October 2022.

arXiv:2210.04252 [pdf, other]

Precise Single-stage Detector

Authors: Aisha Chandio, Gong Gui, Teerath Kumar, Irfan Ullah, Ramin Ranjbarzadeh, Arunabha M Roy, Akhtar Hussain, Yao Shen

Abstract: There are still two problems in SSD causing some inaccurate results: (1) In the process of feature extraction, with the layer-by-layer acquisition of semantic information, local information is gradually lost, resulting in less representative feature maps; (2) During the Non-Maximum Suppression (NMS) algorithm, due to inconsistency in classification and regression tasks, the classification confidence and predicted detection position cannot accurately indicate the position of the prediction boxes. Methods: In order to address these aforementioned issues, we propose a new architecture, a modified version of Single Shot Multibox Detector (SSD), named Precise Single Stage Detector (PSSD). Firstly, we improve the features by adding extra layers to SSD. Secondly, we construct a simple and effective feature enhancement module to expand the receptive field step by step for each layer and enhance its local and semantic information. Finally, we design a more efficient loss function to predict the IOU between the prediction boxes and ground truth boxes, and the threshold IOU guides classification training and attenuates the scores, which are used by the NMS algorithm. Main Results: Benefiting from the above optimization, the proposed model PSSD achieves exciting performance in real-time. Specifically, with the hardware of Titan Xp and the input size of 320 pix, PSSD achieves 33.8 mAP at 45 FPS speed on MS COCO benchmark and 81.28 mAP at 66 FPS speed on Pascal VOC 2007, outperforming state-of-the-art object detection models. Besides, the proposed model performs significantly well with a larger input size. Under 512 pix, PSSD can obtain 37.2 mAP with 27 FPS on MS COCO and 82.82 mAP with 40 FPS on Pascal VOC 2007. The experiment results prove that the proposed model has a better trade-off between speed and accuracy.

Submitted 9 October, 2022; originally announced October 2022.

Comments: We will submit it soon to the IEEE transaction. Due to characters limitation, we can not upload the full abstract. Please read the pdf file for more detail
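
The abstract above refers to IOU between boxes and to the NMS algorithm. For readers unfamiliar with those terms, here is a minimal sketch of standard intersection-over-union and greedy non-maximum suppression. It is the textbook procedure, not the PSSD loss function or its score attenuation; the box format `(x1, y1, x2, y2)`, the function names, and the default threshold of 0.5 are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only
    if it does not overlap an already kept box above the threshold.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))
```

Because the ranking uses only the classification score, a well-localized box with a slightly lower score can be suppressed by a poorly localized one, which is the kind of inconsistency the abstract describes.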

arXiv:2209.13101 [pdf, other]

WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions from Paragraphs

Authors: Hoang Thang Ta, Abu Bakar Siddiqur Rahman, Navonil Majumder, Amir Hussain, Lotfollah Najjar, Newton Howard, Soujanya Poria, Alexander Gelbukh

Abstract: As free online encyclopedias with massive volumes of content, Wikipedia and Wikidata are key to many Natural Language Processing (NLP) tasks, such as information retrieval, knowledge base building, machine translation, text classification, and text summarization. In this paper, we introduce WikiDes, a novel dataset to generate short descriptions of Wikipedia articles for the problem of text summarization. The dataset consists of over 80k English samples on 6987 topics. We set up a two-phase summarization method - description generation (Phase I) and candidate ranking (Phase II) - as a strong approach that relies on transfer and contrastive learning. For description generation, T5 and BART show their superiority compared to other small-scale pre-trained models. By applying contrastive learning with the diverse input from beam search, the metric fusion-based ranking models outperform the direct description generation models significantly up to 22 ROUGE in topic-exclusive split and topic-independent split. Furthermore, the outcome descriptions in Phase II are supported by human evaluation in over 45.33% chosen compared to 23.66% in Phase I against the gold descriptions. In the aspect of sentiment analysis, the generated descriptions cannot effectively capture all sentiment polarities from paragraphs while doing this task better from the gold descriptions. The automatic generation of new descriptions reduces the human efforts in creating them and enriches Wikidata-based knowledge graphs. Our paper shows a practical impact on Wikipedia and Wikidata since there are thousands of missing descriptions. Finally, we expect WikiDes to be a useful dataset for related works in capturing salient information from short paragraphs. The curated dataset is publicly available at: https://github.com/declare-lab/WikiDes.

Submitted 26 September, 2022; originally announced September 2022.

Comments: 27 pages, 8 figures, 15 tables

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2206.02671 [pdf, ps, other]

doi 10.1016/j.neucom.2022.11.081

Canonical Cortical Graph Neural Networks and its Application for Speech Enhancement in Audio-Visual Hearing Aids

Authors: Leandro A. Passos, João Paulo Papa, Amir Hussain, Ahsan Adeel

Abstract: Despite the recent success of machine learning algorithms, most models face drawbacks when considering more complex tasks requiring interaction between different sources, such as multimodal input data and logical time sequences. On the other hand, the biological brain is highly sharpened in this sense, empowered to automatically manage and integrate such streams of information. In this context, this work draws inspiration from recent discoveries in brain cortical circuits to propose a more biologically plausible self-supervised machine learning approach. This combines multimodal information using intra-layer modulations together with Canonical Correlation Analysis, and a memory mechanism to keep track of temporal data, the overall approach termed Canonical Cortical Graph Neural networks. This is shown to outperform recent state-of-the-art models in terms of clean audio reconstruction and energy efficiency for a benchmark audio-visual speech dataset. The enhanced performance is demonstrated through a reduced and smoother neuron firing rate distribution, suggesting that the proposed model is amenable for speech enhancement in future audio-visual hearing aid devices.

Submitted 31 January, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

arXiv:2205.14374 [pdf, other]

doi 10.1145/3520312.3534869

Syntax-Guided Program Reduction for Understanding Neural Code Intelligence Models

Authors: Md Rafiqul Islam Rabin, Aftab Hussain, Mohammad Amin Alipour

Abstract: Neural code intelligence (CI) models are opaque black-boxes and offer little insight on the features they use in making predictions. This opacity may lead to distrust in their prediction and hamper their wider adoption in safety-critical applications. Recently, input program reduction techniques have been proposed to identify key features in the input programs to improve the transparency of CI models. However, this approach is syntax-unaware and does not consider the grammar of the programming language. In this paper, we apply a syntax-guided program reduction technique that considers the grammar of the input programs during reduction. Our experiments on multiple models across different types of input programs show that the syntax-guided program reduction technique is faster and provides smaller sets of key tokens in reduced programs. We also show that the key tokens could be used in generating adversarial examples for up to 65% of the input programs.

Submitted 14 June, 2022; v1 submitted 28 May, 2022; originally announced May 2022.

Comments: The 6th ACM SIGPLAN International Symposium on Machine Programming (MAPS'22); Related to arXiv:2202.06474

arXiv:2205.09239

Readle: A Formal Framework for Designing AI-based Edge Systems

Authors: Aftab Hussain

Abstract: With the widespread use of AI-driven systems in the edge (a.k.a. edge intelligence systems), such as autonomous driving vehicles, wearable biotech devices, intelligent manufacturing, etc., such systems are becoming very critical for our day-to-day lives. A challenge in designing edge intelligence systems is that we have to deal with a large number of constraints in two design spaces that form the basis of such systems: the edge design space and the deep learning design space. Thus in this work, a new systematic, extendable, manual approach, READLE, is proposed for creating representations of specifications in edge intelligent systems, capturing constraints in the edge system design space (e.g. timing constraints and other performance constraints) and constraints in the deep learning space (e.g. model training duration, required level of accuracy) in a coherent fashion. In particular, READLE leverages benefits of real-time logic and binary decision diagrams to generate unified specifications. Several insights learned in building READLE are also discussed, which should help in future research in the domain of formal specifications for edge intelligent systems.

Submitted 18 May, 2022; originally announced May 2022.

arXiv:2204.03842

From 2D Images to 3D Model: Weakly Supervised Multi-View Face Reconstruction with Deep Fusion

Authors: Weiguang Zhao, Chaolong Yang, Jianan Ye, Rui Zhang, Yuyao Yan, Xi Yang, Bin Dong, Amir Hussain, Kaizhu Huang

Abstract: While weakly supervised multi-view face reconstruction (MVR) is garnering increased attention, one critical issue still remains open: how to effectively fuse multiple image information to reconstruct high-precision 3D models. In this regard, we propose a novel model called Deep Fusion MVR (DF-MVR) to reconstruct high-precision 3D facial shapes from multi-view images. Specifically, we introduce MulEn-Unet, a multi-view encoding to single decoding framework with skip connections and attention. This design allows for the extraction, integration, and compensation of deep features with attention from multi-view images. Furthermore, we adopt the involution kernel to enrich deep fusion features with channel features. In addition, we develop the face parse network to learn, identify, and emphasize the critical common face area within multi-view images. Experiments on the Pixel-Face and Bosphorus datasets indicate the superiority of our model. Without 3D annotation, DF-MVR achieves 5.2% and 3.0% RMSE improvement over the existing weakly supervised MVRs on the Pixel-Face and Bosphorus datasets, respectively. Code will be available publicly at https://github.com/weiguangzhao/DF_MVR.

Submitted 22 January, 2024; v1 submitted 8 April, 2022; originally announced April 2022.

arXiv:2202.09115

Towards Simple and Accurate Human Pose Estimation with Stair Network

Authors: Chenru Jiang, Kaizhu Huang, Shufei Zhang, Shufei Zhang, Jimin Xiao, Zhenxing Niu, Amir Hussain

Abstract: In this paper, we focus on tackling the precise keypoint coordinates regression task. Most existing approaches adopt complicated networks with a large number of parameters, leading to a heavy model with poor cost-effectiveness in practice. To overcome this limitation, we develop a small yet discriminative model called STair Network, which can be simply stacked towards an accurate multi-stage pose estimation system. Specifically, to reduce computational cost, STair Network is composed of novel basic feature extraction blocks which focus on promoting feature diversity and obtaining rich local representations with fewer parameters, enabling a satisfactory balance on efficiency and performance. To further improve the performance, we introduce two mechanisms with negligible computational cost, focusing on feature fusion and replenish. We demonstrate the effectiveness of the STair Network on two standard datasets, e.g., 1-stage STair Network achieves a higher accuracy than HRNet by 5.5% on COCO test dataset with 80% fewer parameters and 68% fewer GFLOPs.

Submitted 21 November, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

Comments: The paper has been accepted by IEEE Transactions on Emerging Topics in Computational Intelligence

Showing 1–50 of 135 results for author: Hussain, A