subscribe to arXiv mailings

Applicability of Large Language Models and Generative Models for Legal Case Judgement Summarization

Authors: Aniket Deroy, Kripabandhu Ghosh, Saptarshi Ghosh

Abstract: Automatic summarization of legal case judgements, which are known to be long and complex, has traditionally been tried via extractive summarization models. In recent years, generative models including abstractive summarization models and Large language models (LLMs) have gained huge popularity. In this paper, we explore the applicability of such models for legal case judgement summarization. We ap… ▽ More Automatic summarization of legal case judgements, which are known to be long and complex, has traditionally been tried via extractive summarization models. In recent years, generative models including abstractive summarization models and Large language models (LLMs) have gained huge popularity. In this paper, we explore the applicability of such models for legal case judgement summarization. We applied various domain specific abstractive summarization models and general domain LLMs as well as extractive summarization models over two sets of legal case judgements from the United Kingdom (UK) Supreme Court and the Indian (IN) Supreme Court and evaluated the quality of the generated summaries. We also perform experiments on a third dataset of legal documents of a different type, Government reports from the United States (US). Results show that abstractive summarization models and LLMs generally perform better than the extractive methods as per traditional metrics for evaluating summary quality. However, detailed investigation shows the presence of inconsistencies and hallucinations in the outputs of the generative models, and we explore ways to reduce the hallucinations and inconsistencies in the summaries. Overall, the investigation suggests that further improvements are needed to enhance the reliability of abstractive models and LLMs for legal case judgement summarization. At present, a human-in-the-loop technique is more suitable for performing manual checks to identify inconsistencies in the generated summaries. △ Less

Submitted 6 July, 2024; originally announced July 2024.

Comments: Accepted at Artificial Intelligence and Law, 2024

arXiv:2407.11268 [pdf, other]

Heterogenous Multi-Source Data Fusion Through Input Mapping and Latent Variable Gaussian Process

Authors: Yigitcan Comlek, Sandipp Krishnan Ravi, Piyush Pandita, Sayan Ghosh, Liping Wang, Wei Chen

Abstract: Artificial intelligence and machine learning frameworks have served as computationally efficient mapping between inputs and outputs for engineering problems. These mappings have enabled optimization and analysis routines that have warranted superior designs, ingenious material systems and optimized manufacturing processes. A common occurrence in such modeling endeavors is the existence of multiple… ▽ More Artificial intelligence and machine learning frameworks have served as computationally efficient mapping between inputs and outputs for engineering problems. These mappings have enabled optimization and analysis routines that have warranted superior designs, ingenious material systems and optimized manufacturing processes. A common occurrence in such modeling endeavors is the existence of multiple source of data, each differentiated by fidelity, operating conditions, experimental conditions, and more. Data fusion frameworks have opened the possibility of combining such differentiated sources into single unified models, enabling improved accuracy and knowledge transfer. However, these frameworks encounter limitations when the different sources are heterogeneous in nature, i.e., not sharing the same input parameter space. These heterogeneous input scenarios can occur when the domains differentiated by complexity, scale, and fidelity require different parametrizations. Towards addressing this void, a heterogeneous multi-source data fusion framework is proposed based on input mapping calibration (IMC) and latent variable Gaussian process (LVGP). In the first stage, the IMC algorithm is utilized to transform the heterogeneous input parameter spaces into a unified reference parameter space. In the second stage, a multi-source data fusion model enabled by LVGP is leveraged to build a single source-aware surrogate model on the transformed reference space. The proposed framework is demonstrated and analyzed on three engineering case studies (design of cantilever beam, design of ellipsoidal void and modeling properties of Ti6Al4V alloy). The results indicate that the proposed framework provides improved predictive accuracy over a single source model and transformed but source unaware model. △ Less

Submitted 15 July, 2024; originally announced July 2024.

Comments: 20 Pages,9 Figures, Data is available per request

arXiv:2407.07237 [pdf, other]

The Quantum Imitation Game: Reverse Engineering of Quantum Machine Learning Models

Authors: Archisman Ghosh, Swaroop Ghosh

Abstract: Quantum Machine Learning (QML) amalgamates quantum computing paradigms with machine learning models, providing significant prospects for solving complex problems. However, with the expansion of numerous third-party vendors in the Noisy Intermediate-Scale Quantum (NISQ) era of quantum computing, the security of QML models is of prime importance, particularly against reverse engineering, which could… ▽ More Quantum Machine Learning (QML) amalgamates quantum computing paradigms with machine learning models, providing significant prospects for solving complex problems. However, with the expansion of numerous third-party vendors in the Noisy Intermediate-Scale Quantum (NISQ) era of quantum computing, the security of QML models is of prime importance, particularly against reverse engineering, which could expose trained parameters and algorithms of the models. We assume the untrusted quantum cloud provider is an adversary having white-box access to the transpiled user-designed trained QML model during inference. Reverse engineering (RE) to extract the pre-transpiled QML circuit will enable re-transpilation and usage of the model for various hardware with completely different native gate sets and even different qubit technology. Such flexibility may not be obtained from the transpiled circuit which is tied to a particular hardware and qubit technology. The information about the number of parameters, and optimized values can allow further training of the QML model to alter the QML model, tamper with the watermark, and/or embed their own watermark or refine the model for other purposes. In this first effort to investigate the RE of QML circuits, we perform RE and compare the training accuracy of original and reverse-engineered Quantum Neural Networks (QNNs) of various sizes. We note that multi-qubit classifiers can be reverse-engineered under specific conditions with a mean error of order 1e-2 in a reasonable time. We also propose adding dummy fixed parametric gates in the QML models to increase the RE overhead for defense. For instance, adding 2 dummy qubits and 2 layers increases the overhead by ~1.76 times for a classifier with 2 qubits and 3 layers with a performance overhead of less than 9%. We note that RE is a very powerful attack model which warrants further efforts on defenses. △ Less

Submitted 15 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: 11 pages, 12 figures

arXiv:2407.06323 [pdf, ps, other]

When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails

Authors: Manish Nagireddy, Inkit Padhi, Soumya Ghosh, Prasanna Sattigeri

Abstract: Large language models (LLMs) have convincing performance in a variety of downstream tasks. However, these systems are prone to generating undesirable outputs such as harmful and biased text. In order to remedy such generations, the development of guardrail (or detector) models has gained traction. Motivated by findings from developing a detector for social bias, we adopt the notion of a use-mentio… ▽ More Large language models (LLMs) have convincing performance in a variety of downstream tasks. However, these systems are prone to generating undesirable outputs such as harmful and biased text. In order to remedy such generations, the development of guardrail (or detector) models has gained traction. Motivated by findings from developing a detector for social bias, we adopt the notion of a use-mention distinction - which we identified as the primary source of under-performance in the preliminary versions of our social bias detector. Armed with this information, we describe a fully extensible and reproducible synthetic data generation pipeline which leverages taxonomy-driven instructions to create targeted and labeled data. Using this pipeline, we generate over 300K unique contrastive samples and provide extensive experiments to systematically evaluate performance on a suite of open source datasets. We show that our method achieves competitive performance with a fraction of the cost in compute and offers insight into iteratively developing efficient and capable guardrail models. Warning: This paper contains examples of text which are toxic, biased, and potentially harmful. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05901 [pdf, other]

Intelligent Routing as a Service (iRaaS)

Authors: Saptarshi Ghosh, Konstantinos Antonakoglou, Ioannis Mavromatis, Kostas Katsaros

Abstract: The scope of the Sixth-Generation Self-Organized Networks (6G-SON) advances its predecessor's capability towards agility, flexibility, and adaptability. On-demand overlay networking technologies have shown a prominent maturity while coping with the rising complexity and scale of enterprise, service provider, and data centre networks. The Software-Defined Networking paradigm has recently offered Mo… ▽ More The scope of the Sixth-Generation Self-Organized Networks (6G-SON) advances its predecessor's capability towards agility, flexibility, and adaptability. On-demand overlay networking technologies have shown a prominent maturity while coping with the rising complexity and scale of enterprise, service provider, and data centre networks. The Software-Defined Networking paradigm has recently offered Model Driven Programmability, minimizing network management complexity through automation and orchestration. However, leveraging Machine Learning-driven network optimization, a.k.a. Knowledge-Defined Networking (KDN), has still been a domain of interest for the Network Softwarization research community. In this article, we propose Intelligent Routing as a Service (iRaaS) architecture as an application layer cognitive routing framework for KDNs. iRaaS offers routing logic customization (i.e., customizing metric function and path-finding algorithm) and provides an option to include heuristic parameters from trained models as a part of the metric calculation. iRaaS sits on the application plane above the knowledge plane in a KDN stack, thus providing platform- and vendor-agnostic coupling with existing network infrastructures. This article covers the scope of iRaaS by using reliability as a heuristic for standard path-discovery algorithms, e.g., Shortest Path First (SPF) and Diffusion Update algorithm (DUAL), along with the architectural specification. We validate our approach through a Proof-of-Concept deployment. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05399 [pdf, other]

IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning

Authors: Abhinav Joshi, Shounak Paul, Akshat Sharma, Pawan Goyal, Saptarshi Ghosh, Ashutosh Modi

Abstract: Legal systems worldwide are inundated with exponential growth in cases and documents. There is an imminent need to develop NLP and ML techniques for automatically processing and understanding legal documents to streamline the legal system. However, evaluating and comparing various NLP models designed specifically for the legal domain is challenging. This paper addresses this challenge by proposing… ▽ More Legal systems worldwide are inundated with exponential growth in cases and documents. There is an imminent need to develop NLP and ML techniques for automatically processing and understanding legal documents to streamline the legal system. However, evaluating and comparing various NLP models designed specifically for the legal domain is challenging. This paper addresses this challenge by proposing IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning. IL-TUR contains monolingual (English, Hindi) and multi-lingual (9 Indian languages) domain-specific tasks that address different aspects of the legal system from the point of view of understanding and reasoning over Indian legal documents. We present baseline models (including LLM-based) for each task, outlining the gap between models and the ground truth. To foster further research in the legal domain, we create a leaderboard (available at: https://exploration-lab.github.io/IL-TUR/) where the research community can upload and compare legal text understanding systems. △ Less

Submitted 7 July, 2024; originally announced July 2024.

Comments: Accepted at ACL 2024 Main Conference; 40 Pages (9 Pages + References + Appendix)

arXiv:2407.03835 [pdf, other]

7th ABAW Competition: Multi-Task Learning and Compound Expression Recognition

Authors: Dimitrios Kollias, Stefanos Zafeiriou, Irene Kotsia, Abhinav Dhall, Shreya Ghosh, Chunchang Shao, Guanyu Hu

Abstract: This paper describes the 7th Affective Behavior Analysis in-the-wild (ABAW) Competition, which is part of the respective Workshop held in conjunction with ECCV 2024. The 7th ABAW Competition addresses novel challenges in understanding human expressions and behaviors, crucial for the development of human-centered technologies. The Competition comprises of two sub-challenges: i) Multi-Task Learning… ▽ More This paper describes the 7th Affective Behavior Analysis in-the-wild (ABAW) Competition, which is part of the respective Workshop held in conjunction with ECCV 2024. The 7th ABAW Competition addresses novel challenges in understanding human expressions and behaviors, crucial for the development of human-centered technologies. The Competition comprises of two sub-challenges: i) Multi-Task Learning (the goal is to learn at the same time, in a multi-task learning setting, to estimate two continuous affect dimensions, valence and arousal, to recognise between the mutually exclusive classes of the 7 basic expressions and 'other'), and to detect 12 Action Units); and ii) Compound Expression Recognition (the target is to recognise between the 7 mutually exclusive compound expression classes). s-Aff-Wild2, which is a static version of the A/V Aff-Wild2 database and contains annotations for valence-arousal, expressions and Action Units, is utilized for the purposes of the Multi-Task Learning Challenge; a part of C-EXPR-DB, which is an A/V in-the-wild database with compound expression annotations, is utilized for the purposes of the Compound Expression Recognition Challenge. In this paper, we introduce the two challenges, detailing their datasets and the protocols followed for each. We also outline the evaluation metrics, and highlight the baseline systems and their results. Additional information about the competition can be found at \url{https://affective-behavior-analysis-in-the-wild.github.io/7th}. △ Less

Submitted 8 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.02658 [pdf, other]

Efficient Exact Algorithms for Minimum Covering of Orthogonal Polygons with Squares

Authors: Anubhav Dhar, Subham Ghosh, Sudeshna Kolay

Abstract: The Orthogonal Polygon Covering with Squares (OPCS) problem takes as input an orthogonal polygon $P$ without holes with $n$ vertices, where vertices have integral coordinates. The aim is to find a minimum number of axis-parallel, possibly overlapping squares which lie completely inside $P$, such that their union covers the entire region inside $P$. Aupperle et. al~\cite{aupperle1988covering} provi… ▽ More The Orthogonal Polygon Covering with Squares (OPCS) problem takes as input an orthogonal polygon $P$ without holes with $n$ vertices, where vertices have integral coordinates. The aim is to find a minimum number of axis-parallel, possibly overlapping squares which lie completely inside $P$, such that their union covers the entire region inside $P$. Aupperle et. al~\cite{aupperle1988covering} provide an $\mathcal O(N^{1.5})$-time algorithm to solve OPCS for orthogonal polygons without holes, where $N$ is the number of integral lattice points lying in the interior or on the boundary of $P$. Designing algorithms for OPCS with a running time polynomial in $n$ (the number of vertices of $P$) was discussed as an open question in \cite{aupperle1988covering}, since $N$ can be exponentially larger than $n$. In this paper we design a polynomial-time exact algorithm for OPCS with a running time of $\mathcal O(n^{14})$. We also consider the following structural parameterized version of the problem. A knob in an orthogonal polygon is a polygon edge whose both endpoints are convex polygon vertices. Given an input orthogonal polygon with $n$ vertices and $k$ knobs, we design an algorithm for OPCS with running time $\mathcal O(n^2 + k^{14} \cdot n)$. In \cite{aupperle1988covering}, the Orthogonal Polygon with Holes Covering with Squares (OPCSH) problem is also studied where orthogonal polygon could have holes, and the objective is to find a minimum square covering of the input polygon. This is shown to be NP-complete. We think there is an error in the existing proof in \cite{aupperle1988covering}, where a reduction from Planar 3-CNF is shown. We fix this error in the proof with an alternate construction of one of the gadgets used in the reduction, hence completing the proof of NP-completeness of OPCSH. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02536 [pdf, other]

doi 10.4230/LIPIcs.GIScience.2023.3

Reducing False Discoveries in Statistically-Significant Regional-Colocation Mining: A Summary of Results

Authors: Subhankar Ghosh, Jayant Gupta, Arun Sharma, Shuai An, Shashi Shekhar

Abstract: Given a set \emph{S} of spatial feature types, its feature instances, a study area, and a neighbor relationship, the goal is to find pairs $<$a region ($r_{g}$), a subset \emph{C} of \emph{S}$>$ such that \emph{C} is a statistically significant regional-colocation pattern in $r_{g}$. This problem is important for applications in various domains including ecology, economics, and sociology. The prob… ▽ More Given a set \emph{S} of spatial feature types, its feature instances, a study area, and a neighbor relationship, the goal is to find pairs $<$a region ($r_{g}$), a subset \emph{C} of \emph{S}$>$ such that \emph{C} is a statistically significant regional-colocation pattern in $r_{g}$. This problem is important for applications in various domains including ecology, economics, and sociology. The problem is computationally challenging due to the exponential number of regional colocation patterns and candidate regions. Previously, we proposed a miner \cite{10.1145/3557989.3566158} that finds statistically significant regional colocation patterns. However, the numerous simultaneous statistical inferences raise the risk of false discoveries (also known as the multiple comparisons problem) and carry a high computational cost. We propose a novel algorithm, namely, multiple comparisons regional colocation miner (MultComp-RCM) which uses a Bonferroni correction. Theoretical analysis, experimental evaluation, and case study results show that the proposed method reduces both the false discovery rate and computational cost. △ Less

Submitted 1 July, 2024; originally announced July 2024.

ACM Class: E.m; F.2; E.1; H.3; I.5; J.0

arXiv:2407.01878 [pdf, other]

Compare without Despair: Reliable Preference Evaluation with Generation Separability

Authors: Sayan Ghosh, Tejas Srinivasan, Swabha Swayamdipta

Abstract: Human evaluation of generated language through pairwise preference judgments is pervasive. However, under common scenarios, such as when generations from a model pair are very similar, or when stochastic decoding results in large variations in generations, it results in inconsistent preference ratings. We address these challenges by introducing a meta-evaluation measure, separability, which estima… ▽ More Human evaluation of generated language through pairwise preference judgments is pervasive. However, under common scenarios, such as when generations from a model pair are very similar, or when stochastic decoding results in large variations in generations, it results in inconsistent preference ratings. We address these challenges by introducing a meta-evaluation measure, separability, which estimates how suitable a test instance is for pairwise preference evaluation. For a candidate test instance, separability samples multiple generations from a pair of models, and measures how distinguishable the two sets of generations are. Our experiments show that instances with high separability values yield more consistent preference ratings from both human- and auto-raters. Further, the distribution of separability allows insights into which test benchmarks are more valuable for comparing models. Finally, we incorporate separability into ELO ratings, accounting for how suitable each test instance might be for reliably ranking LLMs. Overall, separability has implications for consistent, efficient and robust preference evaluation of LLMs with both human- and auto-raters. △ Less

Submitted 8 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

Comments: Corrected description of reference in Related Work

arXiv:2407.01732 [pdf, other]

Investigating Nudges toward Related Sellers on E-commerce Marketplaces: A Case Study on Amazon

Authors: Abhisek Dash, Abhijnan Chakraborty, Saptarshi Ghosh, Animesh Mukherjee, Krishna P. Gummadi

Abstract: E-commerce marketplaces provide business opportunities to millions of sellers worldwide. Some of these sellers have special relationships with the marketplace by virtue of using their subsidiary services (e.g., fulfillment and/or shipping services provided by the marketplace) -- we refer to such sellers collectively as Related Sellers. When multiple sellers offer to sell the same product, the mark… ▽ More E-commerce marketplaces provide business opportunities to millions of sellers worldwide. Some of these sellers have special relationships with the marketplace by virtue of using their subsidiary services (e.g., fulfillment and/or shipping services provided by the marketplace) -- we refer to such sellers collectively as Related Sellers. When multiple sellers offer to sell the same product, the marketplace helps a customer in selecting an offer (by a seller) through (a) a default offer selection algorithm, (b) showing features about each of the offers and the corresponding sellers (price, seller performance metrics, seller's number of ratings etc.), and (c) finally evaluating the sellers along these features. In this paper, we perform an end-to-end investigation into how the above apparatus can nudge customers toward the Related Sellers on Amazon's four different marketplaces in India, USA, Germany and France. We find that given explicit choices, customers' preferred offers and algorithmically selected offers can be significantly different. We highlight that Amazon is adopting different performance metric evaluation policies for different sellers, potentially benefiting Related Sellers. For instance, such policies result in notable discrepancy between the actual performance metric and the presented performance metric of Related Sellers. We further observe that among the seller-centric features visible to customers, sellers' number of ratings influences their decisions the most, yet it may not reflect the true quality of service by the seller, rather reflecting the scale at which the seller operates, thereby implicitly steering customers toward larger Related Sellers. Moreover, when customers are shown the rectified metrics for the different sellers, their preference toward Related Sellers is almost halved. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: This work has been accepted for presentation at the ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 2024. It will appear in Proceedings of the ACM on Human-Computer Interaction

arXiv:2407.00317 [pdf, other]

Towards Statistically Significant Taxonomy Aware Co-location Pattern Detection

Authors: Subhankar Ghosh, Arun Sharma, Jayant Gupta, Shashi Shekhar

Abstract: Given a collection of Boolean spatial feature types, their instances, a neighborhood relation (e.g., proximity), and a hierarchical taxonomy of the feature types, the goal is to find the subsets of feature types or their parents whose spatial interaction is statistically significant. This problem is for taxonomy-reliant applications such as ecology (e.g., finding new symbiotic relationships across… ▽ More Given a collection of Boolean spatial feature types, their instances, a neighborhood relation (e.g., proximity), and a hierarchical taxonomy of the feature types, the goal is to find the subsets of feature types or their parents whose spatial interaction is statistically significant. This problem is for taxonomy-reliant applications such as ecology (e.g., finding new symbiotic relationships across the food chain), spatial pathology (e.g., immunotherapy for cancer), retail, etc. The problem is computationally challenging due to the exponential number of candidate co-location patterns generated by the taxonomy. Most approaches for co-location pattern detection overlook the hierarchical relationships among spatial features, and the statistical significance of the detected patterns is not always considered, leading to potential false discoveries. This paper introduces two methods for incorporating taxonomies and assessing the statistical significance of co-location patterns. The baseline approach iteratively checks the significance of co-locations between leaf nodes or their ancestors in the taxonomy. Using the Benjamini-Hochberg procedure, an advanced approach is proposed to control the false discovery rate. This approach effectively reduces the risk of false discoveries while maintaining the power to detect true co-location patterns. Experimental evaluation and case study results show the effectiveness of the approach. △ Less

Submitted 4 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

Comments: Accepted in The 16th Conference on Spatial Information Theory (COSIT) 2024

ACM Class: E.m; H.3.3; I.5; J.4; J.4

arXiv:2406.17957 [pdf, other]

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment

Authors: Paarth Neekhara, Shehzeen Hussain, Subhankar Ghosh, Jason Li, Rafael Valle, Rohan Badlani, Boris Ginsburg

Abstract: Large Language Model (LLM) based text-to-speech (TTS) systems have demonstrated remarkable capabilities in handling large speech datasets and generating natural speech for new speakers. However, LLM-based TTS models are not robust as the generated output can contain repeating words, missing words and mis-aligned speech (referred to as hallucinations or attention errors), especially when the text c… ▽ More Large Language Model (LLM) based text-to-speech (TTS) systems have demonstrated remarkable capabilities in handling large speech datasets and generating natural speech for new speakers. However, LLM-based TTS models are not robust as the generated output can contain repeating words, missing words and mis-aligned speech (referred to as hallucinations or attention errors), especially when the text contains multiple occurrences of the same token. We examine these challenges in an encoder-decoder transformer model and find that certain cross-attention heads in such models implicitly learn the text and speech alignment when trained for predicting speech tokens for a given text. To make the alignment more robust, we propose techniques utilizing CTC loss and attention priors that encourage monotonic cross-attention over the text tokens. Our guided attention training technique does not introduce any new learnable parameters and significantly improves robustness of LLM-based TTS models. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: Published as a conference paper at INTERSPEECH 2024

arXiv:2406.11768 [pdf, other]

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Authors: Sreyan Ghosh, Sonal Kumar, Ashish Seth, Chandra Kiran Reddy Evuru, Utkarsh Tyagi, S Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha

Abstract: Perceiving and understanding non-speech sounds and non-verbal speech is essential to making decisions that help us interact with our surroundings. In this paper, we propose GAMA, a novel General-purpose Large Audio-Language Model (LALM) with Advanced Audio Understanding and Complex Reasoning Abilities. We build GAMA by integrating an LLM with multiple types of audio representations, including feat… ▽ More Perceiving and understanding non-speech sounds and non-verbal speech is essential to making decisions that help us interact with our surroundings. In this paper, we propose GAMA, a novel General-purpose Large Audio-Language Model (LALM) with Advanced Audio Understanding and Complex Reasoning Abilities. We build GAMA by integrating an LLM with multiple types of audio representations, including features from a custom Audio Q-Former, a multi-layer aggregator that aggregates features from multiple layers of an audio encoder. We fine-tune GAMA on a large-scale audio-language dataset, which augments it with audio understanding capabilities. Next, we propose CompA-R (Instruction-Tuning for Complex Audio Reasoning), a synthetically generated instruction-tuning (IT) dataset with instructions that require the model to perform complex reasoning on the input audio. We instruction-tune GAMA with CompA-R to endow it with complex reasoning abilities, where we further add a soft prompt as input with high-level semantic evidence by leveraging event tags of the input audio. Finally, we also propose CompA-R-test, a human-labeled evaluation dataset for evaluating the capabilities of LALMs on open-ended audio question-answering that requires complex reasoning. Through automated and expert human evaluations, we show that GAMA outperforms all other LALMs in literature on diverse audio understanding tasks by margins of 1%-84%. Further, GAMA IT-ed on CompA-R proves to be superior in its complex reasoning and instruction following capabilities. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Project Website: https://sreyan88.github.io/gamaaudio/

arXiv:2406.11704 [pdf, other]

Nemotron-4 340B Technical Report

Authors: Nvidia, :, Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek , et al. (58 additional authors not shown)

Abstract: We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation be… ▽ More We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.06818 [pdf, other]

Conformal Prediction for Class-wise Coverage via Augmented Label Rank Calibration

Authors: Yuanjie Shi, Subhankar Ghosh, Taha Belkhouja, Janardhan Rao Doppa, Yan Yan

Abstract: Conformal prediction (CP) is an emerging uncertainty quantification framework that allows us to construct a prediction set to cover the true label with a pre-specified marginal or conditional probability. Although the valid coverage guarantee has been extensively studied for classification problems, CP often produces large prediction sets which may not be practically useful. This issue is exacerba… ▽ More Conformal prediction (CP) is an emerging uncertainty quantification framework that allows us to construct a prediction set to cover the true label with a pre-specified marginal or conditional probability. Although the valid coverage guarantee has been extensively studied for classification problems, CP often produces large prediction sets which may not be practically useful. This issue is exacerbated for the setting of class-conditional coverage on imbalanced classification tasks. This paper proposes the Rank Calibrated Class-conditional CP (RC3P) algorithm to reduce the prediction set sizes to achieve class-conditional coverage, where the valid coverage holds for each class. In contrast to the standard class-conditional CP (CCP) method that uniformly thresholds the class-wise conformity score for each class, the augmented label rank calibration step allows RC3P to selectively iterate this class-wise thresholding subroutine only for a subset of classes whose class-wise top-k error is small. We prove that agnostic to the classifier and data distribution, RC3P achieves class-wise coverage. We also show that RC3P reduces the size of prediction sets compared to the CCP method. Comprehensive experiments on multiple real-world datasets demonstrate that RC3P achieves class-wise coverage and 26.25% reduction in prediction set sizes on average. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.05348 [pdf, other]

Toward Reliable Ad-hoc Scientific Information Extraction: A Case Study on Two Materials Datasets

Authors: Satanu Ghosh, Neal R. Brodnik, Carolina Frey, Collin Holgate, Tresa M. Pollock, Samantha Daly, Samuel Carton

Abstract: We explore the ability of GPT-4 to perform ad-hoc schema based information extraction from scientific literature. We assess specifically whether it can, with a basic prompting approach, replicate two existing material science datasets, given the manuscripts from which they were originally manually extracted. We employ materials scientists to perform a detailed manual error analysis to assess where… ▽ More We explore the ability of GPT-4 to perform ad-hoc schema based information extraction from scientific literature. We assess specifically whether it can, with a basic prompting approach, replicate two existing material science datasets, given the manuscripts from which they were originally manually extracted. We employ materials scientists to perform a detailed manual error analysis to assess where the model struggles to faithfully extract the desired information, and draw on their insights to suggest research directions to address this broadly important task. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.04432 [pdf, other]

LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition

Authors: Sreyan Ghosh, Sonal Kumar, Ashish Seth, Purva Chiniya, Utkarsh Tyagi, Ramani Duraiswami, Dinesh Manocha

Abstract: Visual cues, like lip motion, have been shown to improve the performance of Automatic Speech Recognition (ASR) systems in noisy environments. We propose LipGER (Lip Motion aided Generative Error Correction), a novel framework for leveraging visual cues for noise-robust ASR. Instead of learning the cross-modal correlation between the audio and visual modalities, we make an LLM learn the task of vis… ▽ More Visual cues, like lip motion, have been shown to improve the performance of Automatic Speech Recognition (ASR) systems in noisy environments. We propose LipGER (Lip Motion aided Generative Error Correction), a novel framework for leveraging visual cues for noise-robust ASR. Instead of learning the cross-modal correlation between the audio and visual modalities, we make an LLM learn the task of visually-conditioned (generative) ASR error correction. Specifically, we instruct an LLM to predict the transcription from the N-best hypotheses generated using ASR beam-search. This is further conditioned on lip motions. This approach addresses key challenges in traditional AVSR learning, such as the lack of large-scale paired datasets and difficulties in adapting to new domains. We experiment on 4 datasets in various settings and show that LipGER improves the Word Error Rate in the range of 1.1%-49.2%. We also release LipHyp, a large-scale dataset with hypothesis-transcription pairs that is additionally equipped with lip motion cues to promote further research in this space △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: InterSpeech 2024. Code and Data: https://github.com/Sreyan88/LipGER

arXiv:2406.04370 [pdf, other]

Large Language Model Confidence Estimation via Black-Box Access

Authors: Tejaswini Pedapati, Amit Dhurandhar, Soumya Ghosh, Soham Dan, Prasanna Sattigeri

Abstract: Estimating uncertainty or confidence in the responses of a model can be significant in evaluating trust not only in the responses, but also in the model as a whole. In this paper, we explore the problem of estimating confidence for responses of large language models (LLMs) with simply black-box or query access to them. We propose a simple and extensible framework where, we engineer novel features… ▽ More Estimating uncertainty or confidence in the responses of a model can be significant in evaluating trust not only in the responses, but also in the model as a whole. In this paper, we explore the problem of estimating confidence for responses of large language models (LLMs) with simply black-box or query access to them. We propose a simple and extensible framework where, we engineer novel features and train a (interpretable) model (viz. logistic regression) on these features to estimate the confidence. We empirically demonstrate that our simple framework is effective in estimating confidence of flan-ul2, llama-13b and mistral-7b with it consistently outperforming existing black-box confidence estimation approaches on benchmark datasets such as TriviaQA, SQuAD, CoQA and Natural Questions by even over $10\%$ (on AUROC) in some cases. Additionally, our interpretable approach provides insight into features that are predictive of confidence, leading to the interesting and useful discovery that our confidence models built for one LLM generalize zero-shot across others on a given dataset. △ Less

Submitted 31 May, 2024; originally announced June 2024.

arXiv:2406.04286 [pdf, other]

ABEX: Data Augmentation for Low-Resource NLU via Expanding Abstract Descriptions

Authors: Sreyan Ghosh, Utkarsh Tyagi, Sonal Kumar, C. K. Evuru, S Ramaneswaran, S Sakshi, Dinesh Manocha

Abstract: We present ABEX, a novel and effective generative data augmentation methodology for low-resource Natural Language Understanding (NLU) tasks. ABEX is based on ABstract-and-EXpand, a novel paradigm for generating diverse forms of an input document -- we first convert a document into its concise, abstract description and then generate new documents based on expanding the resultant abstraction. To lea… ▽ More We present ABEX, a novel and effective generative data augmentation methodology for low-resource Natural Language Understanding (NLU) tasks. ABEX is based on ABstract-and-EXpand, a novel paradigm for generating diverse forms of an input document -- we first convert a document into its concise, abstract description and then generate new documents based on expanding the resultant abstraction. To learn the task of expanding abstract descriptions, we first train BART on a large-scale synthetic dataset with abstract-document pairs. Next, to generate abstract descriptions for a document, we propose a simple, controllable, and training-free method based on editing AMR graphs. ABEX brings the best of both worlds: by expanding from abstract representations, it preserves the original semantic properties of the documents, like style and meaning, thereby maintaining alignment with the original label and data distribution. At the same time, the fundamental process of elaborating on abstract descriptions facilitates diverse generations. We demonstrate the effectiveness of ABEX on 4 NLU tasks spanning 12 datasets and 4 low-resource settings. ABEX outperforms all our baselines qualitatively with improvements of 0.04% - 38.8%. Qualitatively, ABEX outperforms all prior methods from literature in terms of context and length diversity. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: ACL 2024 Main Conference. Code and data: https://github.com/Sreyan88/ABEX

arXiv:2406.04245 [pdf, ps, other]

Online learning of a panoply of quantum objects

Authors: Akshay Bansal, Ian George, Soumik Ghosh, Jamie Sikora, Alice Zheng

Abstract: In many quantum tasks, there is an unknown quantum object that one wishes to learn. An online strategy for this task involves adaptively refining a hypothesis to reproduce such an object or its measurement statistics. A common evaluation metric for such a strategy is its regret, or roughly the accumulated errors in hypothesis statistics. We prove a sublinear regret bound for learning over general… ▽ More In many quantum tasks, there is an unknown quantum object that one wishes to learn. An online strategy for this task involves adaptively refining a hypothesis to reproduce such an object or its measurement statistics. A common evaluation metric for such a strategy is its regret, or roughly the accumulated errors in hypothesis statistics. We prove a sublinear regret bound for learning over general subsets of positive semidefinite matrices via the regularized-follow-the-leader algorithm and apply it to various settings where one wishes to learn quantum objects. For concrete applications, we present a sublinear regret bound for learning quantum states, effects, channels, interactive measurements, strategies, co-strategies, and the collection of inner products of pure states. Our bound applies to many other quantum objects with compact, convex representations. In proving our regret bound, we establish various matrix analysis results useful in quantum information theory. This includes a generalization of Pinsker's inequality for arbitrary positive semidefinite operators with possibly different traces, which may be of independent interest and applicable to more general classes of divergences. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: 34 pages. Comments welcome

arXiv:2406.03747 [pdf, other]

Instance Segmentation and Teeth Classification in Panoramic X-rays

Authors: Devichand Budagam, Ayush Kumar, Sayan Ghosh, Anuj Shrivastav, Azamat Zhanatuly Imanbayev, Iskander Rafailovich Akhmetov, Dmitrii Kaplun, Sergey Antonov, Artem Rychenkov, Gleb Cyganov, Aleksandr Sinitca

Abstract: Teeth segmentation and recognition are critical in various dental applications and dental diagnosis. Automatic and accurate segmentation approaches have been made possible by integrating deep learning models. Although teeth segmentation has been studied in the past, only some techniques were able to effectively classify and segment teeth simultaneously. This article offers a pipeline of two deep l… ▽ More Teeth segmentation and recognition are critical in various dental applications and dental diagnosis. Automatic and accurate segmentation approaches have been made possible by integrating deep learning models. Although teeth segmentation has been studied in the past, only some techniques were able to effectively classify and segment teeth simultaneously. This article offers a pipeline of two deep learning models, U-Net and YOLOv8, which results in BB-UNet, a new architecture for the classification and segmentation of teeth on panoramic X-rays that is efficient and reliable. We have improved the quality and reliability of teeth segmentation by utilising the YOLOv8 and U-Net capabilities. The proposed networks have been evaluated using the mean average precision (mAP) and dice coefficient for YOLOv8 and BB-UNet, respectively. We have achieved a 3\% increase in mAP score for teeth classification compared to existing methods, and a 10-15\% increase in dice coefficient for teeth segmentation compared to U-Net across different categories of teeth. A new Dental dataset was created based on UFBA-UESC dataset with Bounding-Box and Polygon annotations of 425 dental panoramic X-rays. The findings of this research pave the way for a wider adoption of object detection models in the field of dental diagnosis. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: submtted to Expert Systems with Applications Journal

arXiv:2406.00451 [pdf, other]

Task Planning for Object Rearrangement in Multi-room Environments

Authors: Karan Mirakhor, Sourav Ghosh, Dipanjan Das, Brojeshwar Bhowmick

Abstract: Object rearrangement in a multi-room setup should produce a reasonable plan that reduces the agent's overall travel and the number of steps. Recent state-of-the-art methods fail to produce such plans because they rely on explicit exploration for discovering unseen objects due to partial observability and a heuristic planner to sequence the actions for rearrangement. This paper proposes a novel hie… ▽ More Object rearrangement in a multi-room setup should produce a reasonable plan that reduces the agent's overall travel and the number of steps. Recent state-of-the-art methods fail to produce such plans because they rely on explicit exploration for discovering unseen objects due to partial observability and a heuristic planner to sequence the actions for rearrangement. This paper proposes a novel hierarchical task planner to efficiently plan a sequence of actions to discover unseen objects and rearrange misplaced objects within an untidy house to achieve a desired tidy state. The proposed method introduces several novel techniques, including (i) a method for discovering unseen objects using commonsense knowledge from large language models, (ii) a collision resolution and buffer prediction method based on Cross-Entropy Method to handle blocked goal and swap cases, (iii) a directed spatial graph-based state space for scalability, and (iv) deep reinforcement learning (RL) for producing an efficient planner. The planner interleaves the discovery of unseen objects and rearrangement to minimize the number of steps taken and overall traversal of the agent. The paper also presents new metrics and a benchmark dataset called MoPOR to evaluate the effectiveness of the rearrangement planning in a multi-room setting. The experimental results demonstrate that the proposed method effectively addresses the multi-room rearrangement problem. △ Less

Submitted 1 June, 2024; originally announced June 2024.

Comments: Accepted in AAAI 2024 as oral paper

arXiv:2405.18746 [pdf, other]

STIQ: Safeguarding Training and Inferencing of Quantum Neural Networks from Untrusted Cloud

Authors: Satwik Kundu, Swaroop Ghosh

Abstract: The high expenses imposed by current quantum cloud providers, coupled with the escalating need for quantum resources, may incentivize the emergence of cheaper cloud-based quantum services from potentially untrusted providers. Deploying or hosting quantum models, such as Quantum Neural Networks (QNNs), on these untrusted platforms introduces a myriad of security concerns, with the most critical one… ▽ More The high expenses imposed by current quantum cloud providers, coupled with the escalating need for quantum resources, may incentivize the emergence of cheaper cloud-based quantum services from potentially untrusted providers. Deploying or hosting quantum models, such as Quantum Neural Networks (QNNs), on these untrusted platforms introduces a myriad of security concerns, with the most critical one being model theft. This vulnerability stems from the cloud provider's full access to these circuits during training and/or inference. In this work, we introduce STIQ, a novel ensemble-based strategy designed to safeguard QNNs against such cloud-based adversaries. Our method innovatively trains two distinct QNNs concurrently, hosting them on same or different platforms, in a manner that each network yields obfuscated outputs rendering the individual QNNs ineffective for adversaries operating within cloud environments. However, when these outputs are combined locally (using an aggregate function), they reveal the correct result. Through extensive experiments across various QNNs and datasets, our technique has proven to effectively masks the accuracy and losses of the individually hosted models by upto 76\%, albeit at the expense of $\leq 2\times$ increase in the total computational overhead. This trade-off, however, is a small price to pay for the enhanced security and integrity of QNNs in a cloud-based environment prone to untrusted adversaries. We also demonstrated STIQ's practical application by evaluating it on real 127-qubit IBM\_Sherbrooke hardware, showing that STIQ achieves up to 60\% obfuscation, with combined performance comparable to an unobfuscated model. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.17258 [pdf, other]

$\textit{Trans-LoRA}$: towards data-free Transferable Parameter Efficient Finetuning

Authors: Runqian Wang, Soumya Ghosh, David Cox, Diego Antognini, Aude Oliva, Rogerio Feris, Leonid Karlinsky

Abstract: Low-rank adapters (LoRA) and their variants are popular parameter-efficient fine-tuning (PEFT) techniques that closely match full model fine-tune performance while requiring only a small number of additional parameters. These additional LoRA parameters are specific to the base model being adapted. When the base model needs to be deprecated and replaced with a new one, all the associated LoRA modul… ▽ More Low-rank adapters (LoRA) and their variants are popular parameter-efficient fine-tuning (PEFT) techniques that closely match full model fine-tune performance while requiring only a small number of additional parameters. These additional LoRA parameters are specific to the base model being adapted. When the base model needs to be deprecated and replaced with a new one, all the associated LoRA modules need to be re-trained. Such re-training requires access to the data used to train the LoRA for the original base model. This is especially problematic for commercial cloud applications where the LoRA modules and the base models are hosted by service providers who may not be allowed to host proprietary client task data. To address this challenge, we propose $\textit{Trans-LoRA}$ -- a novel method for lossless, nearly data-free transfer of LoRAs across base models. Our approach relies on synthetic data to transfer LoRA modules. Using large language models, we design a synthetic data generator to approximate the data-generating process of the $\textit{observed}$ task data subset. Training on the resulting synthetic dataset transfers LoRA modules to new models. We show the effectiveness of our approach using both LLama and Gemma model families. Our approach achieves lossless (mostly improved) LoRA transfer between models within and across different base model families, and even between different PEFT methods, on a wide variety of tasks. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.15683 [pdf, other]

VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap

Authors: Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Utkarsh Tyagi, Oriol Nieto, Zeyu Jin, Dinesh Manocha

Abstract: Recent interest in Large Vision-Language Models (LVLMs) for practical applications is moderated by the significant challenge of hallucination or the inconsistency between the factual information and the generated text. In this paper, we first perform an in-depth analysis of hallucinations and discover several novel insights about how and when LVLMs hallucinate. From our analysis, we show that: (1)… ▽ More Recent interest in Large Vision-Language Models (LVLMs) for practical applications is moderated by the significant challenge of hallucination or the inconsistency between the factual information and the generated text. In this paper, we first perform an in-depth analysis of hallucinations and discover several novel insights about how and when LVLMs hallucinate. From our analysis, we show that: (1) The community's efforts have been primarily targeted towards reducing hallucinations related to visual recognition (VR) prompts (e.g., prompts that only require describing the image), thereby ignoring hallucinations for cognitive prompts (e.g., prompts that require additional skills like reasoning on contents of the image). (2) LVLMs lack visual perception, i.e., they can see but not necessarily understand or perceive the input image. We analyze responses to cognitive prompts and show that LVLMs hallucinate due to a perception gap: although LVLMs accurately recognize visual elements in the input image and possess sufficient cognitive skills, they struggle to respond accurately and hallucinate. To overcome this shortcoming, we propose Visual Description Grounded Decoding (VDGD), a simple, robust, and training-free method for alleviating hallucinations. Specifically, we first describe the image and add it as a prefix to the instruction. Next, during auto-regressive decoding, we sample from the plausible candidates according to their KL-Divergence (KLD) to the description, where lower KLD is given higher preference. Experimental results on several benchmarks and LVLMs show that VDGD improves significantly over other baselines in reducing hallucinations. We also propose VaLLu, a benchmark for the comprehensive evaluation of the cognitive capabilities of LVLMs. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: Preprint. Under review. Code will be released on paper acceptance

arXiv:2405.14776 [pdf, other]

Kinetics of orbital ordering in cooperative Jahn-Teller models: Machine-learning enabled large-scale simulations

Authors: Supriyo Ghosh, Sheng Zhang, Chen Cheng, Gia-Wei Chern

Abstract: We present a scalable machine learning (ML) force-field model for the adiabatic dynamics of cooperative Jahn-Teller (JT) systems. Large scale dynamical simulations of the JT model also shed light on the orbital ordering dynamics in colossal magnetoresistance manganites. The JT effect in these materials describes the distortion of local oxygen octahedra driven by a coupling to the orbital degrees o… ▽ More We present a scalable machine learning (ML) force-field model for the adiabatic dynamics of cooperative Jahn-Teller (JT) systems. Large scale dynamical simulations of the JT model also shed light on the orbital ordering dynamics in colossal magnetoresistance manganites. The JT effect in these materials describes the distortion of local oxygen octahedra driven by a coupling to the orbital degrees of freedom of $e_g$ electrons. An effective electron-mediated interaction between the local JT modes leads to a structural transition and the emergence of long-range orbital order at low temperatures. Assuming the principle of locality, a deep-learning neural-network model is developed to accurately and efficiently predict the electron-induced forces that drive the dynamical evolution of JT phonons. A group-theoretical method is utilized to develop a descriptor that incorporates the combined orbital and lattice symmetry into the ML model. Large-scale Langevin dynamics simulations, enabled by the ML force-field models, are performed to investigate the coarsening dynamics of the composite JT distortion and orbital order after a thermal quench. The late-stage coarsening of orbital domains exhibits pronounced freezing behaviors which are likely related to the unusual morphology of the domain structures. Our work highlights a promising avenue for multi-scale dynamical modeling of correlated electron systems. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: 17 pages, 11 figures

arXiv:2405.14707 [pdf]

Artificial Intelligence (AI) in Legal Data Mining

Authors: Aniket Deroy, Naksatra Kumar Bailung, Kripabandhu Ghosh, Saptarshi Ghosh, Abhijnan Chakraborty

Abstract: Despite the availability of vast amounts of data, legal data is often unstructured, making it difficult even for law practitioners to ingest and comprehend the same. It is important to organise the legal information in a way that is useful for practitioners and downstream automation tasks. The word ontology was used by Greek philosophers to discuss concepts of existence, being, becoming and realit… ▽ More Despite the availability of vast amounts of data, legal data is often unstructured, making it difficult even for law practitioners to ingest and comprehend the same. It is important to organise the legal information in a way that is useful for practitioners and downstream automation tasks. The word ontology was used by Greek philosophers to discuss concepts of existence, being, becoming and reality. Today, scientists use this term to describe the relation between concepts, data, and entities. A great example for a working ontology was developed by Dhani and Bhatt. This ontology deals with Indian court cases on intellectual property rights (IPR) The future of legal ontologies is likely to be handled by computer experts and legal experts alike. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: Book name-Technology and Analytics for Law and Justice, Page no-273-297, Chapter no-14

arXiv:2405.13140 [pdf, ps, other]

On Convergence of the Alternating Directions SGHMC Algorithm

Authors: Soumyadip Ghosh, Yingdong Lu, Tomasz Nowicki

Abstract: We study convergence rates of Hamiltonian Monte Carlo (HMC) algorithms with leapfrog integration under mild conditions on stochastic gradient oracle for the target distribution (SGHMC). Our method extends standard HMC by allowing the use of general auxiliary distributions, which is achieved by a novel procedure of Alternating Directions. The convergence analysis is based on the investigations of… ▽ More We study convergence rates of Hamiltonian Monte Carlo (HMC) algorithms with leapfrog integration under mild conditions on stochastic gradient oracle for the target distribution (SGHMC). Our method extends standard HMC by allowing the use of general auxiliary distributions, which is achieved by a novel procedure of Alternating Directions. The convergence analysis is based on the investigations of the Dirichlet forms associated with the underlying Markov chain driving the algorithms. For this purpose, we provide a detailed analysis on the error of the leapfrog integrator for Hamiltonian motions with both the kinetic and potential energy functions in general form. We characterize the explicit dependence of the convergence rates on key parameters such as the problem dimension, functional properties of both the target and auxiliary distributions, and the quality of the oracle. △ Less

Submitted 26 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.12255 [pdf, other]

Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography

Authors: Shantanu Ghosh, Clare B. Poynton, Shyam Visweswaran, Kayhan Batmanghelich

Abstract: The lack of large and diverse training data on Computer-Aided Diagnosis (CAD) in breast cancer detection has been one of the concerns that impedes the adoption of the system. Recently, pre-training with large-scale image text datasets via Vision-Language models (VLM) (\eg CLIP) partially addresses the issue of robustness and data efficiency in computer vision (CV). This paper proposes Mammo-CLIP,… ▽ More The lack of large and diverse training data on Computer-Aided Diagnosis (CAD) in breast cancer detection has been one of the concerns that impedes the adoption of the system. Recently, pre-training with large-scale image text datasets via Vision-Language models (VLM) (\eg CLIP) partially addresses the issue of robustness and data efficiency in computer vision (CV). This paper proposes Mammo-CLIP, the first VLM pre-trained on a substantial amount of screening mammogram-report pairs, addressing the challenges of dataset diversity and size. Our experiments on two public datasets demonstrate strong performance in classifying and localizing various mammographic attributes crucial for breast cancer detection, showcasing data efficiency and robustness similar to CLIP in CV. We also propose Mammo-FActOR, a novel feature attribution method, to provide spatial interpretation of representation with sentence-level granularity within mammography reports. Code is available publicly: \url{https://github.com/batmanlab/Mammo-CLIP}. △ Less

Submitted 22 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: MICCAI 2024, early accept, top 11%

arXiv:2405.09570 [pdf, other]

FunnelNet: An End-to-End Deep Learning Framework to Monitor Digital Heart Murmur in Real-Time

Authors: Md Jobayer, Md. Mehedi Hasan Shawon, Md Rakibul Hasan, Shreya Ghosh, Tom Gedeon, Md Zakir Hossain

Abstract: Objective: Heart murmurs are abnormal sounds caused by turbulent blood flow within the heart. Several diagnostic methods are available to detect heart murmurs and their severity, such as cardiac auscultation, echocardiography, phonocardiogram (PCG), etc. However, these methods have limitations, including extensive training and experience among healthcare providers, cost and accessibility of echoca… ▽ More Objective: Heart murmurs are abnormal sounds caused by turbulent blood flow within the heart. Several diagnostic methods are available to detect heart murmurs and their severity, such as cardiac auscultation, echocardiography, phonocardiogram (PCG), etc. However, these methods have limitations, including extensive training and experience among healthcare providers, cost and accessibility of echocardiography, as well as noise interference and PCG data processing. This study aims to develop a novel end-to-end real-time heart murmur detection approach using traditional and depthwise separable convolutional networks. Methods: Continuous wavelet transform (CWT) was applied to extract meaningful features from the PCG data. The proposed network has three parts: the Squeeze net, the Bottleneck, and the Expansion net. The Squeeze net generates a compressed data representation, whereas the Bottleneck layer reduces computational complexity using a depthwise-separable convolutional network. The Expansion net is responsible for up-sampling the compressed data to a higher dimension, capturing tiny details of the representative data. Results: For evaluation, we used four publicly available datasets and achieved state-of-the-art performance in all datasets. Furthermore, we tested our proposed network on two resource-constrained devices: a Raspberry PI and an Android device, stripping it down into a tiny machine learning model (TinyML), achieving a maximum of 99.70%. Conclusion: The proposed model offers a deep learning framework for real-time accurate heart murmur detection within limited resources. Significance: It will significantly result in more accessible and practical medical services and reduced diagnosis time to assist medical professionals. The code is publicly available at TBA. △ Less

Submitted 9 May, 2024; originally announced May 2024.

Comments: 8-page main paper and 4-page supplementary material

arXiv:2405.06671 [pdf, other]

Parameter-Efficient Instruction Tuning of Large Language Models For Extreme Financial Numeral Labelling

Authors: Subhendu Khatuya, Rajdeep Mukherjee, Akash Ghosh, Manjunath Hegde, Koustuv Dasgupta, Niloy Ganguly, Saptarshi Ghosh, Pawan Goyal

Abstract: We study the problem of automatically annotating relevant numerals (GAAP metrics) occurring in the financial documents with their corresponding XBRL tags. Different from prior works, we investigate the feasibility of solving this extreme classification problem using a generative paradigm through instruction tuning of Large Language Models (LLMs). To this end, we leverage metric metadata informatio… ▽ More We study the problem of automatically annotating relevant numerals (GAAP metrics) occurring in the financial documents with their corresponding XBRL tags. Different from prior works, we investigate the feasibility of solving this extreme classification problem using a generative paradigm through instruction tuning of Large Language Models (LLMs). To this end, we leverage metric metadata information to frame our target outputs while proposing a parameter efficient solution for the task using LoRA. We perform experiments on two recently released financial numeric labeling datasets. Our proposed model, FLAN-FinXC, achieves new state-of-the-art performances on both the datasets, outperforming several strong baselines. We explain the better scores of our proposed model by demonstrating its capability for zero-shot as well as the least frequently occurring tags. Also, even when we fail to predict the XBRL tags correctly, our generated output has substantial overlap with the ground-truth in majority of the cases. △ Less

Submitted 15 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

Comments: This work has been accepted to appear at North American Chapter of the Association for Computational Linguistics (NAACL), 2024

arXiv:2405.06669 [pdf, other]

Instruction-Guided Bullet Point Summarization of Long Financial Earnings Call Transcripts

Authors: Subhendu Khatuya, Koushiki Sinha, Niloy Ganguly, Saptarshi Ghosh, Pawan Goyal

Abstract: While automatic summarization techniques have made significant advancements, their primary focus has been on summarizing short news articles or documents that have clear structural patterns like scientific articles or government reports. There has not been much exploration into developing efficient methods for summarizing financial documents, which often contain complex facts and figures. Here, we… ▽ More While automatic summarization techniques have made significant advancements, their primary focus has been on summarizing short news articles or documents that have clear structural patterns like scientific articles or government reports. There has not been much exploration into developing efficient methods for summarizing financial documents, which often contain complex facts and figures. Here, we study the problem of bullet point summarization of long Earning Call Transcripts (ECTs) using the recently released ECTSum dataset. We leverage an unsupervised question-based extractive module followed by a parameter efficient instruction-tuned abstractive module to solve this task. Our proposed model FLAN-FinBPS achieves new state-of-the-art performances outperforming the strongest baseline with 14.88% average ROUGE score gain, and is capable of generating factually consistent bullet point summaries that capture the important facts discussed in the ECTs. △ Less

Submitted 3 May, 2024; originally announced May 2024.

Comments: Accepted in SIGIR 2024

arXiv:2405.06355 [pdf, other]

Switched Vector Field-based Guidance for General Reference Path Following in Planar Environment

Authors: Subham Basak, Satadal Ghosh

Abstract: Reference path following is a key component in the functioning of almost all engineered autonomous agents. Among several path following guidance methods in existing literature, vector-field-based guidance approach has got wide attention because of its simplicity and guarantee of stability under a broad class of scenarios. However, the usage of same cross-track-error-dependent structure of desired… ▽ More Reference path following is a key component in the functioning of almost all engineered autonomous agents. Among several path following guidance methods in existing literature, vector-field-based guidance approach has got wide attention because of its simplicity and guarantee of stability under a broad class of scenarios. However, the usage of same cross-track-error-dependent structure of desired vector field in most of the existing literature irrespective of instantaneous cross-track error and course angle of unmanned vehicle makes it quite restrictive in attaining faster convergence and also leads to infeasibly high turn rate command for many scenarios. To this end, this paper presents a novel switched vector field-based guidance for following a general reference path, in which the structure of the desired vector field depends on instantaneous cross-track-error and vehicle's course angle. While the developed method ensures faster convergence, it also ensures that the guidance command always stays within a realistic threshold satisfying its curvature constraint, thus making it more real-life implementable for autonomous vehicles with kino-dynamic constraints. Theoretical analysis for convergence of the developed guidance scheme is presented. Possibilities of undesirable chattering at phase transitions are also eliminated. Numerical simulation studies are presented to validate the satisfactory performance of the developed algorithm. △ Less

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.06119 [pdf, other]

Gradient Flow Based Phase-Field Modeling Using Separable Neural Networks

Authors: Revanth Mattey, Susanta Ghosh

Abstract: The $L^2$ gradient flow of the Ginzburg-Landau free energy functional leads to the Allen Cahn equation that is widely used for modeling phase separation. Machine learning methods for solving the Allen-Cahn equation in its strong form suffer from inaccuracies in collocation techniques, errors in computing higher-order spatial derivatives through automatic differentiation, and the large system size… ▽ More The $L^2$ gradient flow of the Ginzburg-Landau free energy functional leads to the Allen Cahn equation that is widely used for modeling phase separation. Machine learning methods for solving the Allen-Cahn equation in its strong form suffer from inaccuracies in collocation techniques, errors in computing higher-order spatial derivatives through automatic differentiation, and the large system size required by the space-time approach. To overcome these limitations, we propose a separable neural network-based approximation of the phase field in a minimizing movement scheme to solve the aforementioned gradient flow problem. At each time step, the separable neural network is used to approximate the phase field in space through a low-rank tensor decomposition thereby accelerating the derivative calculations. The minimizing movement scheme naturally allows for the use of Gauss quadrature technique to compute the functional. A `$tanh$' transformation is applied on the neural network-predicted phase field to strictly bounds the solutions within the values of the two phases. For this transformation, a theoretical guarantee for energy stability of the minimizing movement scheme is established. Our results suggest that bounding the solution through this transformation is the key to effectively model sharp interfaces through separable neural network. The proposed method outperforms the state-of-the-art machine learning methods for phase separation problems and is an order of magnitude faster than the finite element method. △ Less

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.05511 [pdf, other]

Investigating impact of bit-flip errors in control electronics on quantum computation

Authors: Subrata Das, Avimita Chatterjee, Swaroop Ghosh

Abstract: In this paper, we investigate the impact of bit flip errors in FPGA memories in control electronics on quantum computing systems. FPGA memories are integral in storing the amplitude and phase information pulse envelopes, which are essential for generating quantum gate pulses. However, these memories can incur faults due to physical and environmental stressors such as electromagnetic interference,… ▽ More In this paper, we investigate the impact of bit flip errors in FPGA memories in control electronics on quantum computing systems. FPGA memories are integral in storing the amplitude and phase information pulse envelopes, which are essential for generating quantum gate pulses. However, these memories can incur faults due to physical and environmental stressors such as electromagnetic interference, power fluctuations, and temperature variations and adversarial fault injections, potentially leading to errors in quantum gate operations. To understand how these faults affect quantum computations, we conducted a series of experiments to introduce bit flips into the amplitude (both real and imaginary components) and phase values of quantum pulses using IBM's simulated quantum environments, FakeValencia, FakeManila, and FakeLima. Our findings reveal that bit flips in the exponent and initial mantissa bits of the real amplitude cause substantial deviations in quantum gate operations, with TVD increases as high as ~200%. Interestingly, the remaining bits exhibited natural tolerance to errors. We proposed a 3-bit repetition error correction code, which effectively reduced the TVD increases to below 40% without incurring any memory overhead. Due to reuse of less significant bits for error correction, the proposed approach introduces maximum of 5-7% extra TVD in nominal cases. However, this can be avoided by sacrificing memory area for implementing the repetition code. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: 9 pages, 9 figures, conference

arXiv:2405.03725 [pdf, other]

Deep Oscillatory Neural Network

Authors: Nurani Rajagopal Rohan, Vigneswaran C, Sayan Ghosh, Kishore Rajendran, Gaurav A, V Srinivasa Chakravarthy

Abstract: We propose a novel, brain-inspired deep neural network model known as the Deep Oscillatory Neural Network (DONN). Deep neural networks like the Recurrent Neural Networks indeed possess sequence processing capabilities but the internal states of the network are not designed to exhibit brain-like oscillatory activity. With this motivation, the DONN is designed to have oscillatory internal dynamics.… ▽ More We propose a novel, brain-inspired deep neural network model known as the Deep Oscillatory Neural Network (DONN). Deep neural networks like the Recurrent Neural Networks indeed possess sequence processing capabilities but the internal states of the network are not designed to exhibit brain-like oscillatory activity. With this motivation, the DONN is designed to have oscillatory internal dynamics. Neurons of the DONN are either nonlinear neural oscillators or traditional neurons with sigmoidal or ReLU activation. The neural oscillator used in the model is the Hopf oscillator, with the dynamics described in the complex domain. Input can be presented to the neural oscillator in three possible modes. The sigmoid and ReLU neurons also use complex-valued extensions. All the weight stages are also complex-valued. Training follows the general principle of weight change by minimizing the output error and therefore has an overall resemblance to complex backpropagation. A generalization of DONN to convolutional networks known as the Oscillatory Convolutional Neural Network is also proposed. The two proposed oscillatory networks are applied to a variety of benchmark problems in signal and image/video processing. The performance of the proposed models is either comparable or superior to published results on the same data sets. △ Less

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.01737 [pdf, other]

Sample-efficient neural likelihood-free Bayesian inference of implicit HMMs

Authors: Sanmitra Ghosh, Paul J. Birrell, Daniela De Angelis

Abstract: Likelihood-free inference methods based on neural conditional density estimation were shown to drastically reduce the simulation burden in comparison to classical methods such as ABC. When applied in the context of any latent variable model, such as a Hidden Markov model (HMM), these methods are designed to only estimate the parameters, rather than the joint distribution of the parameters and the… ▽ More Likelihood-free inference methods based on neural conditional density estimation were shown to drastically reduce the simulation burden in comparison to classical methods such as ABC. When applied in the context of any latent variable model, such as a Hidden Markov model (HMM), these methods are designed to only estimate the parameters, rather than the joint distribution of the parameters and the hidden states. Naive application of these methods to a HMM, ignoring the inference of this joint posterior distribution, will thus produce an inaccurate estimate of the posterior predictive distribution, in turn hampering the assessment of goodness-of-fit. To rectify this problem, we propose a novel, sample-efficient likelihood-free method for estimating the high-dimensional hidden states of an implicit HMM. Our approach relies on learning directly the intractable posterior distribution of the hidden states, using an autoregressive-flow, by exploiting the Markov property. Upon evaluating our approach on some implicit HMMs, we found that the quality of the estimates retrieved using our method is comparable to what can be achieved using a much more computationally expensive SMC algorithm. △ Less

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.18935 [pdf, other]

doi 10.1109/WACV57701.2024.00679

What's in the Flow? Exploiting Temporal Motion Cues for Unsupervised Generic Event Boundary Detection

Authors: Sourabh Vasant Gothe, Vibhav Agarwal, Sourav Ghosh, Jayesh Rajkumar Vachhani, Pranay Kashyap, Barath Raj Kandur Raja

Abstract: Generic Event Boundary Detection (GEBD) task aims to recognize generic, taxonomy-free boundaries that segment a video into meaningful events. Current methods typically involve a neural model trained on a large volume of data, demanding substantial computational power and storage space. We explore two pivotal questions pertaining to GEBD: Can non-parametric algorithms outperform unsupervised neural… ▽ More Generic Event Boundary Detection (GEBD) task aims to recognize generic, taxonomy-free boundaries that segment a video into meaningful events. Current methods typically involve a neural model trained on a large volume of data, demanding substantial computational power and storage space. We explore two pivotal questions pertaining to GEBD: Can non-parametric algorithms outperform unsupervised neural methods? Does motion information alone suffice for high performance? This inquiry drives us to algorithmically harness motion cues for identifying generic event boundaries in videos. In this work, we propose FlowGEBD, a non-parametric, unsupervised technique for GEBD. Our approach entails two algorithms utilizing optical flow: (i) Pixel Tracking and (ii) Flow Normalization. By conducting thorough experimentation on the challenging Kinetics-GEBD and TAPOS datasets, our results establish FlowGEBD as the new state-of-the-art (SOTA) among unsupervised methods. FlowGEBD exceeds the neural models on the Kinetics-GEBD dataset by obtaining an F1@0.05 score of 0.713 with an absolute gain of 31.7% compared to the unsupervised baseline and achieves an average F1 score of 0.623 on the TAPOS validation dataset. △ Less

Submitted 15 February, 2024; originally announced April 2024.

Comments: Accepted in WACV-2024. Supplementary at https://openaccess.thecvf.com/content/WACV2024/supplemental/Gothe_Whats_in_the_WACV_2024_supplemental.pdf

Journal ref: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2024, pp. 6926-6935

arXiv:2404.16156 [pdf, other]

Guardians of the Quantum GAN

Authors: Archisman Ghosh, Debarshi Kundu, Avimita Chatterjee, Swaroop Ghosh

Abstract: Quantum Generative Adversarial Networks (qGANs) are at the forefront of image-generating quantum machine learning models. To accommodate the growing demand for Noisy Intermediate-Scale Quantum (NISQ) devices to train and infer quantum machine learning models, the number of third-party vendors offering quantum hardware as a service is expected to rise. This expansion introduces the risk of untruste… ▽ More Quantum Generative Adversarial Networks (qGANs) are at the forefront of image-generating quantum machine learning models. To accommodate the growing demand for Noisy Intermediate-Scale Quantum (NISQ) devices to train and infer quantum machine learning models, the number of third-party vendors offering quantum hardware as a service is expected to rise. This expansion introduces the risk of untrusted vendors potentially stealing proprietary information from the quantum machine learning models. To address this concern we propose a novel watermarking technique that exploits the noise signature embedded during the training phase of qGANs as a non-invasive watermark. The watermark is identifiable in the images generated by the qGAN allowing us to trace the specific quantum hardware used during training hence providing strong proof of ownership. To further enhance the security robustness, we propose the training of qGANs on a sequence of multiple quantum hardware, embedding a complex watermark comprising the noise signatures of all the training hardware that is difficult for adversaries to replicate. We also develop a machine learning classifier to extract this watermark robustly, thereby identifying the training hardware (or the suite of hardware) from the images generated by the qGAN validating the authenticity of the model. We note that the watermark signature is robust against inferencing on hardware different than the hardware that was used for training. We obtain watermark extraction accuracy of 100% and ~90% for training the qGAN on individual and multiple quantum hardware setups (and inferencing on different hardware), respectively. Since parameter evolution during training is strongly modulated by quantum noise, the proposed watermark can be extended to other quantum machine learning models as well. △ Less

Submitted 15 May, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: 11 pages, 10 figures

arXiv:2404.15610 [pdf, other]

Revealing Aspects of Hawai'i Tourism Using Situated Augmented Reality

Authors: Karen Abe, Jules Park, Samir Ghosh

Abstract: In this position paper, we present a process artifact that aims to bring awareness to historical context, contemporary issues, and identity harm inflicted by tourism in Hawaii. First, we introduce the historical background and how the work is informed by the positionality of the authors. We discuss how related augmented reality work can inform strategy for building augmented reality experiences th… ▽ More In this position paper, we present a process artifact that aims to bring awareness to historical context, contemporary issues, and identity harm inflicted by tourism in Hawaii. First, we introduce the historical background and how the work is informed by the positionality of the authors. We discuss how related augmented reality work can inform strategy for building augmented reality experiences that address cultural issues. Then, we present a mockup of the artifact, aimed to bring awareness to 20th century colonialism, recent Kanaka Maoli art exclusion, and cultural prostitution. We describe how we will share the app at the workshop and list topics for discussion. △ Less

Submitted 23 April, 2024; originally announced April 2024.

Comments: Presented at CHI 2024 (arXiv:2404.05889)

Report number: ARSJ/2024/08

arXiv:2404.10290 [pdf, other]

NeuroMorphix: A Novel Brain MRI Asymmetry-specific Feature Construction Approach For Seizure Recurrence Prediction

Authors: Soumen Ghosh, Viktor Vegh, Shahrzad Moinian, Hamed Moradi, Alice-Ann Sullivan, John Phamnguyen, David Reutens

Abstract: Seizure recurrence is an important concern after an initial unprovoked seizure; without drug treatment, it occurs within 2 years in 40-50% of cases. The decision to treat currently relies on predictors of seizure recurrence risk that are inaccurate, resulting in unnecessary, possibly harmful, treatment in some patients and potentially preventable seizures in others. Because of the link between bra… ▽ More Seizure recurrence is an important concern after an initial unprovoked seizure; without drug treatment, it occurs within 2 years in 40-50% of cases. The decision to treat currently relies on predictors of seizure recurrence risk that are inaccurate, resulting in unnecessary, possibly harmful, treatment in some patients and potentially preventable seizures in others. Because of the link between brain lesions and seizure recurrence, we developed a recurrence prediction tool using machine learning and clinical 3T brain MRI. We developed NeuroMorphix, a feature construction approach based on MRI brain anatomy. Each of seven NeuroMorphix features measures the absolute or relative difference between corresponding regions in each cerebral hemisphere. FreeSurfer was used to segment brain regions and to generate values for morphometric parameters (8 for each cortical region and 5 for each subcortical region). The parameters were then mapped to whole brain NeuroMorphix features, yielding a total of 91 features per subject. Features were generated for a first seizure patient cohort (n = 169) categorised into seizure recurrence and non-recurrence subgroups. State-of-the-art classification algorithms were trained and tested using NeuroMorphix features to predict seizure recurrence. Classification models using the top 5 features, ranked by sequential forward selection, demonstrated excellent performance in predicting seizure recurrence, with area under the ROC curve of 88-93%, accuracy of 83-89%, and F1 score of 83-90%. Highly ranked features aligned with structural alterations known to be associated with epilepsy. This study highlights the potential for targeted, data-driven approaches to aid clinical decision-making in brain disorders. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: This work has been submitted to the IEEE TMI for possible publication

arXiv:2404.05993 [pdf, other]

AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts

Authors: Shaona Ghosh, Prasoon Varshney, Erick Galinkin, Christopher Parisien

Abstract: As Large Language Models (LLMs) and generative AI become more widespread, the content safety risks associated with their use also increase. We find a notable deficiency in high-quality content safety datasets and benchmarks that comprehensively cover a wide range of critical safety areas. To address this, we define a broad content safety risk taxonomy, comprising 13 critical risk and 9 sparse risk… ▽ More As Large Language Models (LLMs) and generative AI become more widespread, the content safety risks associated with their use also increase. We find a notable deficiency in high-quality content safety datasets and benchmarks that comprehensively cover a wide range of critical safety areas. To address this, we define a broad content safety risk taxonomy, comprising 13 critical risk and 9 sparse risk categories. Additionally, we curate AEGISSAFETYDATASET, a new dataset of approximately 26, 000 human-LLM interaction instances, complete with human annotations adhering to the taxonomy. We plan to release this dataset to the community to further research and to help benchmark LLM models for safety. To demonstrate the effectiveness of the dataset, we instruction-tune multiple LLM-based safety models. We show that our models (named AEGISSAFETYEXPERTS), not only surpass or perform competitively with the state-of-the-art LLM-based safety models and general purpose LLMs, but also exhibit robustness across multiple jail-break attack categories. We also show how using AEGISSAFETYDATASET during the LLM alignment phase does not negatively impact the performance of the aligned models on MT Bench scores. Furthermore, we propose AEGIS, a novel application of a no-regret online adaptation framework with strong theoretical guarantees, to perform content moderation with an ensemble of LLM content safety experts in deployment △ Less

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.03820 [pdf, other]

CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues

Authors: Makesh Narsimhan Sreedhar, Traian Rebedea, Shaona Ghosh, Jiaqi Zeng, Christopher Parisien

Abstract: Recent advancements in instruction-tuning datasets have predominantly focused on specific tasks like mathematical or logical reasoning. There has been a notable gap in data designed for aligning language models to maintain topic relevance in conversations - a critical aspect for deploying chatbots to production. We introduce the CantTalkAboutThis dataset to help language models remain focused on t… ▽ More Recent advancements in instruction-tuning datasets have predominantly focused on specific tasks like mathematical or logical reasoning. There has been a notable gap in data designed for aligning language models to maintain topic relevance in conversations - a critical aspect for deploying chatbots to production. We introduce the CantTalkAboutThis dataset to help language models remain focused on the subject at hand during task-oriented interactions. It consists of synthetic dialogues on a wide range of conversation topics from different domains. These dialogues are interspersed with distractor turns that intentionally divert the chatbot from the predefined topic. Fine-tuning language models on this dataset helps make them resilient to deviating from the role assigned and improves their ability to maintain topical coherence compared to general-purpose instruction-tuned LLMs like GPT-4-turbo and Mixtral-Instruct. Additionally, preliminary observations suggest that training models on this dataset also enhance their performance on fine-grained instruction following tasks, including safety alignment. △ Less

Submitted 21 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.03662 [pdf, other]

X-lifecycle Learning for Cloud Incident Management using LLMs

Authors: Drishti Goel, Fiza Husain, Aditya Singh, Supriyo Ghosh, Anjaly Parayil, Chetan Bansal, Xuchao Zhang, Saravan Rajmohan

Abstract: Incident management for large cloud services is a complex and tedious process and requires significant amount of manual efforts from on-call engineers (OCEs). OCEs typically leverage data from different stages of the software development lifecycle [SDLC] (e.g., codes, configuration, monitor data, service properties, service dependencies, trouble-shooting documents, etc.) to generate insights for d… ▽ More Incident management for large cloud services is a complex and tedious process and requires significant amount of manual efforts from on-call engineers (OCEs). OCEs typically leverage data from different stages of the software development lifecycle [SDLC] (e.g., codes, configuration, monitor data, service properties, service dependencies, trouble-shooting documents, etc.) to generate insights for detection, root causing and mitigating of incidents. Recent advancements in large language models [LLMs] (e.g., ChatGPT, GPT-4, Gemini) created opportunities to automatically generate contextual recommendations to the OCEs assisting them to quickly identify and mitigate critical issues. However, existing research typically takes a silo-ed view for solving a certain task in incident management by leveraging data from a single stage of SDLC. In this paper, we demonstrate that augmenting additional contextual data from different stages of SDLC improves the performance of two critically important and practically challenging tasks: (1) automatically generating root cause recommendations for dependency failure related incidents, and (2) identifying ontology of service monitors used for automatically detecting incidents. By leveraging 353 incident and 260 monitor dataset from Microsoft, we demonstrate that augmenting contextual information from different stages of the SDLC improves the performance over State-of-The-Art methods. △ Less

Submitted 15 February, 2024; originally announced April 2024.

arXiv:2404.01669 [pdf, other]

How COVID-19 has Impacted the Anti-Vaccine Discourse: A Large-Scale Twitter Study Spanning Pre-COVID and Post-COVID Era

Authors: Soham Poddar, Rajdeep Mukherjee, Subhendu Khatuya, Niloy Ganguly, Saptarshi Ghosh

Abstract: The debate around vaccines has been going on for decades, but the COVID-19 pandemic showed how crucial it is to understand and mitigate anti-vaccine sentiments. While the pandemic may be over, it is still important to understand how the pandemic affected the anti-vaccine discourse, and whether the arguments against non-COVID vaccines (e.g., Flu, MMR, IPV, HPV vaccines) have also changed due to the… ▽ More The debate around vaccines has been going on for decades, but the COVID-19 pandemic showed how crucial it is to understand and mitigate anti-vaccine sentiments. While the pandemic may be over, it is still important to understand how the pandemic affected the anti-vaccine discourse, and whether the arguments against non-COVID vaccines (e.g., Flu, MMR, IPV, HPV vaccines) have also changed due to the pandemic. This study attempts to answer these questions through a large-scale study of anti-vaccine posts on Twitter. Almost all prior works that utilized social media to understand anti-vaccine opinions considered only the three broad stances of Anti-Vax, Pro-Vax, and Neutral. There has not been any effort to identify the specific reasons/concerns behind the anti-vax sentiments (e.g., side-effects, conspiracy theories, political reasons) on social media at scale. In this work, we propose two novel methods for classifying tweets into 11 different anti-vax concerns -- a discriminative approach (entailment-based) and a generative approach (based on instruction tuning of LLMs) -- which outperform several strong baselines. We then apply this classifier on anti-vaccine tweets posted over a 5-year period (Jan 2018 - Jan 2023) to understand how the COVID-19 pandemic has impacted the anti-vaccine concerns among the masses. We find that the pandemic has made the anti-vaccine discourse far more complex than in the pre-COVID times, and increased the variety of concerns being voiced. Alarmingly, we find that concerns about COVID vaccines are now being projected onto the non-COVID vaccines, thus making more people hesitant in taking vaccines in the post-COVID era. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Comments: This work has been accepted to appear at the 18th International AAAI Conference on Web and Social Media (ICWSM), 2024

arXiv:2404.00419 [pdf, other]

Do Vision-Language Models Understand Compound Nouns?

Authors: Sonal Kumar, Sreyan Ghosh, S Sakshi, Utkarsh Tyagi, Dinesh Manocha

Abstract: Open-vocabulary vision-language models (VLMs) like CLIP, trained using contrastive loss, have emerged as a promising new paradigm for text-to-image retrieval. However, do VLMs understand compound nouns (CNs) (e.g., lab coat) as well as they understand nouns (e.g., lab)? We curate Compun, a novel benchmark with 400 unique and commonly used CNs, to evaluate the effectiveness of VLMs in interpreting… ▽ More Open-vocabulary vision-language models (VLMs) like CLIP, trained using contrastive loss, have emerged as a promising new paradigm for text-to-image retrieval. However, do VLMs understand compound nouns (CNs) (e.g., lab coat) as well as they understand nouns (e.g., lab)? We curate Compun, a novel benchmark with 400 unique and commonly used CNs, to evaluate the effectiveness of VLMs in interpreting CNs. The Compun benchmark challenges a VLM for text-to-image retrieval where, given a text prompt with a CN, the task is to select the correct image that shows the CN among a pair of distractor images that show the constituent nouns that make up the CN. Next, we perform an in-depth analysis to highlight CLIPs' limited understanding of certain types of CNs. Finally, we present an alternative framework that moves beyond hand-written templates for text prompts widely used by CLIP-like models. We employ a Large Language Model to generate multiple diverse captions that include the CN as an object in the scene described by the caption. Our proposed method improves CN understanding of CLIP by 8.25% on Compun. Code and benchmark are available at: https://github.com/sonalkum/Compun △ Less

Submitted 30 March, 2024; originally announced April 2024.

Comments: Accepted to NAACL 2024 Main Conference

arXiv:2404.00415 [pdf, other]

CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP

Authors: Chandra Kiran Reddy Evuru, Sreyan Ghosh, Sonal Kumar, Ramaneswaran S, Utkarsh Tyagi, Dinesh Manocha

Abstract: We present CoDa (Constrained Generation based Data Augmentation), a controllable, effective, and training-free data augmentation technique for low-resource (data-scarce) NLP. Our approach is based on prompting off-the-shelf instruction-following Large Language Models (LLMs) for generating text that satisfies a set of constraints. Precisely, we extract a set of simple constraints from every instanc… ▽ More We present CoDa (Constrained Generation based Data Augmentation), a controllable, effective, and training-free data augmentation technique for low-resource (data-scarce) NLP. Our approach is based on prompting off-the-shelf instruction-following Large Language Models (LLMs) for generating text that satisfies a set of constraints. Precisely, we extract a set of simple constraints from every instance in the low-resource dataset and verbalize them to prompt an LLM to generate novel and diverse training instances. Our findings reveal that synthetic data that follows simple constraints in the downstream dataset act as highly effective augmentations, and CoDa can achieve this without intricate decoding-time constrained generation techniques or fine-tuning with complex algorithms that eventually make the model biased toward the small number of training instances. Additionally, CoDa is the first framework that provides users explicit control over the augmentation generation process, thereby also allowing easy adaptation to several domains. We demonstrate the effectiveness of CoDa across 11 datasets spanning 3 tasks and 3 low-resource settings. CoDa outperforms all our baselines, qualitatively and quantitatively, with improvements of 0.12%-7.19%. Code is available here: https://github.com/Sreyan88/CoDa △ Less

Submitted 30 March, 2024; originally announced April 2024.

Comments: Accepted to NAACL 2024 Findings

arXiv:2404.00122 [pdf, other]

AgileFormer: Spatially Agile Transformer UNet for Medical Image Segmentation

Authors: Peijie Qiu, Jin Yang, Sayantan Kumar, Soumyendu Sekhar Ghosh, Aristeidis Sotiras

Abstract: In the past decades, deep neural networks, particularly convolutional neural networks, have achieved state-of-the-art performance in a variety of medical image segmentation tasks. Recently, the introduction of the vision transformer (ViT) has significantly altered the landscape of deep segmentation models. There has been a growing focus on ViTs, driven by their excellent performance and scalabilit… ▽ More In the past decades, deep neural networks, particularly convolutional neural networks, have achieved state-of-the-art performance in a variety of medical image segmentation tasks. Recently, the introduction of the vision transformer (ViT) has significantly altered the landscape of deep segmentation models. There has been a growing focus on ViTs, driven by their excellent performance and scalability. However, we argue that the current design of the vision transformer-based UNet (ViT-UNet) segmentation models may not effectively handle the heterogeneous appearance (e.g., varying shapes and sizes) of objects of interest in medical image segmentation tasks. To tackle this challenge, we present a structured approach to introduce spatially dynamic components to the ViT-UNet. This adaptation enables the model to effectively capture features of target objects with diverse appearances. This is achieved by three main components: \textbf{(i)} deformable patch embedding; \textbf{(ii)} spatially dynamic multi-head attention; \textbf{(iii)} deformable positional encoding. These components were integrated into a novel architecture, termed AgileFormer. AgileFormer is a spatially agile ViT-UNet designed for medical image segmentation. Experiments in three segmentation tasks using publicly available datasets demonstrated the effectiveness of the proposed method. The code is available at \href{https://github.com/sotiraslab/AgileFormer}{https://github.com/sotiraslab/AgileFormer}. △ Less

Submitted 29 March, 2024; originally announced April 2024.

arXiv:2403.20317 [pdf, other]

Convolutional Prompting meets Language Models for Continual Learning

Authors: Anurag Roy, Riddhiman Moulick, Vinay K. Verma, Saptarshi Ghosh, Abir Das

Abstract: Continual Learning (CL) enables machine learning models to learn from continuously shifting new training data in absence of data from old tasks. Recently, pretrained vision transformers combined with prompt tuning have shown promise for overcoming catastrophic forgetting in CL. These approaches rely on a pool of learnable prompts which can be inefficient in sharing knowledge across tasks leading t… ▽ More Continual Learning (CL) enables machine learning models to learn from continuously shifting new training data in absence of data from old tasks. Recently, pretrained vision transformers combined with prompt tuning have shown promise for overcoming catastrophic forgetting in CL. These approaches rely on a pool of learnable prompts which can be inefficient in sharing knowledge across tasks leading to inferior performance. In addition, the lack of fine-grained layer specific prompts does not allow these to fully express the strength of the prompts for CL. We address these limitations by proposing ConvPrompt, a novel convolutional prompt creation mechanism that maintains layer-wise shared embeddings, enabling both layer-specific learning and better concept transfer across tasks. The intelligent use of convolution enables us to maintain a low parameter overhead without compromising performance. We further leverage Large Language Models to generate fine-grained text descriptions of each category which are used to get task similarity and dynamically decide the number of prompts to be learned. Extensive experiments demonstrate the superiority of ConvPrompt and improves SOTA by ~3% with significantly less parameter overhead. We also perform strong ablation over various modules to disentangle the importance of different components. △ Less

Submitted 29 March, 2024; originally announced March 2024.

Comments: CVPR 2024 Camera Ready

Showing 1–50 of 704 results for author: Ghosh, S