subscribe to arXiv mailings

A new approach to delegate signing rights to proxy signers using isogeny-based cryptography

Authors: Kunal Dey, Somnath Kumar, Vikas Srivastava, Sumit Kumar Debnath

Abstract: E-governance is a two-way protocol through which one can use government services, share data and request information. It refers to the use of communication and information technologies to provide government services to public in an efficient and fast manner. In addition, any document submitted to the e-Government system must be authenticated by a government officer using a digital signature scheme… ▽ More E-governance is a two-way protocol through which one can use government services, share data and request information. It refers to the use of communication and information technologies to provide government services to public in an efficient and fast manner. In addition, any document submitted to the e-Government system must be authenticated by a government officer using a digital signature scheme. In the context of digital signatures, the proxy signature is an important cryptographic primitive that allows the original signer to delegate signing authority to another signer (proxy signer). The proxy signature has a number of important applications in the e-government system. There are now a large amount of proxy signature schemes. The security of most of them relies on the following hard problems: the discrete logarithm problem and the factorization of integers problem. However, a large-scale quantum computer can solve them in polynomial time due to Shor's algorithm. As a consequence, there is a need for a quantum computer-resistant proxy signature to secure e-governance system from quantum adversaries. In this work, we propose the first post-quantum isogeny based proxy signature scheme CSI-PS (commutative supersingular isogeny proxy signature). Our construction is proven to be uf-cma secure under the hardness of the group action inverse problem (GAIP) based on isogeny. △ Less

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.03486 [pdf, other]

Celeb-FBI: A Benchmark Dataset on Human Full Body Images and Age, Gender, Height and Weight Estimation using Deep Learning Approach

Authors: Pronay Debnath, Usafa Akther Rifa, Busra Kamal Rafa, Ali Haider Talukder Akib, Md. Aminur Rahman

Abstract: The scarcity of comprehensive datasets in surveillance, identification, image retrieval systems, and healthcare poses a significant challenge for researchers in exploring new methodologies and advancing knowledge in these respective fields. Furthermore, the need for full-body image datasets with detailed attributes like height, weight, age, and gender is particularly significant in areas such as f… ▽ More The scarcity of comprehensive datasets in surveillance, identification, image retrieval systems, and healthcare poses a significant challenge for researchers in exploring new methodologies and advancing knowledge in these respective fields. Furthermore, the need for full-body image datasets with detailed attributes like height, weight, age, and gender is particularly significant in areas such as fashion industry analytics, ergonomic design assessment, virtual reality avatar creation, and sports performance analysis. To address this gap, we have created the 'Celeb-FBI' dataset which contains 7,211 full-body images of individuals accompanied by detailed information on their height, age, weight, and gender. Following the dataset creation, we proceed with the preprocessing stages, including image cleaning, scaling, and the application of Synthetic Minority Oversampling Technique (SMOTE). Subsequently, utilizing this prepared dataset, we employed three deep learning approaches: Convolutional Neural Network (CNN), 50-layer ResNet, and 16-layer VGG, which are used for estimating height, weight, age, and gender from human full-body images. From the results obtained, ResNet-50 performed best for the system with an accuracy rate of 79.18% for age, 95.43% for gender, 85.60% for height and 81.91% for weight. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted for publication in 3rd International Conference on Advanced Communication and Intelligent Systems

arXiv:2406.15335 [pdf, other]

Keystroke Dynamics Against Academic Dishonesty in the Age of LLMs

Authors: Debnath Kundu, Atharva Mehta, Rajesh Kumar, Naman Lal, Avinash Anand, Apoorv Singh, Rajiv Ratn Shah

Abstract: The transition to online examinations and assignments raises significant concerns about academic integrity. Traditional plagiarism detection systems often struggle to identify instances of intelligent cheating, particularly when students utilize advanced generative AI tools to craft their responses. This study proposes a keystroke dynamics-based method to differentiate between bona fide and assist… ▽ More The transition to online examinations and assignments raises significant concerns about academic integrity. Traditional plagiarism detection systems often struggle to identify instances of intelligent cheating, particularly when students utilize advanced generative AI tools to craft their responses. This study proposes a keystroke dynamics-based method to differentiate between bona fide and assisted writing within academic contexts. To facilitate this, a dataset was developed to capture the keystroke patterns of individuals engaged in writing tasks, both with and without the assistance of generative AI. The detector, trained using a modified TypeNet architecture, achieved accuracies ranging from 74.98% to 85.72% in condition-specific scenarios and from 52.24% to 80.54% in condition-agnostic scenarios. The findings highlight significant differences in keystroke dynamics between genuine and assisted writing. The outcomes of this study enhance our understanding of how users interact with generative AI and have implications for improving the reliability of digital educational platforms. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: Accepted for publication at The IEEE International Joint Conference on Biometrics (IJCB2024), contains 9 pages, 3 figures, 3 tables

ACM Class: I.5.4

arXiv:2405.19948 [pdf, other]

Scalable Test Generation to Trigger Rare Targets in High-Level Synthesizable IPs for Cloud FPGAs

Authors: Mukta Debnath, Animesh Basak Chowdhury, Debasri Saha, Susmita Sur-Kolay

Abstract: High-Level Synthesis (HLS) has transformed the development of complex Hardware IPs (HWIP) by offering abstraction and configurability through languages like SystemC/C++, particularly for Field Programmable Gate Array (FPGA) accelerators in high-performance and cloud computing contexts. These IPs can be synthesized for different FPGA boards in cloud, offering compact area requirements and enhanced… ▽ More High-Level Synthesis (HLS) has transformed the development of complex Hardware IPs (HWIP) by offering abstraction and configurability through languages like SystemC/C++, particularly for Field Programmable Gate Array (FPGA) accelerators in high-performance and cloud computing contexts. These IPs can be synthesized for different FPGA boards in cloud, offering compact area requirements and enhanced flexibility. HLS enables designs to execute directly on ARM processors within modern FPGAs without the need for Register Transfer Level (RTL) synthesis, thereby conserving FPGA resources. While HLS offers flexibility and efficiency, it also introduces potential vulnerabilities such as the presence of hidden circuitry, including the possibility of hosting hardware trojans within designs. In cloud environments, these vulnerabilities pose significant security concerns such as leakage of sensitive data, IP functionality disruption and hardware damage, necessitating the development of robust testing frameworks. This research presents an advanced testing approach for HLS-developed cloud IPs, specifically targeting hidden malicious functionalities that may exist in rare conditions within the design. The proposed method leverages selective instrumentation, combining greybox fuzzing and concolic execution techniques to enhance test generation capabilities. Evaluation conducted on various HLS benchmarks, possessing characteristics of FPGA-based cloud IPs with embedded cloud related threats, demonstrates the effectiveness of our framework in detecting trojans and rare scenarios, showcasing improvements in coverage, time efficiency, memory usage, and testing costs compared to existing methods. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.12550 [pdf, other]

Blockchain-based AI Methods for Managing Industrial IoT: Recent Developments, Integration Challenges and Opportunities

Authors: Anichur Rahman, Dipanjali Kundu, Tanoy Debnath, Muaz Rahman, Airin Afroj Aishi, Jahidul Islam

Abstract: Currently, Blockchain (BC), Artificial Intelligence (AI), and smart Industrial Internet of Things (IIoT) are not only leading promising technologies in the world, but also these technologies facilitate the current society to develop the standard of living and make it easier for users. However, these technologies have been applied in various domains for different purposes. Then, these are successfu… ▽ More Currently, Blockchain (BC), Artificial Intelligence (AI), and smart Industrial Internet of Things (IIoT) are not only leading promising technologies in the world, but also these technologies facilitate the current society to develop the standard of living and make it easier for users. However, these technologies have been applied in various domains for different purposes. Then, these are successfully assisted in developing the desired system, such as-smart cities, homes, manufacturers, education, and industries. Moreover, these technologies need to consider various issues-security, privacy, confidentiality, scalability, and application challenges in diverse fields. In this context, with the increasing demand for these issues solutions, the authors present a comprehensive survey on the AI approaches with BC in the smart IIoT. Firstly, we focus on state-of-the-art overviews regarding AI, BC, and smart IoT applications. Then, we provide the benefits of integrating these technologies and discuss the established methods, tools, and strategies efficiently. Most importantly, we highlight the various issues--security, stability, scalability, and confidentiality and guide the way of addressing strategy and methods. Furthermore, the individual and collaborative benefits of applications have been discussed. Lastly, we are extensively concerned about the open research challenges and potential future guidelines based on BC-based AI approaches in the intelligent IIoT system. △ Less

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.10543 [pdf, other]

SMARD: A Cost Effective Smart Agro Development Technology for Crops Disease Classification

Authors: Tanoy Debnath, Shadman Wadith, Anichur Rahman

Abstract: Agriculture has a significant role in a country's economy. The "SMARD" project aims to strengthen the country's agricultural sector by giving farmers with the information and tools they need to solve common difficulties and increase productivity. The project provides farmers with information on crop care, seed selection, and disease management best practices, as well as access to tools for recogni… ▽ More Agriculture has a significant role in a country's economy. The "SMARD" project aims to strengthen the country's agricultural sector by giving farmers with the information and tools they need to solve common difficulties and increase productivity. The project provides farmers with information on crop care, seed selection, and disease management best practices, as well as access to tools for recognizing and treating crop diseases. Farmers can also contact the expert panel through text message, voice call, or video call to purchase fertilizer, seeds, and pesticides at low prices, as well as secure bank loans. The project's goal is to empower farmers and rural communities by providing them with the resources they need to increase crop yields. Additionally, the "SMARD" will not only help farmers and rural communities live better lives, but it will also have a good effect on the economy of the nation. Farmers are now able to recognize plant illnesses more quickly because of the application of machine learning techniques based on image processing categorization. Our experiments' results show that our system "SMARD" outperforms the cutting-edge web applications by attaining 97.3% classification accuracy and 96% F1-score in crop disease classification. Overall, our project is an important endeavor for the nation's agricultural sector because its main goal is to give farmers the information, resources, and tools they need to increase crop yields, improve economic outcomes, and improve livelihoods. △ Less

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.07338 [pdf, other]

Explainable Convolutional Neural Networks for Retinal Fundus Classification and Cutting-Edge Segmentation Models for Retinal Blood Vessels from Fundus Images

Authors: Fatema Tuj Johora Faria, Mukaffi Bin Moin, Pronay Debnath, Asif Iftekher Fahim, Faisal Muhammad Shah

Abstract: Our research focuses on the critical field of early diagnosis of disease by examining retinal blood vessels in fundus images. While automatic segmentation of retinal blood vessels holds promise for early detection, accurate analysis remains challenging due to the limitations of existing methods, which often lack discrimination power and are susceptible to influences from pathological regions. Our… ▽ More Our research focuses on the critical field of early diagnosis of disease by examining retinal blood vessels in fundus images. While automatic segmentation of retinal blood vessels holds promise for early detection, accurate analysis remains challenging due to the limitations of existing methods, which often lack discrimination power and are susceptible to influences from pathological regions. Our research in fundus image analysis advances deep learning-based classification using eight pre-trained CNN models. To enhance interpretability, we utilize Explainable AI techniques such as Grad-CAM, Grad-CAM++, Score-CAM, Faster Score-CAM, and Layer CAM. These techniques illuminate the decision-making processes of the models, fostering transparency and trust in their predictions. Expanding our exploration, we investigate ten models, including TransUNet with ResNet backbones, Attention U-Net with DenseNet and ResNet backbones, and Swin-UNET. Incorporating diverse architectures such as ResNet50V2, ResNet101V2, ResNet152V2, and DenseNet121 among others, this comprehensive study deepens our insights into attention mechanisms for enhanced fundus image analysis. Among the evaluated models for fundus image classification, ResNet101 emerged with the highest accuracy, achieving an impressive 94.17%. On the other end of the spectrum, EfficientNetB0 exhibited the lowest accuracy among the models, achieving a score of 88.33%. Furthermore, in the domain of fundus image segmentation, Swin-Unet demonstrated a Mean Pixel Accuracy of 86.19%, showcasing its effectiveness in accurately delineating regions of interest within fundus images. Conversely, Attention U-Net with DenseNet201 backbone exhibited the lowest Mean Pixel Accuracy among the evaluated models, achieving a score of 75.87%. △ Less

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2405.07010 [pdf, other]

Deciphering public attention to geoengineering and climate issues using machine learning and dynamic analysis

Authors: Ramit Debnath, Pengyu Zhang, Tianzhu Qin, R. Michael Alvarez, Shaun D. Fitzgerald

Abstract: As the conversation around using geoengineering to combat climate change intensifies, it is imperative to engage the public and deeply understand their perspectives on geoengineering research, development, and potential deployment. Through a comprehensive data-driven investigation, this paper explores the types of news that captivate public interest in geoengineering. We delved into 30,773 English… ▽ More As the conversation around using geoengineering to combat climate change intensifies, it is imperative to engage the public and deeply understand their perspectives on geoengineering research, development, and potential deployment. Through a comprehensive data-driven investigation, this paper explores the types of news that captivate public interest in geoengineering. We delved into 30,773 English-language news articles from the BBC and the New York Times, combined with Google Trends data spanning 2018 to 2022, to explore how public interest in geoengineering fluctuates in response to news coverage of broader climate issues. Using BERT-based topic modeling, sentiment analysis, and time-series regression models, we found that positive sentiment in energy-related news serves as a good predictor of heightened public interest in geoengineering, a trend that persists over time. Our findings suggest that public engagement with geoengineering and climate action is not uniform, with some topics being more potent in shaping interest over time, such as climate news related to energy, disasters, and politics. Understanding these patterns is crucial for scientists, policymakers, and educators aiming to craft effective strategies for engaging with the public and fostering dialogue around emerging climate technologies. △ Less

Submitted 11 May, 2024; originally announced May 2024.

Comments: 46 page, 6 main figures and SI

ACM Class: J.4; K.4

arXiv:2405.04716 [pdf, other]

Physics-based deep learning reveals rising heating demand heightens air pollution in Norwegian cities

Authors: Cong Cao, Ramit Debnath, R. Michael Alvarez

Abstract: Policymakers frequently analyze air quality and climate change in isolation, disregarding their interactions. This study explores the influence of specific climate factors on air quality by contrasting a regression model with K-Means Clustering, Hierarchical Clustering, and Random Forest techniques. We employ Physics-based Deep Learning (PBDL) and Long Short-Term Memory (LSTM) to examine the air p… ▽ More Policymakers frequently analyze air quality and climate change in isolation, disregarding their interactions. This study explores the influence of specific climate factors on air quality by contrasting a regression model with K-Means Clustering, Hierarchical Clustering, and Random Forest techniques. We employ Physics-based Deep Learning (PBDL) and Long Short-Term Memory (LSTM) to examine the air pollution predictions. Our analysis utilizes ten years (2009-2018) of daily traffic, weather, and air pollution data from three major cities in Norway. Findings from feature selection reveal a correlation between rising heating degree days and heightened air pollution levels, suggesting increased heating activities in Norway are a contributing factor to worsening air quality. PBDL demonstrates superior accuracy in air pollution predictions compared to LSTM. This paper contributes to the growing literature on PBDL methods for more accurate air pollution predictions using environmental variables, aiding policymakers in formulating effective data-driven climate policies. △ Less

Submitted 7 May, 2024; originally announced May 2024.

Comments: 52 pages, 23 figures

ACM Class: K.4.1; J.2; I.2

arXiv:2405.02937 [pdf, other]

Unraveling the Dominance of Large Language Models Over Transformer Models for Bangla Natural Language Inference: A Comprehensive Study

Authors: Fatema Tuj Johora Faria, Mukaffi Bin Moin, Asif Iftekher Fahim, Pronay Debnath, Faisal Muhammad Shah

Abstract: Natural Language Inference (NLI) is a cornerstone of Natural Language Processing (NLP), providing insights into the entailment relationships between text pairings. It is a critical component of Natural Language Understanding (NLU), demonstrating the ability to extract information from spoken or written interactions. NLI is mainly concerned with determining the entailment relationship between two s… ▽ More Natural Language Inference (NLI) is a cornerstone of Natural Language Processing (NLP), providing insights into the entailment relationships between text pairings. It is a critical component of Natural Language Understanding (NLU), demonstrating the ability to extract information from spoken or written interactions. NLI is mainly concerned with determining the entailment relationship between two statements, known as the premise and hypothesis. When the premise logically implies the hypothesis, the pair is labeled "entailment". If the hypothesis contradicts the premise, the pair receives the "contradiction" label. When there is insufficient evidence to establish a connection, the pair is described as "neutral". Despite the success of Large Language Models (LLMs) in various tasks, their effectiveness in NLI remains constrained by issues like low-resource domain accuracy, model overconfidence, and difficulty in capturing human judgment disagreements. This study addresses the underexplored area of evaluating LLMs in low-resourced languages such as Bengali. Through a comprehensive evaluation, we assess the performance of prominent LLMs and state-of-the-art (SOTA) models in Bengali NLP tasks, focusing on natural language inference. Utilizing the XNLI dataset, we conduct zero-shot and few-shot evaluations, comparing LLMs like GPT-3.5 Turbo and Gemini 1.5 Pro with models such as BanglaBERT, Bangla BERT Base, DistilBERT, mBERT, and sahajBERT. Our findings reveal that while LLMs can achieve comparable or superior performance to fine-tuned SOTA models in few-shot scenarios, further research is necessary to enhance our understanding of LLMs in languages with modest resources like Bengali. This study underscores the importance of continued efforts in exploring LLM capabilities across diverse linguistic contexts. △ Less

Submitted 7 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: Accepted in 4th International Conference on Computing and Communication Networks (ICCCNet-2024)

arXiv:2404.12330 [pdf, other]

A Perspective on Deep Vision Performance with Standard Image and Video Codecs

Authors: Christoph Reich, Oliver Hahn, Daniel Cremers, Stefan Roth, Biplob Debnath

Abstract: Resource-constrained hardware, such as edge devices or cell phones, often rely on cloud servers to provide the required computational resources for inference in deep vision models. However, transferring image and video data from an edge or mobile device to a cloud server requires coding to deal with network constraints. The use of standardized codecs, such as JPEG or H.264, is prevalent and requir… ▽ More Resource-constrained hardware, such as edge devices or cell phones, often rely on cloud servers to provide the required computational resources for inference in deep vision models. However, transferring image and video data from an edge or mobile device to a cloud server requires coding to deal with network constraints. The use of standardized codecs, such as JPEG or H.264, is prevalent and required to ensure interoperability. This paper aims to examine the implications of employing standardized codecs within deep vision pipelines. We find that using JPEG and H.264 coding significantly deteriorates the accuracy across a broad range of vision tasks and models. For instance, strong compression rates reduce semantic segmentation accuracy by more than 80% in mIoU. In contrast to previous findings, our analysis extends beyond image and action classification to localization and dense prediction tasks, thus providing a more comprehensive perspective. △ Less

Submitted 18 April, 2024; originally announced April 2024.

Comments: Accepted at CVPR 2024 Workshop on AI for Streaming (AIS)

arXiv:2404.12309 [pdf, other]

iRAG: An Incremental Retrieval Augmented Generation System for Videos

Authors: Md Adnan Arefeen, Biplob Debnath, Md Yusuf Sarwar Uddin, Srimat Chakradhar

Abstract: Retrieval augmented generation (RAG) systems combine the strengths of language generation and information retrieval to power many real-world applications like chatbots. Use of RAG for combined understanding of multimodal data such as text, images and videos is appealing but two critical limitations exist: one-time, upfront capture of all content in large multimodal data as text descriptions entail… ▽ More Retrieval augmented generation (RAG) systems combine the strengths of language generation and information retrieval to power many real-world applications like chatbots. Use of RAG for combined understanding of multimodal data such as text, images and videos is appealing but two critical limitations exist: one-time, upfront capture of all content in large multimodal data as text descriptions entails high processing times, and not all information in the rich multimodal data is typically in the text descriptions. Since the user queries are not known apriori, developing a system for multimodal to text conversion and interactive querying of multimodal data is challenging. To address these limitations, we propose iRAG, which augments RAG with a novel incremental workflow to enable interactive querying of large corpus of multimodal data. Unlike traditional RAG, iRAG quickly indexes large repositories of multimodal data, and in the incremental workflow, it uses the index to opportunistically extract more details from select portions of the multimodal data to retrieve context relevant to an interactive user query. Such an incremental workflow avoids long multimodal to text conversion times, overcomes information loss issues by doing on-demand query-specific extraction of details in multimodal data, and ensures high quality of responses to interactive user queries that are often not known apriori. To the best of our knowledge, iRAG is the first system to augment RAG with an incremental workflow to support efficient interactive querying of large, real-world multimodal data. Experimental results on real-world long videos demonstrate 23x to 25x faster video to text ingestion, while ensuring that quality of responses to interactive user queries is comparable to responses from a traditional RAG where all video data is converted to text upfront before any querying. △ Less

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.05159 [pdf]

Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods

Authors: Roopkatha Dey, Aivy Debnath, Sayak Kumar Dutta, Kaustav Ghosh, Arijit Mitra, Arghya Roy Chowdhury, Jaydip Sen

Abstract: In various real-world applications such as machine translation, sentiment analysis, and question answering, a pivotal role is played by NLP models, facilitating efficient communication and decision-making processes in domains ranging from healthcare to finance. However, a significant challenge is posed to the robustness of these natural language processing models by text adversarial attacks. These… ▽ More In various real-world applications such as machine translation, sentiment analysis, and question answering, a pivotal role is played by NLP models, facilitating efficient communication and decision-making processes in domains ranging from healthcare to finance. However, a significant challenge is posed to the robustness of these natural language processing models by text adversarial attacks. These attacks involve the deliberate manipulation of input text to mislead the predictions of the model while maintaining human interpretability. Despite the remarkable performance achieved by state-of-the-art models like BERT in various natural language processing tasks, they are found to remain vulnerable to adversarial perturbations in the input text. In addressing the vulnerability of text classifiers to adversarial attacks, three distinct attack mechanisms are explored in this paper using the victim model BERT: BERT-on-BERT attack, PWWS attack, and Fraud Bargain's Attack (FBA). Leveraging the IMDB, AG News, and SST2 datasets, a thorough comparative analysis is conducted to assess the effectiveness of these attacks on the BERT classifier model. It is revealed by the analysis that PWWS emerges as the most potent adversary, consistently outperforming other methods across multiple evaluation scenarios, thereby emphasizing its efficacy in generating adversarial examples for text classification. Through comprehensive experimentation, the performance of these attacks is assessed and the findings indicate that the PWWS attack outperforms others, demonstrating lower runtime, higher accuracy, and favorable semantic similarity scores. The key insight of this paper lies in the assessment of the relative performances of three prevalent state-of-the-art attack mechanisms. △ Less

Submitted 7 April, 2024; originally announced April 2024.

Comments: This report pertains to the Capstone Project done by Group 2 of the Fall batch of 2023 students at Praxis Tech School, Kolkata, India. The reports consists of 28 pages and it includes 10 tables. This is the preprint which will be submitted to IEEE CONIT 2024 for review

arXiv:2404.03566 [pdf, other]

PointInfinity: Resolution-Invariant Point Diffusion Models

Authors: Zixuan Huang, Justin Johnson, Shoubhik Debnath, James M. Rehg, Chao-Yuan Wu

Abstract: We present PointInfinity, an efficient family of point cloud diffusion models. Our core idea is to use a transformer-based architecture with a fixed-size, resolution-invariant latent representation. This enables efficient training with low-resolution point clouds, while allowing high-resolution point clouds to be generated during inference. More importantly, we show that scaling the test-time reso… ▽ More We present PointInfinity, an efficient family of point cloud diffusion models. Our core idea is to use a transformer-based architecture with a fixed-size, resolution-invariant latent representation. This enables efficient training with low-resolution point clouds, while allowing high-resolution point clouds to be generated during inference. More importantly, we show that scaling the test-time resolution beyond the training resolution improves the fidelity of generated point clouds and surfaces. We analyze this phenomenon and draw a link to classifier-free guidance commonly used in diffusion models, demonstrating that both allow trading off fidelity and variability during inference. Experiments on CO3D show that PointInfinity can efficiently generate high-resolution point clouds (up to 131k points, 31 times more than Point-E) with state-of-the-art quality. △ Less

Submitted 4 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024, project website at https://zixuanh.com/projects/pointinfinity

arXiv:2403.18247 [pdf, other]

An Experimentally Validated Feasible Quantum Protocol for Identity-Based Signature with Application to Secure Email Communication

Authors: Tapaswini Mohanty, Vikas Srivastava, Sumit Kumar Debnath, Debasish Roy, Kouichi Sakurai, Sourav Mukhopadhyay

Abstract: Digital signatures are one of the simplest cryptographic building blocks that provide appealing security characteristics such as authenticity, unforgeability, and undeniability. In 1984, Shamir developed the first Identity-based signature (IBS) to simplify public key infrastructure and circumvent the need for certificates. It makes the process uncomplicated by enabling users to verify digital sign… ▽ More Digital signatures are one of the simplest cryptographic building blocks that provide appealing security characteristics such as authenticity, unforgeability, and undeniability. In 1984, Shamir developed the first Identity-based signature (IBS) to simplify public key infrastructure and circumvent the need for certificates. It makes the process uncomplicated by enabling users to verify digital signatures using only the identifiers of signers, such as email, phone number, etc. Nearly all existing IBS protocols rely on several theoretical assumption-based hard problems. Unfortunately, these hard problems are unsafe and pose a hazard in the quantum realm. Thus, designing IBS algorithms that can withstand quantum attacks and ensure long-term security is an important direction for future research. Quantum cryptography (QC) is one such approach. In this paper, we propose an IBS based on QC. Our scheme's security is based on the laws of quantum mechanics. It thereby achieves long-term security and provides resistance against quantum attacks. We verify the proposed design's correctness and feasibility by simulating it in a prototype quantum device and the IBM Qiskit quantum simulator. The implementation code in qiskit with Jupyternotebook is provided in the Annexure. Moreover, we discuss the application of our design in secure email communication. △ Less

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.15336 [pdf, other]

Dialogue Understandability: Why are we streaming movies with subtitles?

Authors: Helard Becerra Martinez, Alessandro Ragano, Diptasree Debnath, Asad Ullah, Crisron Rudolf Lucas, Martin Walsh, Andrew Hines

Abstract: Watching movies and TV shows with subtitles enabled is not simply down to audibility or speech intelligibility. A variety of evolving factors related to technological advances, cinema production and social behaviour challenge our perception and understanding. This study seeks to formalise and give context to these influential factors under a wider and novel term referred to as Dialogue Understanda… ▽ More Watching movies and TV shows with subtitles enabled is not simply down to audibility or speech intelligibility. A variety of evolving factors related to technological advances, cinema production and social behaviour challenge our perception and understanding. This study seeks to formalise and give context to these influential factors under a wider and novel term referred to as Dialogue Understandability. We propose a working definition for Dialogue Understandability being a listener's capacity to follow the story without undue cognitive effort or concentration being required that impacts their Quality of Experience (QoE). The paper identifies, describes and categorises the factors that influence Dialogue Understandability mapping them over the QoE framework, a media streaming lifecycle, and the stakeholders involved. We then explore available measurement tools in the literature and link them to the factors they could potentially be used for. The maturity and suitability of these tools is evaluated over a set of pilot experiments. Finally, we reflect on the gaps that still need to be filled, what we can measure and what not, future subjective experiments, and new research trends that could help us to fully characterise Dialogue Understandability. △ Less

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.11494 [pdf, other]

CCC++: Optimized Color Classified Colorization with Segment Anything Model (SAM) Empowered Object Selective Color Harmonization

Authors: Mrityunjoy Gain, Avi Deb Raha, Rameswar Debnath

Abstract: In this paper, we formulate the colorization problem into a multinomial classification problem and then apply a weighted function to classes. We propose a set of formulas to transform color values into color classes and vice versa. To optimize the classes, we experiment with different bin sizes for color class transformation. Observing class appearance, standard deviation, and model parameters on… ▽ More In this paper, we formulate the colorization problem into a multinomial classification problem and then apply a weighted function to classes. We propose a set of formulas to transform color values into color classes and vice versa. To optimize the classes, we experiment with different bin sizes for color class transformation. Observing class appearance, standard deviation, and model parameters on various extremely large-scale real-time images in practice we propose 532 color classes for our classification task. During training, we propose a class-weighted function based on true class appearance in each batch to ensure proper saturation of individual objects. We adjust the weights of the major classes, which are more frequently observed, by lowering them, while escalating the weights of the minor classes, which are less commonly observed. In our class re-weight formula, we propose a hyper-parameter for finding the optimal trade-off between the major and minor appeared classes. As we apply regularization to enhance the stability of the minor class, occasional minor noise may appear at the object's edges. We propose a novel object-selective color harmonization method empowered by the Segment Anything Model (SAM) to refine and enhance these edges. We propose two new color image evaluation metrics, the Color Class Activation Ratio (CCAR), and the True Activation Ratio (TAR), to quantify the richness of color components. We compare our proposed model with state-of-the-art models using six different dataset: Place, ADE, Celeba, COCO, Oxford 102 Flower, and ImageNet, in qualitative and quantitative approaches. The experimental results show that our proposed model outstrips other models in visualization, CNR and in our proposed CCAR and TAR measurement criteria while maintaining satisfactory performance in regression (MSE, PSNR), similarity (SSIM, LPIPS, UIUI), and generative criteria (FID). △ Less

Submitted 24 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2403.01476

arXiv:2403.01476 [pdf, other]

CCC: Color Classified Colorization

Authors: Mrityunjoy Gain, Avi Deb Raha, Rameswar Debnath

Abstract: Automatic colorization of gray images with objects of different colors and sizes is challenging due to inter- and intra-object color variation and the small area of the main objects due to extensive backgrounds. The learning process often favors dominant features, resulting in a biased model. In this paper, we formulate the colorization problem into a multinomial classification problem and then ap… ▽ More Automatic colorization of gray images with objects of different colors and sizes is challenging due to inter- and intra-object color variation and the small area of the main objects due to extensive backgrounds. The learning process often favors dominant features, resulting in a biased model. In this paper, we formulate the colorization problem into a multinomial classification problem and then apply a weighted function to classes. We propose a set of formulas to transform color values into color classes and vice versa. Class optimization and balancing feature distribution are the keys for good performance. Observing class appearance on various extremely large-scale real-time images in practice, we propose 215 color classes for our colorization task. During training, we propose a class-weighted function based on true class appearance in each batch to ensure proper color saturation of individual objects. We establish a trade-off between major and minor classes to provide orthodox class prediction by eliminating major classes' dominance over minor classes. As we apply regularization to enhance the stability of the minor class, occasional minor noise may appear at the object's edges. We propose a novel object-selective color harmonization method empowered by the SAM to refine and enhance these edges. We propose a new color image evaluation metric, the Chromatic Number Ratio (CNR), to quantify the richness of color components. We compare our proposed model with state-of-the-art models using five different datasets: ADE, Celeba, COCO, Oxford 102 Flower, and ImageNet, in both qualitative and quantitative approaches. The experimental results show that our proposed model outstrips other models in visualization and CNR measurement criteria while maintaining satisfactory performance in regression (MSE, PSNR), similarity (SSIM, LPIPS, UIQI), and generative criteria (FID). △ Less

Submitted 3 March, 2024; originally announced March 2024.

arXiv:2401.13631 [pdf, other]

Quantifying the Impact of Frame Preemption on Combined TSN Shapers

Authors: Rubi Debnath, Philipp Hortig, Luxi Zhao, Sebastian Steinhorst

Abstract: Different scheduling mechanisms in Time Sensitive Networking (TSN) can be integrated together to design and support complex architectures with enhanced capabilities for mixed critical networks. Integrating Frame Preemption (FP) with Credit-Based Shaper (CBS) and Gate Control List (GCL) opens up different modes and configuration choices resulting in a complex evaluation of several possibilities and… ▽ More Different scheduling mechanisms in Time Sensitive Networking (TSN) can be integrated together to design and support complex architectures with enhanced capabilities for mixed critical networks. Integrating Frame Preemption (FP) with Credit-Based Shaper (CBS) and Gate Control List (GCL) opens up different modes and configuration choices resulting in a complex evaluation of several possibilities and their impact on the Quality of Service (QoS). In this paper, we implement and quantify the integration of preemptive CBS with GCL by incorporating FP into the architecture. Our experiments show that the end-to-end delay of Audio Video Bridging (AVB) flows shaped by CBS reduces significantly (up to 40\%) when AVB flows are set to preemptable class. We further show that the jitter of Time Triggered (TT) traffic remains unaffected in "with Hold/Release" mode. Furthermore, we propose to introduce Guardband (GB) in the "without Hold/Release" to reduce the jitter of the TT flow. We compare all the different integration modes, starting with CBS with GCL, extending it further to FP. We evaluate all feasible combinations in both synthetic and realistic scenarios and offer recommendations for practical configuration methods. △ Less

Submitted 24 January, 2024; originally announced January 2024.

Comments: Accepted in IEEE/IFIP Network Operations and Management Symposium (NOMS) 2024

arXiv:2401.01356 [pdf, other]

Efficient Indexing of Meta-Data (Extracted from Educational Videos)

Authors: Shalika Kumbham, Abhijit Debnath, Krothapalli Sreenivasa Rao

Abstract: Video lectures are becoming more popular and in demand as online classroom teaching is becoming more prevalent. Massive Open Online Courses (MOOCs), such as NPTEL, have been creating high-quality educational content that is freely accessible to students online. A large number of colleges across the country are now using NPTEL videos in their classrooms. So more video lectures are being recorded, m… ▽ More Video lectures are becoming more popular and in demand as online classroom teaching is becoming more prevalent. Massive Open Online Courses (MOOCs), such as NPTEL, have been creating high-quality educational content that is freely accessible to students online. A large number of colleges across the country are now using NPTEL videos in their classrooms. So more video lectures are being recorded, maintained, and uploaded. These videos generally contain information about that video before the lecture begins. We generally observe that these educational videos have metadata containing five to six attributes: Institute Name, Publisher Name, Department Name, Professor Name, Subject Name, and Topic Name. It would be easy to maintain these videos if we could organize them according to their categories. The indexing of these videos based on this information is beneficial for students all around the world to efficiently utilise these videos. In this project, we are trying to get the metadata information mentioned above from the video lectures. △ Less

Submitted 11 December, 2023; originally announced January 2024.

arXiv:2401.00681 [pdf, ps, other]

Mitigating Procrastination in Spatial Crowdsourcing Via Efficient Scheduling Algorithm

Authors: Naren Debnath, Sajal Mukhopadhyay, Fatos Xhafa

Abstract: Several works related to spatial crowdsourcing have been proposed in the direction where the task executers are to perform the tasks within the stipulated deadlines. Though the deadlines are set, it may be a practical scenario that majority of the task executers submit the tasks as late as possible. This situation where the task executers may delay their task submission is termed as procrastinatio… ▽ More Several works related to spatial crowdsourcing have been proposed in the direction where the task executers are to perform the tasks within the stipulated deadlines. Though the deadlines are set, it may be a practical scenario that majority of the task executers submit the tasks as late as possible. This situation where the task executers may delay their task submission is termed as procrastination in behavioural economics. In many applications, these late submission of tasks may be problematic for task providers. So here, the participating agents (both task providers and task executers) are articulated with the procrastination issue. In literature, how to prevent this procrastination within the deadline is not addressed in spatial crowdsourcing scenario. However, in a bipartite graph setting one procrastination aware scheduling is proposed but balanced job (task and job will synonymously be used) distribution in different slots (also termed as schedules) is not considered there. In this paper, a procrastination aware scheduling of jobs is proliferated by proposing an (randomized) algorithm in spatial crowdsourcing scenario. Our algorithm ensures that balancing of jobs in different schedules are maintained. Our scheme is compared with the existing algorithm through extensive simulation and in terms of balancing effect, our proposed algorithm outperforms the existing one. Analytically it is shown that our proposed algorithm maintains the balanced distribution. △ Less

Submitted 7 February, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

arXiv:2312.16322 [pdf, other]

Blockchain-Envisioned Post-Quantum Secure Sanitizable Signature for Audit Logs Management

Authors: Vikas Srivastava, Paresh Baidya, Sihem Mesnager, Debasish Roy, Sumit Kumar Debnath

Abstract: Audit logs are one of the most important tools for transparently tracking system events and maintaining continuous oversight in corporate organizations and enterprise business systems. There are many cases where the audit logs contain sensitive data, or the audit logs are enormous. In these situations, dealing with a subset of the data is more practical than the entire data set. To provide a secur… ▽ More Audit logs are one of the most important tools for transparently tracking system events and maintaining continuous oversight in corporate organizations and enterprise business systems. There are many cases where the audit logs contain sensitive data, or the audit logs are enormous. In these situations, dealing with a subset of the data is more practical than the entire data set. To provide a secure solution to handle these issues, a sanitizable signature scheme (SSS) is a viable cryptographic primitive. Herein, we first present the first post-quantum secure multivariate-based SSS, namely Mul-SAN. Our proposed design provides unforgeability, privacy, immutability, signer accountability, and sanitizer accountability under the assumption that the MQ problem is NP-hard. Mul-SAN is very efficient and only requires computing field multiplications and additions over a finite field for its implementation. Mul-SAN presents itself as a practical method to partially delegate control of the authenticated data in avenues like the healthcare industry and government organizations. We also explore using Blockchain to provide a tamper-proof and robust audit log mechanism. △ Less

Submitted 25 March, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.16318 [pdf, other]

Quantum Secure Protocols for Multiparty Computations

Authors: Tapaswini Mohanty, Vikas Srivastava, Sumit Kumar Debnath, Pantelimon Stanica

Abstract: Secure multiparty computation (MPC) schemes allow two or more parties to conjointly compute a function on their private input sets while revealing nothing but the output. Existing state-of-the-art number-theoretic-based designs face the threat of attacks through quantum algorithms. In this context, we present secure MPC protocols that can withstand quantum attacks. We first present the design and… ▽ More Secure multiparty computation (MPC) schemes allow two or more parties to conjointly compute a function on their private input sets while revealing nothing but the output. Existing state-of-the-art number-theoretic-based designs face the threat of attacks through quantum algorithms. In this context, we present secure MPC protocols that can withstand quantum attacks. We first present the design and analysis of an information-theoretic secure oblivious linear evaluation (OLE), namely ${\sf qOLE}$ in the quantum domain, and show that our ${\sf qOLE}$ is safe from external attacks. In addition, our scheme satisfies all the security requirements of a secure OLE. We further utilize ${\sf qOLE}$ as a building block to construct a quantum-safe multiparty private set intersection (MPSI) protocol. △ Less

Submitted 17 July, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

arXiv:2311.03792 [pdf, other]

Character-Level Bangla Text-to-IPA Transcription Using Transformer Architecture with Sequence Alignment

Authors: Jakir Hasan, Shrestha Datta, Ameya Debnath

Abstract: The International Phonetic Alphabet (IPA) is indispensable in language learning and understanding, aiding users in accurate pronunciation and comprehension. Additionally, it plays a pivotal role in speech therapy, linguistic research, accurate transliteration, and the development of text-to-speech systems, making it an essential tool across diverse fields. Bangla being 7th as one of the widely use… ▽ More The International Phonetic Alphabet (IPA) is indispensable in language learning and understanding, aiding users in accurate pronunciation and comprehension. Additionally, it plays a pivotal role in speech therapy, linguistic research, accurate transliteration, and the development of text-to-speech systems, making it an essential tool across diverse fields. Bangla being 7th as one of the widely used languages, gives rise to the need for IPA in its domain. Its IPA mapping is too diverse to be captured manually giving the need for Artificial Intelligence and Machine Learning in this field. In this study, we have utilized a transformer-based sequence-to-sequence model at the letter and symbol level to get the IPA of each Bangla word as the variation of IPA in association of different words is almost null. Our transformer model only consisted of 8.5 million parameters with only a single decoder and encoder layer. Additionally, to handle the punctuation marks and the occurrence of foreign languages in the text, we have utilized manual mapping as the model won't be able to learn to separate them from Bangla words while decreasing our required computational resources. Finally, maintaining the relative position of the sentence component IPAs and generation of the combined IPA has led us to achieve the top position with a word error rate of 0.10582 in the public ranking of DataVerse Challenge - ITVerse 2023 (https://www.kaggle.com/competitions/dataverse_2023/). △ Less

Submitted 7 November, 2023; originally announced November 2023.

Comments: Achieved top position with a word error rate of 0.10582 in the public ranking of DataVerse Challenge - ITVerse 2023 (link: https://www.kaggle.com/competitions/dataverse_2023/). All codes can be found on the respective competition webpage

arXiv:2309.16282 [pdf, other]

AgEncID: Aggregate Encryption Individual Decryption of Key for FPGA Bitstream IP Cores in Cloud

Authors: Mukta Debnath, Krishnendu Guha, Debasri Saha, Susmita Sur-Kolay

Abstract: Cloud computing platforms are progressively adopting Field Programmable Gate Arrays to deploy specialized hardware accelerators for specific computational tasks. However, the security of FPGA-based bitstream for Intellectual Property, IP cores from unauthorized interception in cloud environments remains a prominent concern. Existing methodologies for protection of such bitstreams possess several l… ▽ More Cloud computing platforms are progressively adopting Field Programmable Gate Arrays to deploy specialized hardware accelerators for specific computational tasks. However, the security of FPGA-based bitstream for Intellectual Property, IP cores from unauthorized interception in cloud environments remains a prominent concern. Existing methodologies for protection of such bitstreams possess several limitations, such as requiring a large number of keys, tying bitstreams to specific FPGAs, and relying on trusted third parties. This paper proposes Aggregate Encryption and Individual Decryption, a cryptosystem based on key aggregation to enhance the security of FPGA-based bitstream for IP cores and to address the pitfalls of previous related works. In our proposed scheme, IP providers can encrypt their bitstreams with a single key for a set S of FPGA boards, with which the bitstreams can directly be decrypted on any of the FPGA boards in S. Aggregate encryption of the key is performed in a way which ensures that the key can solely be obtained onboard through individual decryption employing the board's private key, thus facilitating secure key provisioning. The proposed cryptosystem is evaluated mainly on Zynq FPGAs. The outcomes demonstrate that our cryptosystem not only outperforms existing techniques with respect to resource, time and energy significantly but also upholds robust security assurances. △ Less

Submitted 4 October, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: 21 pages, 7 figures, 5 tables

arXiv:2309.12107 [pdf, ps, other]

A Computational Analysis of Vagueness in Revisions of Instructional Texts

Authors: Alok Debnath, Michael Roth

Abstract: WikiHow is an open-domain repository of instructional articles for a variety of tasks, which can be revised by users. In this paper, we extract pairwise versions of an instruction before and after a revision was made. Starting from a noisy dataset of revision histories, we specifically extract and analyze edits that involve cases of vagueness in instructions. We further investigate the ability of… ▽ More WikiHow is an open-domain repository of instructional articles for a variety of tasks, which can be revised by users. In this paper, we extract pairwise versions of an instruction before and after a revision was made. Starting from a noisy dataset of revision histories, we specifically extract and analyze edits that involve cases of vagueness in instructions. We further investigate the ability of a neural model to distinguish between two versions of an instruction in our data by adopting a pairwise ranking task from previous work and showing improvements over existing baselines. △ Less

Submitted 21 September, 2023; originally announced September 2023.

Comments: EACL 2021 best student paper

arXiv:2309.06978 [pdf, other]

doi 10.1109/WACV57701.2024.00408

Differentiable JPEG: The Devil is in the Details

Authors: Christoph Reich, Biplob Debnath, Deep Patel, Srimat Chakradhar

Abstract: JPEG remains one of the most widespread lossy image coding methods. However, the non-differentiable nature of JPEG restricts the application in deep learning pipelines. Several differentiable approximations of JPEG have recently been proposed to address this issue. This paper conducts a comprehensive review of existing diff. JPEG approaches and identifies critical details that have been missed by… ▽ More JPEG remains one of the most widespread lossy image coding methods. However, the non-differentiable nature of JPEG restricts the application in deep learning pipelines. Several differentiable approximations of JPEG have recently been proposed to address this issue. This paper conducts a comprehensive review of existing diff. JPEG approaches and identifies critical details that have been missed by previous methods. To this end, we propose a novel diff. JPEG approach, overcoming previous limitations. Our approach is differentiable w.r.t. the input image, the JPEG quality, the quantization tables, and the color conversion parameters. We evaluate the forward and backward performance of our diff. JPEG approach against existing methods. Additionally, extensive ablations are performed to evaluate crucial design choices. Our proposed diff. JPEG resembles the (non-diff.) reference implementation best, significantly surpassing the recent-best diff. approach by $3.47$dB (PSNR) on average. For strong compression rates, we can even improve PSNR by $9.51$dB. Strong adversarial attack results are yielded by our diff. JPEG, demonstrating the effective gradient approximation. Our code is available at https://github.com/necla-ml/Diff-JPEG. △ Less

Submitted 22 December, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: Accepted at WACV 2024. Project page: https://christophreich1996.github.io/differentiable_jpeg/ WACV paper: https://openaccess.thecvf.com/content/WACV2024/html/Reich_Differentiable_JPEG_The_Devil_Is_in_the_Details_WACV_2024_paper.html

arXiv:2309.00841 [pdf, other]

LeanContext: Cost-Efficient Domain-Specific Question Answering Using LLMs

Authors: Md Adnan Arefeen, Biplob Debnath, Srimat Chakradhar

Abstract: Question-answering (QA) is a significant application of Large Language Models (LLMs), shaping chatbot capabilities across healthcare, education, and customer service. However, widespread LLM integration presents a challenge for small businesses due to the high expenses of LLM API usage. Costs rise rapidly when domain-specific data (context) is used alongside queries for accurate domain-specific LL… ▽ More Question-answering (QA) is a significant application of Large Language Models (LLMs), shaping chatbot capabilities across healthcare, education, and customer service. However, widespread LLM integration presents a challenge for small businesses due to the high expenses of LLM API usage. Costs rise rapidly when domain-specific data (context) is used alongside queries for accurate domain-specific LLM responses. One option is to summarize the context by using LLMs and reduce the context. However, this can also filter out useful information that is necessary to answer some domain-specific queries. In this paper, we shift from human-oriented summarizers to AI model-friendly summaries. Our approach, LeanContext, efficiently extracts $k$ key sentences from the context that are closely aligned with the query. The choice of $k$ is neither static nor random; we introduce a reinforcement learning technique that dynamically determines $k$ based on the query and context. The rest of the less important sentences are reduced using a free open source text reduction method. We evaluate LeanContext against several recent query-aware and query-unaware context reduction approaches on prominent datasets (arxiv papers and BBC news articles). Despite cost reductions of $37.29\%$ to $67.81\%$, LeanContext's ROUGE-1 score decreases only by $1.41\%$ to $2.65\%$ compared to a baseline that retains the entire context (no summarization). Additionally, if free pretrained LLM-based summarizers are used to reduce context (into human consumable summaries), LeanContext can further modify the reduced context to enhance the accuracy (ROUGE-1 score) by $13.22\%$ to $24.61\%$. △ Less

Submitted 2 September, 2023; originally announced September 2023.

Comments: The paper is under review

arXiv:2308.16215 [pdf, other]

Deep Video Codec Control for Vision Models

Authors: Christoph Reich, Biplob Debnath, Deep Patel, Tim Prangemeier, Daniel Cremers, Srimat Chakradhar

Abstract: Standardized lossy video coding is at the core of almost all real-world video processing pipelines. Rate control is used to enable standard codecs to adapt to different network bandwidth conditions or storage constraints. However, standard video codecs (e.g., H.264) and their rate control modules aim to minimize video distortion w.r.t. human quality assessment. We demonstrate empirically that stan… ▽ More Standardized lossy video coding is at the core of almost all real-world video processing pipelines. Rate control is used to enable standard codecs to adapt to different network bandwidth conditions or storage constraints. However, standard video codecs (e.g., H.264) and their rate control modules aim to minimize video distortion w.r.t. human quality assessment. We demonstrate empirically that standard-coded videos vastly deteriorate the performance of deep vision models. To overcome the deterioration of vision performance, this paper presents the first end-to-end learnable deep video codec control that considers both bandwidth constraints and downstream deep vision performance, while adhering to existing standardization. We demonstrate that our approach better preserves downstream deep vision performance than traditional standard video coding. △ Less

Submitted 16 April, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Accepted at CVPR 2024 Workshop on AI for Streaming (AIS)

arXiv:2307.16296 [pdf]

doi 10.30574/wjarr.2023.19.1.1335

Integrating Information Technology in Healthcare: Recent Developments, Challenges, and Future Prospects for Urban and Regional Health

Authors: Shipu Debnath

Abstract: The use of technology in healthcare has become increasingly popular in recent years, with the potential to improve how healthcare is delivered, patient outcomes, and cost-effectiveness. This review paper provides an overview of how technology has been used in healthcare, particularly in cities and for personalized medicine. The paper discusses different ways technology is being used in healthcare,… ▽ More The use of technology in healthcare has become increasingly popular in recent years, with the potential to improve how healthcare is delivered, patient outcomes, and cost-effectiveness. This review paper provides an overview of how technology has been used in healthcare, particularly in cities and for personalized medicine. The paper discusses different ways technology is being used in healthcare, such as electronic health records, telemedicine, remote monitoring, medical imaging, wearable devices, and artificial intelligence. It also looks at the challenges and problems that come with using technology in healthcare, such as keeping patient data private and secure, making sure different technology systems can work together, and ensuring patients are comfortable using technology. In addition, the paper explores the potential of technology in healthcare, including improving how easily patients can get care, the quality of care they receive, and the cost of care. It also talks about how technology can help personalize care to individual patients. Finally, the paper summarizes the main points, makes recommendations for healthcare providers and policymakers, and suggests directions for future research. Overall, this review shows how technology can be used to improve healthcare, while also acknowledging the challenges that come with using technology in this way. △ Less

Submitted 5 August, 2023; v1 submitted 30 July, 2023; originally announced July 2023.

arXiv:2307.13581 [pdf, other]

Comparing Forward and Inverse Design Paradigms: A Case Study on Refractory High-Entropy Alloys

Authors: Arindam Debnath, Lavanya Raman, Wenjie Li, Adam M. Krajewski, Marcia Ahn, Shuang Lin, Shunli Shang, Allison M. Beese, Zi-Kui Liu, Wesley F. Reinhart

Abstract: The rapid design of advanced materials is a topic of great scientific interest. The conventional, ``forward'' paradigm of materials design involves evaluating multiple candidates to determine the best candidate that matches the target properties. However, recent advances in the field of deep learning have given rise to the possibility of an ``inverse'' design paradigm for advanced materials, where… ▽ More The rapid design of advanced materials is a topic of great scientific interest. The conventional, ``forward'' paradigm of materials design involves evaluating multiple candidates to determine the best candidate that matches the target properties. However, recent advances in the field of deep learning have given rise to the possibility of an ``inverse'' design paradigm for advanced materials, wherein a model provided with the target properties is able to find the best candidate. Being a relatively new concept, there remains a need to systematically evaluate how these two paradigms perform in practical applications. Therefore, the objective of this study is to directly, quantitatively compare the forward and inverse design modeling paradigms. We do so by considering two case studies of refractory high-entropy alloy design with different objectives and constraints and comparing the inverse design method to other forward schemes like localized forward search, high throughput screening, and multi objective optimization. △ Less

Submitted 25 July, 2023; originally announced July 2023.

arXiv:2306.12634 [pdf, ps, other]

Efficient quantum image representation and compression circuit using zero-discarded state preparation approach

Authors: Md Ershadul Haque, Manoranjan Paul, Anwaar Ulhaq, Tanmoy Debnath

Abstract: Quantum image computing draws a lot of attention due to storing and processing image data faster than classical. With increasing the image size, the number of connections also increases, leading to the circuit complex. Therefore, efficient quantum image representation and compression issues are still challenging. The encoding of images for representation and compression in quantum systems is diffe… ▽ More Quantum image computing draws a lot of attention due to storing and processing image data faster than classical. With increasing the image size, the number of connections also increases, leading to the circuit complex. Therefore, efficient quantum image representation and compression issues are still challenging. The encoding of images for representation and compression in quantum systems is different from classical ones. In quantum, encoding of position is more concerned which is the major difference from the classical. In this paper, a novel zero-discarded state connection novel enhance quantum representation (ZSCNEQR) approach is introduced to reduce complexity further by discarding '0' in the location representation information. In the control operational gate, only input '1' contribute to its output thus, discarding zero makes the proposed ZSCNEQR circuit more efficient. The proposed ZSCNEQR approach significantly reduced the required bit for both representation and compression. The proposed method requires 11.76\% less qubits compared to the recent existing method. The results show that the proposed approach is highly effective for representing and compressing images compared to the two relevant existing methods in terms of rate-distortion performance. △ Less

Submitted 21 June, 2023; originally announced June 2023.

Comments: 7 figures

arXiv:2304.09617 [pdf, other]

Towards Autonomous Selective Harvesting: A Review of Robot Perception, Robot Design, Motion Planning and Control

Authors: Vishnu Rajendran S, Bappaditya Debnath, Bappaditya Debnath, Sariah Mghames, Willow Mandil, Soran Parsa, Simon Parsons, Amir Ghalamzan-E

Abstract: This paper provides an overview of the current state-of-the-art in selective harvesting robots (SHRs) and their potential for addressing the challenges of global food production. SHRs have the potential to increase productivity, reduce labour costs, and minimise food waste by selectively harvesting only ripe fruits and vegetables. The paper discusses the main components of SHRs, including percepti… ▽ More This paper provides an overview of the current state-of-the-art in selective harvesting robots (SHRs) and their potential for addressing the challenges of global food production. SHRs have the potential to increase productivity, reduce labour costs, and minimise food waste by selectively harvesting only ripe fruits and vegetables. The paper discusses the main components of SHRs, including perception, grasping, cutting, motion planning, and control. It also highlights the challenges in developing SHR technologies, particularly in the areas of robot design, motion planning and control. The paper also discusses the potential benefits of integrating AI and soft robots and data-driven methods to enhance the performance and robustness of SHR systems. Finally, the paper identifies several open research questions in the field and highlights the need for further research and development efforts to advance SHR technologies to meet the challenges of global food production. Overall, this paper provides a starting point for researchers and practitioners interested in developing SHRs and highlights the need for more research in this field. △ Less

Submitted 19 April, 2023; originally announced April 2023.

Comments: Preprint: to be appeared in Journal of Field Robotics

arXiv:2301.03947 [pdf, other]

Autonomous Strawberry Picking Robotic System (Robofruit)

Authors: Soran Parsa, Bappaditya Debnath, Muhammad Arshad Khan, Amir Ghalamzan E.

Abstract: Challenges in strawberry picking made selective harvesting robotic technology demanding. However, selective harvesting of strawberries is complicated forming a few scientific research questions. Most available solutions only deal with a specific picking scenario, e.g., picking only a single variety of fruit in isolation. Nonetheless, most economically viable (e.g. high-yielding and/or disease-resi… ▽ More Challenges in strawberry picking made selective harvesting robotic technology demanding. However, selective harvesting of strawberries is complicated forming a few scientific research questions. Most available solutions only deal with a specific picking scenario, e.g., picking only a single variety of fruit in isolation. Nonetheless, most economically viable (e.g. high-yielding and/or disease-resistant) varieties of strawberry are grown in dense clusters. The current perception technology in such use cases is inefficient. In this work, we developed a novel system capable of harvesting strawberries with several unique features. The features allow the system to deal with very complex picking scenarios, e.g. dense clusters. Our concept of a modular system makes our system reconfigurable to adapt to different picking scenarios. We designed, manufactured, and tested a picking head with 2.5 DOF (2 independent mechanisms and 1 dependent cutting system) capable of removing possible occlusions and harvesting targeted strawberries without contacting fruit flesh to avoid damage and bruising. In addition, we developed a novel perception system to localise strawberries and detect their key points, picking points, and determine their ripeness. For this purpose, we introduced two new datasets. Finally, we tested the system in a commercial strawberry growing field and our research farm with three different strawberry varieties. The results show the effectiveness and reliability of the proposed system. The designed picking head was able to remove occlusions and harvest strawberries effectively. The perception system was able to detect and determine the ripeness of strawberries with 95% accuracy. In total, the system was able to harvest 87% of all detected strawberries with a success rate of 83% for all pluckable fruits. We also discuss a series of open research questions in the discussion section. △ Less

Submitted 10 January, 2023; originally announced January 2023.

Comments: To appear in the Journal of Field Robotics (Accepted) Please watch the video at https://www.youtube.com/watch?v=v8gGAvsISXU

arXiv:2301.00808 [pdf, other]

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

Authors: Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie

Abstract: Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can a… ▽ More Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE). However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation. We also provide pre-trained ConvNeXt V2 models of various sizes, ranging from an efficient 3.7M-parameter Atto model with 76.7% top-1 accuracy on ImageNet, to a 650M Huge model that achieves a state-of-the-art 88.9% accuracy using only public training data. △ Less

Submitted 2 January, 2023; originally announced January 2023.

Comments: Code and models available at https://github.com/facebookresearch/ConvNeXt-V2

arXiv:2212.13169 [pdf, ps, other]

Quantum Codes from additive constacyclic codes over a mixed alphabet and the MacWilliams identities

Authors: Indibar Debnath, Ashutosh Singh, Om Prakash, Abdollah Alhevaz

Abstract: Let $\mathbb{Z}_p$ be the ring of integers modulo a prime number $p$ where $p-1$ is a quadratic residue modulo $p$. This paper presents the study of constacyclic codes over chain rings $\mathcal{R}=\frac{\mathbb{Z}_p[u]}{\langle u^2\rangle}$ and $\mathcal{S}=\frac{\mathbb{Z}_p[u]}{\langle u^3\rangle}$. We also study additive constacyclic codes over $\mathcal{R}\mathcal{S}$ and… ▽ More Let $\mathbb{Z}_p$ be the ring of integers modulo a prime number $p$ where $p-1$ is a quadratic residue modulo $p$. This paper presents the study of constacyclic codes over chain rings $\mathcal{R}=\frac{\mathbb{Z}_p[u]}{\langle u^2\rangle}$ and $\mathcal{S}=\frac{\mathbb{Z}_p[u]}{\langle u^3\rangle}$. We also study additive constacyclic codes over $\mathcal{R}\mathcal{S}$ and $\mathbb{Z}_p\mathcal{R}\mathcal{S}$ using the generator polynomials over the rings $\mathcal{R}$ and $\mathcal{S},$ respectively. Further, by defining Gray maps on $\mathcal{R}$, $\mathcal{S}$ and $\mathbb{Z}_p\mathcal{R}\mathcal{S},$ we obtain some results on the Gray images of additive codes. Then we give the weight enumeration and MacWilliams identities corresponding to the additive codes over $\mathbb{Z}_p\mathcal{R}\mathcal{S}$. Finally, as an application of the obtained codes, we give quantum codes using the CSS construction. △ Less

Submitted 26 December, 2022; originally announced December 2022.

Comments: 22 pages

MSC Class: 94B05; 94B15; 94B35; 94B60

arXiv:2212.08801 [pdf, other]

Comparison of Model-Free and Model-Based Learning-Informed Planning for PointGoal Navigation

Authors: Yimeng Li, Arnab Debnath, Gregory J. Stein, Jana Kosecka

Abstract: In recent years several learning approaches to point goal navigation in previously unseen environments have been proposed. They vary in the representations of the environments, problem decomposition, and experimental evaluation. In this work, we compare the state-of-the-art Deep Reinforcement Learning based approaches with Partially Observable Markov Decision Process (POMDP) formulation of the poi… ▽ More In recent years several learning approaches to point goal navigation in previously unseen environments have been proposed. They vary in the representations of the environments, problem decomposition, and experimental evaluation. In this work, we compare the state-of-the-art Deep Reinforcement Learning based approaches with Partially Observable Markov Decision Process (POMDP) formulation of the point goal navigation problem. We adapt the (POMDP) sub-goal framework proposed by [1] and modify the component that estimates frontier properties by using partial semantic maps of indoor scenes built from images' semantic segmentation. In addition to the well-known completeness of the model-based approach, we demonstrate that it is robust and efficient in that it leverages informative, learned properties of the frontiers compared to an optimistic frontier-based planner. We also demonstrate its data efficiency compared to the end-to-end deep reinforcement learning approaches. We compare our results against an optimistic planner, ANS and DD-PPO on Matterport3D dataset using the Habitat Simulator. We show comparable, though slightly worse performance than the SOTA DD-PPO approach, yet with far fewer data. △ Less

Submitted 17 December, 2022; originally announced December 2022.

Comments: arXiv admin note: text overlap with arXiv:2211.07898

arXiv:2212.07079 [pdf, other]

A novel state connection strategy for quantum computing to represent and compress digital images

Authors: Md Ershadul Haque, Manoranjan Paul, Tanmoy Debnath

Abstract: Quantum image processing draws a lot of attention due to faster data computation and storage compared to classical data processing systems. Converting classical image data into the quantum domain and state label preparation complexity is still a challenging issue. The existing techniques normally connect the pixel values and the state position directly. Recently, the EFRQI (efficient flexible repr… ▽ More Quantum image processing draws a lot of attention due to faster data computation and storage compared to classical data processing systems. Converting classical image data into the quantum domain and state label preparation complexity is still a challenging issue. The existing techniques normally connect the pixel values and the state position directly. Recently, the EFRQI (efficient flexible representation of the quantum image) approach uses an auxiliary qubit that connects the pixel-representing qubits to the state position qubits via Toffoli gates to reduce state connection. Due to the twice use of Toffoli gates for each pixel connection still it requires a significant number of bits to connect each pixel value. In this paper, we propose a new SCMFRQI (state connection modification FRQI) approach for further reducing the required bits by modifying the state connection using a reset gate rather than repeating the use of the same Toffoli gate connection as a reset gate. Moreover, unlike other existing methods, we compress images using block-level for further reduction of required qubits. The experimental results confirm that the proposed method outperforms the existing methods in terms of both image representation and compression points of view. △ Less

Submitted 14 December, 2022; originally announced December 2022.

Comments: 8 pages, conference

arXiv:2212.03558 [pdf, other]

Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning

Authors: Ankur Debnath, Shridevi S Patil, Gangotri Nadiger, Ramakrishnan Angarai Ganesan

Abstract: End-to-end text-to-speech (TTS) systems have been developed for European languages like English and Spanish with state-of-the-art speech quality, prosody, and naturalness. However, development of end-to-end TTS for Indian languages is lagging behind in terms of quality. The challenges involved in such a task are: 1) scarcity of quality training data; 2) low efficiency during training and inference… ▽ More End-to-end text-to-speech (TTS) systems have been developed for European languages like English and Spanish with state-of-the-art speech quality, prosody, and naturalness. However, development of end-to-end TTS for Indian languages is lagging behind in terms of quality. The challenges involved in such a task are: 1) scarcity of quality training data; 2) low efficiency during training and inference; 3) slow convergence in the case of large vocabulary size. In our work reported in this paper, we have investigated the use of fine-tuning the English-pretrained Tacotron2 model with limited Sanskrit data to synthesize natural sounding speech in Sanskrit in low resource settings. Our experiments show encouraging results, achieving an overall MOS of 3.38 from 37 evaluators with good Sanskrit spoken knowledge. This is really a very good result, considering the fact that the speech data we have used is of duration 2.5 hours only. △ Less

Submitted 7 December, 2022; originally announced December 2022.

arXiv:2211.07898 [pdf, other]

Learning-Augmented Model-Based Planning for Visual Exploration

Authors: Yimeng Li, Arnab Debnath, Gregory Stein, Jana Kosecka

Abstract: We consider the problem of time-limited robotic exploration in previously unseen environments where exploration is limited by a predefined amount of time. We propose a novel exploration approach using learning-augmented model-based planning. We generate a set of subgoals associated with frontiers on the current map and derive a Bellman Equation for exploration with these subgoals. Visual sensing a… ▽ More We consider the problem of time-limited robotic exploration in previously unseen environments where exploration is limited by a predefined amount of time. We propose a novel exploration approach using learning-augmented model-based planning. We generate a set of subgoals associated with frontiers on the current map and derive a Bellman Equation for exploration with these subgoals. Visual sensing and advances in semantic mapping of indoor scenes are exploited for training a deep convolutional neural network to estimate properties associated with each frontier: the expected unobserved area beyond the frontier and the expected timesteps (discretized actions) required to explore it. The proposed model-based planner is guaranteed to explore the whole scene if time permits. We thoroughly evaluate our approach on a large-scale pseudo-realistic indoor dataset (Matterport3D) with the Habitat simulator. We compare our approach with classical and more recent RL-based exploration methods. Our approach surpasses the greedy strategies by 2.1% and the RL-based exploration methods by 8.4% in terms of coverage. △ Less

Submitted 9 August, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: Accepted to IROS 2023

arXiv:2210.15294 [pdf, other]

doi 10.1145/3511808.3557399

Modeling Inter-Dependence Between Time and Mark in Multivariate Temporal Point Processes

Authors: Govind Waghmare, Ankur Debnath, Siddhartha Asthana, Aakarsh Malhotra

Abstract: Temporal Point Processes (TPP) are probabilistic generative frameworks. They model discrete event sequences localized in continuous time. Generally, real-life events reveal descriptive information, known as marks. Marked TPPs model time and marks of the event together for practical relevance. Conditioned on past events, marked TPPs aim to learn the joint distribution of the time and the mark of th… ▽ More Temporal Point Processes (TPP) are probabilistic generative frameworks. They model discrete event sequences localized in continuous time. Generally, real-life events reveal descriptive information, known as marks. Marked TPPs model time and marks of the event together for practical relevance. Conditioned on past events, marked TPPs aim to learn the joint distribution of the time and the mark of the next event. For simplicity, conditionally independent TPP models assume time and marks are independent given event history. They factorize the conditional joint distribution of time and mark into the product of individual conditional distributions. This structural limitation in the design of TPP models hurt the predictive performance on entangled time and mark interactions. In this work, we model the conditional inter-dependence of time and mark to overcome the limitations of conditionally independent models. We construct a multivariate TPP conditioning the time distribution on the current event mark in addition to past events. Besides the conventional intensity-based models for conditional joint distribution, we also draw on flexible intensity-free TPP models from the literature. The proposed TPP models outperform conditionally independent and dependent models in standard prediction tasks. Our experimentation on various datasets with multiple evaluation metrics highlights the merit of the proposed approach. △ Less

Submitted 27 October, 2022; originally announced October 2022.

arXiv:2210.07224 [pdf, other]

Exploring Long-Sequence Masked Autoencoders

Authors: Ronghang Hu, Shoubhik Debnath, Saining Xie, Xinlei Chen

Abstract: Masked Autoencoding (MAE) has emerged as an effective approach for pre-training representations across multiple domains. In contrast to discrete tokens in natural languages, the input for image MAE is continuous and subject to additional specifications. We systematically study each input specification during the pre-training stage, and find sequence length is a key axis that further scales MAE. Ou… ▽ More Masked Autoencoding (MAE) has emerged as an effective approach for pre-training representations across multiple domains. In contrast to discrete tokens in natural languages, the input for image MAE is continuous and subject to additional specifications. We systematically study each input specification during the pre-training stage, and find sequence length is a key axis that further scales MAE. Our study leads to a long-sequence version of MAE with minimal changes to the original recipe, by just decoupling the mask size from the patch size. For object detection and semantic segmentation, our long-sequence MAE shows consistent gains across all the experimental setups without extra computation cost during the transfer. While long-sequence pre-training is discerned most beneficial for detection and segmentation, we also achieve strong results on ImageNet-1K classification by keeping a standard image size and only increasing the sequence length. We hope our findings can provide new insights and avenues for scaling in computer vision. △ Less

Submitted 13 October, 2022; originally announced October 2022.

Comments: 11 pages

arXiv:2208.14277 [pdf, other]

Advance quantum image representation and compression using DCTEFRQI approach

Authors: Md Ershadul Haque, Manoranjon Paul, Anwaar Ulhaq, Tanmoy Debnath

Abstract: In recent year, quantum image processing got a lot of attention in the field of image processing due to opportunity to place huge image data in quantum Hilbert space. Hilbert space or Euclidean space has infinite dimension to locate and process the image data faster. Moreover, several researches show that, the computational time of quantum process is faster than classical computer. By encoding and… ▽ More In recent year, quantum image processing got a lot of attention in the field of image processing due to opportunity to place huge image data in quantum Hilbert space. Hilbert space or Euclidean space has infinite dimension to locate and process the image data faster. Moreover, several researches show that, the computational time of quantum process is faster than classical computer. By encoding and compressing the image in quantum domain is still challenging issue. From literature survey, we have proposed a DCTEFRQI (Direct Cosine Transform Efficient Flexible Representation of Quantum Image) algorithm to represent and compress gray image efficiently which save computational time and minimize the complexity of preparation. The objective of this work is to represent and compress various gray image size in quantum computer using DCT(Discrete Cosine Transform) and EFRQI (Efficient Flexible Representation of Quantum Image) approach together. Quirk simulation tool is used to design corresponding quantum image circuit. Due to limitation of qubit, total 16 numbers of qubit are used to represent the gray scale image among those 8 are used to map the coefficient values and the rest 8 are used to generate the corresponding coefficient position. Theoretical analysis and experimental result show that, proposed DCTEFRQI scheme provides better representation and compression compare to DCT-GQIR, DWT-GQIR and DWT-EFRQI in terms of PSNR(Peak Signal to Noise Ratio) and bit rate.. △ Less

Submitted 30 August, 2022; originally announced August 2022.

arXiv:2208.09074 [pdf, other]

dPMP-Deep Probabilistic Motion Planning: A use case in Strawberry Picking Robot

Authors: Alessandra Tafuro, Bappaditya Debnath, Andrea M. Zanchettin, Amir Ghalamzan E

Abstract: This paper presents a novel probabilistic approach to deep robot learning from demonstrations (LfD). Deep movement primitives (DMPs) are deterministic LfD model that maps visual information directly into a robot trajectory. This paper extends DMPs and presents a deep probabilistic model that maps the visual information into a distribution of effective robot trajectories. The architecture that lead… ▽ More This paper presents a novel probabilistic approach to deep robot learning from demonstrations (LfD). Deep movement primitives (DMPs) are deterministic LfD model that maps visual information directly into a robot trajectory. This paper extends DMPs and presents a deep probabilistic model that maps the visual information into a distribution of effective robot trajectories. The architecture that leads to the highest level of trajectory accuracy is presented and compared with the existing methods. Moreover, this paper introduces a novel training method for learning domain-specific latent features. We show the superiority of the proposed probabilistic approach and novel latent space learning in the lab's real-robot task of strawberry harvesting. The experimental results demonstrate that latent space learning can significantly improve model prediction performances. The proposed approach allows to sample trajectories from distribution and optimises the robot trajectory to meet a secondary objective, e.g. collision avoidance. △ Less

Submitted 18 August, 2022; originally announced August 2022.

Comments: To appear In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022

arXiv:2208.00825 [pdf, other]

How Do Requirements Evolve During Elicitation? An Empirical Study Combining Interviews and App Store Analysis

Authors: Alessio Ferrari, Paola Spoletini, Sourav Debnath

Abstract: Requirements are elicited from the customer and other stakeholders through an iterative process of interviews, prototyping, and other interactive sessions. Then, requirements can be further extended, based on the analysis of the features of competing products available on the market. Understanding how this process takes place can help to identify the contribution of the different elicitation phase… ▽ More Requirements are elicited from the customer and other stakeholders through an iterative process of interviews, prototyping, and other interactive sessions. Then, requirements can be further extended, based on the analysis of the features of competing products available on the market. Understanding how this process takes place can help to identify the contribution of the different elicitation phases, thereby allowing requirements analysts to better distribute their resources. In this work, we empirically study in which way requirements get transformed from initial ideas into documented needs, and then evolve based on the inspiration coming from similar products. To this end, we select 30 subjects that act as requirements analysts, and we perform interview-based elicitation sessions with a fictional customer. After the sessions, the analysts produce a first set of requirements for the system. Then, they are required to search similar products in the app stores, and extend the requirements, inspired by the identified apps. The requirements documented at each step are evaluated, to assess to which extent and in which way the initial idea evolved throughout the process. Our results show that only between 30\% and 38\% of the requirements produced after the interviews include content that can be fully traced to initial customer's ideas. Furthermore, up to 42\% of the requirements inspired by the app stores cover additional features compared to the ones identified after the interviews. The results empirically show that requirements are not elicited in strict sense, but actually co-created through interviews, with analysts playing a crucial role in the process. In addition, we show evidence that app store-inspired elicitation can be particularly beneficial to complete the requirements. △ Less

Submitted 1 August, 2022; originally announced August 2022.

ACM Class: D.2; D.2.1; H.1.2

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur… ▽ More Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting. △ Less

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2206.01211 [pdf]

Electrically pumped quantum-dot lasers grown on 300 mm patterned Si photonic wafers

Authors: Chen Shang, Kaiyin Feng, Eamonn T. Hughes, Andrew Clark, Mukul Debnath, Rosalyn Koscica, Gerald Leake, Joshua Herman, David Harame, Peter Ludewig, Yating Wan, John E. Bowers

Abstract: Monolithic integration of quantum dot (QD) gain materials onto Si photonic platforms via direct epitaxial growth is a promising solution for on-chip light sources. Recent developments have demonstrated superior device reliability in blanket hetero-epitaxy of III-V devices on Si at elevated temperatures. Yet, thick, defect management epi designs prevent vertical light coupling from the gain region… ▽ More Monolithic integration of quantum dot (QD) gain materials onto Si photonic platforms via direct epitaxial growth is a promising solution for on-chip light sources. Recent developments have demonstrated superior device reliability in blanket hetero-epitaxy of III-V devices on Si at elevated temperatures. Yet, thick, defect management epi designs prevent vertical light coupling from the gain region to the Si-on-Insulator (SOI) waveguides. Here, we demonstrate the first electrically pumped QD lasers grown on a 300 mm patterned (001) Si wafer with a butt-coupled configuration by molecular beam epitaxy (MBE). Unique growth and fabrication challenges imposed by the template architecture have been resolved, contributing to continuous wave lasing to 60 °C and a maximum double-side output power of 126.6 mW at 20 °C with a double-side wall plug efficiency of 8.6%. The potential for robust on-chip laser operation and efficient low-loss light coupling to Si photonic circuits makes this heteroepitaxial integration platform on Si promising for scalable and low-cost mass production. △ Less

Submitted 2 June, 2022; originally announced June 2022.

Comments: 11 pages including references, 6 figures

arXiv:2205.04047 [pdf, other]

GreyConE: Greybox fuzzing+Concolic execution guided test generation for high level design

Authors: Mukta Debnath, Animesh Basak Chowdhury, Debasri Saha, Susmita Sur-Kolay

Abstract: Exhaustive testing of high-level designs pose an arduous challenge due to complex branching conditions, loop structures and inherent concurrency of hardware designs. Test engineers aim to generate quality test-cases satisfying various code coverage metrics to ensure minimal presence of bugs in a design. Prior works in testing SystemC designs are time inefficient which obstruct achieving the desire… ▽ More Exhaustive testing of high-level designs pose an arduous challenge due to complex branching conditions, loop structures and inherent concurrency of hardware designs. Test engineers aim to generate quality test-cases satisfying various code coverage metrics to ensure minimal presence of bugs in a design. Prior works in testing SystemC designs are time inefficient which obstruct achieving the desired coverage in shorter time-span. We interleave greybox fuzzing and concolic execution in a systematic manner and generate quality test-cases accelerating test coverage metrics. Our results outperform state-of-the-art methods in terms of number of test cases and branch-coverage for some of the benchmarks, and runtime for most of them. △ Less

Submitted 13 July, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

Comments: 5 pages, 5 figures, 2 tables, 2 algorithms. Accepted in International Test Conference (ITC 2022)

arXiv:2203.12659 [pdf]

A Supervised Machine Learning Approach for Sequence Based Protein-protein Interaction (PPI) Prediction

Authors: Soumyadeep Debnath, Ayatullah Faruk Mollah

Abstract: Computational protein-protein interaction (PPI) prediction techniques can contribute greatly in reducing time, cost and false-positive interactions compared to experimental approaches. Sequence is one of the key and primary information of proteins that plays a crucial role in PPI prediction. Several machine learning approaches have been applied to exploit the characteristics of PPI datasets. Howev… ▽ More Computational protein-protein interaction (PPI) prediction techniques can contribute greatly in reducing time, cost and false-positive interactions compared to experimental approaches. Sequence is one of the key and primary information of proteins that plays a crucial role in PPI prediction. Several machine learning approaches have been applied to exploit the characteristics of PPI datasets. However, these datasets greatly influence the performance of predicting models. So, care should be taken on both dataset curation as well as design of predictive models. Here, we have described our submitted solution with the results of the SeqPIP competition whose objective was to develop comprehensive PPI predictive models from sequence information with high-quality bias-free interaction datasets. A training set of 2000 positive and 2000 negative interactions with sequences was given to us. Our method was evaluated with three independent high-quality interaction test datasets and with other competitors solutions. △ Less

Submitted 27 March, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

Comments: 10 pages, 5 figures, conference - COMSYS2020, competition - SeqPIP2020

arXiv:2111.00805 [pdf, other]

FuCE: Fuzzing+Concolic Execution guided Trojan Detection in Synthesizable Hardware Designs

Authors: Mukta Debnath, Animesh Basak Chowdhury, Debasri Saha, Susmita Sur-Kolay

Abstract: High-level synthesis (HLS) is the next emerging trend for designing complex customized architectures for applications such as Machine Learning, Video Processing. It provides a higher level of abstraction and freedom to hardware engineers to perform hardware software co-design. However, it opens up a new gateway to attackers to insert hardware trojans. Such trojans are semantically more meaningful… ▽ More High-level synthesis (HLS) is the next emerging trend for designing complex customized architectures for applications such as Machine Learning, Video Processing. It provides a higher level of abstraction and freedom to hardware engineers to perform hardware software co-design. However, it opens up a new gateway to attackers to insert hardware trojans. Such trojans are semantically more meaningful and stealthy, compared to gate-level trojans and therefore are hard-to-detect using state-of-the-art gate-level trojan detection techniques. Although recent works have proposed detection mechanisms to uncover such stealthy trojans in high-level synthesis (HLS) designs, these techniques are either specially curated for existing trojan benchmarks or may run into scalability issues for large designs. In this work, we leverage the power of greybox fuzzing combined with concolic execution to explore deeper segments of design and uncover stealthy trojans. Experimental results show that our proposed framework is able to automatically detect trojans faster with fewer test cases, while attaining notable branch coverage, without any manual pre-processing analysis. △ Less

Submitted 1 November, 2021; originally announced November 2021.

Comments: 23 pages, 4 figures, 6 tables, 4 listings

Showing 1–50 of 81 results for author: Debnath