Showing 1–50 of 401 results for author: Mishra, S

  1. arXiv:2407.09855  [pdf, other]

    cs.CL cs.AI

    Building pre-train LLM Dataset for the INDIC Languages: a case study on Hindi

    Authors: Shantipriya Parida, Shakshi Panwar, Kusum Lata, Sanskruti Mishra, Sambit Sekhar

    Abstract: Large language models (LLMs) demonstrated transformative capabilities in many applications that require automatically generating responses based on human instruction. However, the major challenge for building LLMs, particularly in Indic languages, is the availability of high-quality data for building foundation LLMs. In this paper, we are proposing a large pre-train dataset in Hindi useful for the…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted as a book chapter in the book Title "APPLIED SPEECH AND TEXT PROCESSING FOR LOW RESOURCE LANGUAGES"

  2. arXiv:2407.08223  [pdf, other]

    cs.CL cs.AI

    Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

    Authors: Zilong Wang, Zifeng Wang, Long Le, Huaixiu Steven Zheng, Swaroop Mishra, Vincent Perot, Yuwei Zhang, Anush Mattapalli, Ankur Taly, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

    Abstract: Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval outcomes through iterative LLM refinement or self-critique capabilities acquired through additional instruction tuning of LLMs. In this work, we introduce Specul…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Preprint

  3. arXiv:2407.05986  [pdf, other]

    cs.CV cs.LG

    KidSat: satellite imagery to map childhood poverty dataset and benchmark

    Authors: Makkunda Sharma, Fan Yang, Duy-Nhat Vo, Esra Suel, Swapnil Mishra, Samir Bhatt, Oliver Fiala, William Rudgard, Seth Flaxman

    Abstract: Satellite imagery has emerged as an important tool to analyse demographic, health, and development indicators. While various deep learning models have been built for these tasks, each is specific to a particular problem, with few standard benchmarks available. We propose a new dataset pairing satellite imagery and high-quality survey data on child poverty to benchmark satellite feature representat…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 15 pages, 1 figure

  4. arXiv:2407.05271  [pdf, other]

    cs.CL

    Beyond Binary Gender Labels: Revealing Gender Biases in LLMs through Gender-Neutral Name Predictions

    Authors: Zhiwen You, HaeJin Lee, Shubhanshu Mishra, Sullam Jeoung, Apratim Mishra, Jinseok Kim, Jana Diesner

    Abstract: Name-based gender prediction has traditionally categorized individuals as either female or male based on their names, using a binary classification system. That binary approach can be problematic in the cases of gender-neutral names that do not align with any one gender, among other reasons. Relying solely on binary gender categories without recognizing gender-neutral names can reduce the inclusiv…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted at ACL 2024, GeBNLP Workshop

  5. arXiv:2407.04173  [pdf, other]

    cs.LG cs.AI cs.CY stat.ML

    Quantifying Prediction Consistency Under Model Multiplicity in Tabular LLMs

    Authors: Faisal Hamman, Pasan Dissanayake, Saumitra Mishra, Freddy Lecue, Sanghamitra Dutta

    Abstract: Fine-tuning large language models (LLMs) on limited tabular data for classification tasks can lead to fine-tuning multiplicity, where equally well-performing models make conflicting predictions on the same inputs due to variations in the training process (i.e., seed, random weight initialization, retraining on additional or deleted samples). This raises critical concerns about the robustn…

    Submitted 4 July, 2024; originally announced July 2024.

  6. arXiv:2407.00900  [pdf, other]

    cs.AI cs.CL

    MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula

    Authors: Shubhra Mishra, Gabriel Poesia, Belinda Mo, Noah D. Goodman

    Abstract: Mathematical problem solving is an important skill for Large Language Models (LLMs), both as an important capability and a proxy for a range of reasoning abilities. Existing benchmarks probe a diverse set of skills, but they yield aggregate accuracy metrics, obscuring specific abilities or weaknesses. Furthermore, they are difficult to extend with new problems, risking data contamination over time…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Dataset and code: https://github.com/gpoesia/mathcamps/

  7. arXiv:2406.17610  [pdf, other]

    quant-ph cs.IT cs.SC

    YAQQ: Yet Another Quantum Quantizer -- Design Space Exploration of Quantum Gate Sets using Novelty Search

    Authors: Aritra Sarkar, Akash Kundu, Matthew Steinberg, Sibasish Mishra, Sebastiaan Fauquenot, Tamal Acharya, Jarosław A. Miszczak, Sebastian Feld

    Abstract: In the standard circuit model of quantum computation, the number and quality of the quantum gates composing the circuit influence the runtime and fidelity of the computation. The fidelity of the decomposition of quantum algorithms, represented as unitary matrices, to bounded depth quantum circuits depends strongly on the set of gates available for the decomposition routine. To investigate this dep…

    Submitted 25 June, 2024; originally announced June 2024.

  8. arXiv:2406.17066  [pdf, other]

    eess.SY cs.AI cs.LO cs.RO

    Tolerance of Reinforcement Learning Controllers against Deviations in Cyber Physical Systems

    Authors: Changjian Zhang, Parv Kapoor, Eunsuk Kang, Romulo Meira-Goes, David Garlan, Akila Ganlath, Shatadal Mishra, Nejib Ammar

    Abstract: Cyber-physical systems (CPS) with reinforcement learning (RL)-based controllers are increasingly being deployed in complex physical environments such as autonomous vehicles, the Internet-of-Things (IoT), and smart cities. An important property of a CPS is tolerance; i.e., its ability to function safely under possible disturbances and uncertainties in the actual operation. In this paper, we introduc…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.07462

  9. arXiv:2406.16273  [pdf, other]

    cs.CV

    YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals

    Authors: Sandeep Mishra, Oindrila Saha, Alan C. Bovik

    Abstract: 3D generation guided by text-to-image diffusion models enables the creation of visually compelling assets. However previous methods explore generation based on image or text. The boundaries of creativity are limited by what can be expressed through words or the images that can be sourced. We present YouDream, a method to generate high-quality anatomically controllable animals. YouDream is guided u…

    Submitted 23 June, 2024; originally announced June 2024.

  10. arXiv:2406.15444  [pdf, other]

    cs.CL

    Investigating the Robustness of LLMs on Math Word Problems

    Authors: Ujjwala Anantheswaran, Himanshu Gupta, Kevin Scaria, Shreyas Verma, Chitta Baral, Swaroop Mishra

    Abstract: Large Language Models (LLMs) excel at various tasks, including solving math word problems (MWPs), but struggle with real-world problems containing irrelevant information. To address this, we propose a prompting framework that generates adversarial variants of MWPs by adding irrelevant variables. We introduce a dataset, ProbleMATHIC, containing both adversarial and non-adversarial MWPs. Our experim…

    Submitted 30 May, 2024; originally announced June 2024.

  11. arXiv:2406.07742  [pdf, other]

    cs.CV

    C3DAG: Controlled 3D Animal Generation using 3D pose guidance

    Authors: Sandeep Mishra, Oindrila Saha, Alan C. Bovik

    Abstract: Recent advancements in text-to-3D generation have demonstrated the ability to generate high quality 3D assets. However while generating animals these methods underperform, often portraying inaccurate anatomy and geometry. Towards ameliorating this defect, we present C3DAG, a novel pose-Controlled text-to-3D Animal Generation framework which generates a high quality 3D animal consistent with a give…

    Submitted 11 June, 2024; originally announced June 2024.

  12. arXiv:2406.06555  [pdf, other]

    cs.LG cs.AI cs.CL cs.PL

    An Evaluation Benchmark for Autoformalization in Lean4

    Authors: Aryan Gulati, Devanshu Ladsaria, Shubhra Mishra, Jasdeep Sidhu, Brando Miranda

    Abstract: Large Language Models (LLMs) hold the potential to revolutionize autoformalization. The introduction of Lean4, a mathematical programming language, presents an unprecedented opportunity to rigorously assess the autoformalization capabilities of LLMs. This paper introduces a novel evaluation benchmark designed for Lean4, applying it to test the abilities of state-of-the-art LLMs, including GPT-3.5,…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: To appear at ICLR 2024 as part of the Tiny Papers track

  13. arXiv:2406.04520  [pdf, other]

    cs.CL cs.AI

    NATURAL PLAN: Benchmarking LLMs on Natural Language Planning

    Authors: Huaixiu Steven Zheng, Swaroop Mishra, Hugh Zhang, Xinyun Chen, Minmin Chen, Azade Nova, Le Hou, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou

    Abstract: We introduce NATURAL PLAN, a realistic planning benchmark in natural language containing 3 key tasks: Trip Planning, Meeting Planning, and Calendar Scheduling. We focus our evaluation on the planning capabilities of LLMs with full information on the task, by providing outputs from tools such as Google Flights, Google Maps, and Google Calendar as contexts to the models. This eliminates the need for…

    Submitted 6 June, 2024; originally announced June 2024.

  14. arXiv:2406.02625  [pdf, other]

    cs.LG cs.AI stat.ML

    Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions

    Authors: Sanjay Kariyappa, Freddy Lécué, Saumitra Mishra, Christopher Pond, Daniele Magazzeni, Manuela Veloso

    Abstract: This paper proposes Progressive Inference - a framework to compute input attributions to explain the predictions of decoder-only sequence classification models. Our work is based on the insight that the classification head of a decoder-only Transformer model can be used to make intermediate predictions by evaluating them at different points in the input sequence. Due to the causal attention mechan…

    Submitted 3 June, 2024; originally announced June 2024.

  15. arXiv:2406.01899  [pdf, other]

    cs.LG

    Cross-Domain Graph Data Scaling: A Showcase with Diffusion Models

    Authors: Wenzhuo Tang, Haitao Mao, Danial Dervovic, Ivan Brugere, Saumitra Mishra, Yuying Xie, Jiliang Tang

    Abstract: Models for natural language and images benefit from data scaling behavior: the more data fed into the model, the better they perform. This 'better with more' phenomenon enables the effectiveness of large-scale pre-training on vast amounts of data. However, current graph pre-training methods struggle to scale up data due to heterogeneity across graphs. To achieve effective data scaling, we aim to d…

    Submitted 3 June, 2024; originally announced June 2024.

  16. arXiv:2405.19101  [pdf, other]

    cs.LG

    Poseidon: Efficient Foundation Models for PDEs

    Authors: Maximilian Herde, Bogdan Raonić, Tobias Rohner, Roger Käppeli, Roberto Molinaro, Emmanuel de Bézenac, Siddhartha Mishra

    Abstract: We introduce Poseidon, a foundation model for learning the solution operators of PDEs. It is based on a multiscale operator transformer, with time-conditioned layer norms that enable continuous-in-time evaluations. A novel training strategy leveraging the semi-group property of time-dependent PDEs to allow for significant scaling-up of the training data is also proposed. Poseidon is pretrained on…

    Submitted 29 May, 2024; originally announced May 2024.

  17. arXiv:2405.18875  [pdf, other]

    cs.AI

    Counterfactual Metarules for Local and Global Recourse

    Authors: Tom Bewley, Salim I. Amoukou, Saumitra Mishra, Daniele Magazzeni, Manuela Veloso

    Abstract: We introduce T-CREx, a novel model-agnostic method for local and global counterfactual explanation (CE), which summarises recourse options for both individuals and groups in the form of human-readable rules. It leverages tree-based surrogate models to learn the counterfactual rules, alongside 'metarules' denoting their regions of optimality, providing both a global analysis of model behaviour and…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted at ICML 2024

  18. arXiv:2405.14558  [pdf, other]

    cs.LG

    FUSE: Fast Unified Simulation and Estimation for PDEs

    Authors: Levi E. Lingsch, Dana Grund, Siddhartha Mishra, Georgios Kissas

    Abstract: The joint prediction of continuous fields and statistical estimation of the underlying discrete parameters is a common problem for many physical systems, governed by PDEs. Hitherto, it has been separately addressed by employing operator learning surrogates for field prediction while using simulation-based inference (and its variants) for statistical parameter determination. Here, we argue that sol…

    Submitted 23 May, 2024; originally announced May 2024.

  19. arXiv:2405.10385  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning

    Authors: Mina Ghashami, Soumya Smruti Mishra

    Abstract: The SemEval 2024 BRAINTEASER task represents a pioneering venture in Natural Language Processing (NLP) by focusing on lateral thinking, a dimension of cognitive reasoning that is often overlooked in traditional linguistic analyses. This challenge comprises Sentence Puzzle and Word Puzzle subtasks and aims to test language models' capacity for divergent thinking. In this paper, we present our…

    Submitted 20 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: Accepted at SemEval 2024 (Colocated with NAACL 2024)

    Journal ref: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

  20. arXiv:2405.06842  [pdf, other]

    cs.CR cs.DC

    BitVMX: A CPU for Universal Computation on Bitcoin

    Authors: Sergio Demian Lerner, Ramon Amela, Shreemoy Mishra, Martin Jonas, Javier Álvarez Cid-Fuentes

    Abstract: BitVMX is a new design for a virtual CPU to optimistically execute arbitrary programs on Bitcoin based on a challenge response game introduced in BitVM. Similar to BitVM1 we create a general-purpose CPU to be verified in Bitcoin script. Our design supports common architectures, such as RISC-V or MIPS. Our main contribution to the state of the art is a design that uses hash chains of program traces…

    Submitted 10 May, 2024; originally announced May 2024.

  21. arXiv:2405.05757  [pdf, other]

    cs.ET eess.SY

    Design and Implementation of Energy-Efficient Wireless Tire Sensing System with Delay Analysis for Intelligent Vehicles

    Authors: Shashank Mishra, Jia-Ming Liang

    Abstract: The growing prevalence of Internet of Things (IoT) technologies has led to a rise in the popularity of intelligent vehicles that incorporate a range of sensors to monitor various aspects, such as driving speed, fuel usage, distance proximity and tire anomalies. Nowadays, real-time tire sensing systems play important roles for intelligent vehicles in increasing mileage, reducing fuel consumption, i…

    Submitted 27 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  22. arXiv:2405.05354  [pdf, other]

    cs.CV

    Transfer-LMR: Heavy-Tail Driving Behavior Recognition in Diverse Traffic Scenarios

    Authors: Chirag Parikh, Ravi Shankar Mishra, Rohan Chandra, Ravi Kiran Sarvadevabhatla

    Abstract: Recognizing driving behaviors is important for downstream tasks such as reasoning, planning, and navigation. Existing video recognition approaches work well for common behaviors (e.g. "drive straight", "brake", "turn left/right"). However, the performance is sub-par for underrepresented/rare behaviors typically found in tail of the behavior class distribution. To address this shortcoming, we propo…

    Submitted 8 May, 2024; originally announced May 2024.

  23. arXiv:2405.01183  [pdf, other]

    cs.LO cs.FL math.LO

    An efficient quantifier elimination procedure for Presburger arithmetic

    Authors: Christoph Haase, Shankara Narayanan Krishna, Khushraj Madnani, Om Swostik Mishra, Georg Zetzsche

    Abstract: All known quantifier elimination procedures for Presburger arithmetic require doubly exponential time for eliminating a single block of existentially quantified variables. It has even been claimed in the literature that this upper bound is tight. We observe that this claim is incorrect and develop, as the main result of this paper, a quantifier elimination procedure eliminating a block of existent…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted for publication at ICALP 2024

  24. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  25. arXiv:2404.10157  [pdf, other]

    cs.CV cs.LG

    Salient Object-Aware Background Generation using Text-Guided Diffusion Models

    Authors: Amir Erfan Eshratifar, Joao V. B. Soares, Kapil Thadani, Shaunak Mishra, Mikhail Kuznetsov, Yueh-Ning Ku, Paloma de Juan

    Abstract: Generating background scenes for salient objects plays a crucial role across various domains including creative design and e-commerce, as it enhances the presentation and context of subjects by integrating them into tailored environments. Background generation can be framed as a task of text-conditioned outpainting, where the goal is to extend image content beyond a salient object's boundaries on…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted for publication at CVPR 2024's Generative Models for Computer Vision workshop

  26. Advances in Differential Privacy and Differentially Private Machine Learning

    Authors: Saswat Das, Subhankar Mishra

    Abstract: There has been an explosion of research on differential privacy (DP) and its various applications in recent years, ranging from novel variants and accounting techniques in differential privacy to the thriving field of differentially private machine learning (DPML) to newer implementations in practice, like those by various companies and organisations such as census bureaus. Most recent surveys foc…

    Submitted 6 April, 2024; originally announced April 2024.

    Journal ref: Information Technology Security, 2024, pp 147 to 188, Springer Tracts in Electrical and Electronics Engineering, Springer, Singapore

  27. arXiv:2403.15567  [pdf, other]

    cs.LG cs.CV

    Do not trust what you trust: Miscalibration in Semi-supervised Learning

    Authors: Shambhavi Mishra, Balamurali Murugesan, Ismail Ben Ayed, Marco Pedersoli, Jose Dolz

    Abstract: State-of-the-art semi-supervised learning (SSL) approaches rely on highly confident predictions to serve as pseudo-labels that guide the training on unlabeled samples. An inherent drawback of this strategy stems from the quality of the uncertainty estimates, as pseudo-labels are filtered only based on their degree of uncertainty, regardless of the correctness of their predictions. Thus, assessing…

    Submitted 22 March, 2024; originally announced March 2024.

  28. arXiv:2403.14645  [pdf]

    cs.CY cs.AI

    Designing Multi-Step Action Models for Enterprise AI Adoption

    Authors: Shreyash Mishra, Shrey Shah, Rex Pereira

    Abstract: This paper introduces the Multi-Step Action Model (MSAM), a closed-source AI model designed by Empsing to address challenges hindering AI adoption in enterprises. Through a holistic examination, this paper explores MSAM's foundational principles, design architecture, and future trajectory. It evaluates MSAM's performance via rigorous testing methodologies and envisions its potential impact on adva…

    Submitted 21 February, 2024; originally announced March 2024.

    Comments: 8 pages, 5 figures

    Report number: EMP-202401 MSC Class: 68T42 ACM Class: I.2.1; I.2.8

  29. arXiv:2403.14038  [pdf, other]

    cs.SI cs.HC

    PureConnect: A Localized Social Media System to Increase Awareness and Connectedness in Environmental Justice Communities

    Authors: Omar Hammad, Md Rezwanur Rahman, Gopala Krishna Vasanth Kanugo, Nicholas Clements, Shelly Miller, Shivakant Mishra, Esther Sullivan

    Abstract: Frequent disruptions like highway constructions are common nowadays, often impacting environmental justice communities (communities with low socio-economic status with disproportionately high and adverse human health and environmental effects) that live nearby. Based on our interactions via focus groups with the members of four environmental justice communities impacted by a major highway constr…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Submitted in COMPSAC 2024

  30. arXiv:2403.12223  [pdf, ps, other]

    cs.RO cs.HC

    HRI in Indian Education: Challenges & Opportunities

    Authors: Chinmaya Mishra, Anuj Nandanwar, Sashikala Mishra

    Abstract: With the recent advancements in the field of robotics and the increased focus on having general-purpose robots widely available to the general public, it has become increasingly necessary to pursue research into Human-robot interaction (HRI). While there have been a lot of works discussing frameworks for teaching HRI in educational institutions with a few institutions already offering courses to s…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Presented at the Designing an Intro to HRI Course Workshop at HRI 2024 (arXiv:2403.05588)

    Report number: HRI101/2024/9

  31. A GNN Approach for Cell-Free Massive MIMO

    Authors: Lou Salaun, Hong Yang, Shashwat Mishra, Chung Shue Chen

    Abstract: Beyond 5G wireless technology Cell-Free Massive MIMO (CFmMIMO) downlink relies on carefully designed precoders and power control to attain uniformly high rate coverage. Many such power control problems can be calculated via second order cone programming (SOCP). In practice, several order of magnitude faster numerical procedure is required because power control has to be rapidly updated to adapt to…

    Submitted 8 February, 2024; originally announced March 2024.

    Journal ref: GLOBECOM 2022 - 2022 IEEE Global Communications Conference, Dec 2022, Rio de Janeiro, Brazil. pp.3053-3058

  32. arXiv:2403.07384  [pdf, other]

    cs.CL cs.AI cs.LG

    SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models

    Authors: Yu Yang, Siddhartha Mishra, Jeffrey N Chiang, Baharan Mirzasoleiman

    Abstract: Despite the effectiveness of data selection for large language models (LLMs) during pretraining and instruction fine-tuning phases, improving data efficiency in supervised fine-tuning (SFT) for specialized domains poses significant challenges due to the complexity of fine-tuning data. To bridge this gap, we introduce an effective and scalable data selection method for SFT, SmallToLarge (S2L), whic…

    Submitted 12 March, 2024; originally announced March 2024.

  33. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  34. arXiv:2403.05119  [pdf, other]

    cond-mat.mtrl-sci cs.LG

    Estimation of Electronic Band Gap Energy From Material Properties Using Machine Learning

    Authors: Sagar Prakash Barad, Sajag Kumar, Subhankar Mishra

    Abstract: Machine learning techniques are utilized to estimate the electronic band gap energy and forecast the band gap category of materials based on experimentally quantifiable properties. The determination of band gap energy is critical for discerning various material properties, such as its metallic nature, and potential applications in electronic and optoelectronic devices. While numerical methods exis…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 6 pages, IC-CGU 2024

  35. arXiv:2403.00826  [pdf, other]

    cs.CL cs.CR cs.LG

    LLMGuard: Guarding Against Unsafe LLM Behavior

    Authors: Shubh Goyal, Medha Hira, Shubham Mishra, Sukriti Goyal, Arnav Goel, Niharika Dadu, Kirushikesh DB, Sameep Mehta, Nishtha Madaan

    Abstract: Although the rise of Large Language Models (LLMs) in enterprise settings brings new opportunities and capabilities, it also brings challenges, such as the risk of generating inappropriate, biased, or misleading content that violates regulations and can have legal concerns. To alleviate this, we present "LLMGuard", a tool that monitors user interactions with an LLM application and flags content aga…

    Submitted 27 February, 2024; originally announced March 2024.

    Comments: accepted in demonstration track of AAAI-24

  36. arXiv:2402.17890  [pdf, other]

    cs.LG math.OC

    From Inverse Optimization to Feasibility to ERM

    Authors: Saurabh Mishra, Anant Raj, Sharan Vaswani

    Abstract: Inverse optimization involves inferring unknown parameters of an optimization problem from known solutions and is widely used in fields such as transportation, power systems, and healthcare. We study the contextual inverse optimization setting that utilizes additional contextual information to better predict the unknown problem parameters. We focus on contextual inverse linear programming (CILP),…

    Submitted 4 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  37. arXiv:2402.13623  [pdf, other]

    cs.CL cs.SI

    FLAME: Self-Supervised Low-Resource Taxonomy Expansion using Large Language Models

    Authors: Sahil Mishra, Ujjwal Sudev, Tanmoy Chakraborty

    Abstract: Taxonomies represent an arborescence hierarchical structure that establishes relationships among entities to convey knowledge within a specific domain. Each edge in the taxonomy signifies a hypernym-hyponym relationship. Taxonomies find utility in various real-world applications, such as e-commerce search engines and recommendation systems. Consequently, there arises a necessity to enhance these t…

    Submitted 21 February, 2024; originally announced February 2024.

  38. arXiv:2402.11180  [pdf, other]

    cs.SI cs.HC

    PureNav: A Personalized Navigation Service for Environmental Justice Communities Impacted by Planned Disruptions

    Authors: Omar Hammad, Md Rezwanur Rahman, Nicholas Clements, Shivakant Mishra, Shelly Miller, Esther Sullivan

    Abstract: Planned disruptions such as highway constructions are commonplace nowadays and the communities living near these disruptions generally tend to be environmental justice communities -- low socioeconomic status with disproportionately high and adverse human health and environmental effects. A major concern is that such activities negatively impact people's well-being by disrupting their daily commute…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted for publication in the proceedings of the 2023 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)

  39. arXiv:2402.10926  [pdf, other]

    math.NA cs.LG

    Numerical analysis of physics-informed neural networks and related models in physics-informed machine learning

    Authors: Tim De Ryck, Siddhartha Mishra

    Abstract: Physics-informed neural networks (PINNs) and their variants have been very popular in recent years as algorithms for the numerical simulation of both forward and inverse problems for partial differential equations. This article aims to provide a comprehensive review of currently available results on the numerical analysis of PINNs and related models that constitute the backbone of physics-informed…

    Submitted 30 January, 2024; originally announced February 2024.

    MSC Class: 65M15

  40. arXiv:2402.10460  [pdf, other]

    cs.DB

    A survey of LSM-Tree based Indexes, Data Systems and KV-stores

    Authors: Supriya Mishra

    Abstract: Modern databases typically make use of the Log Structured Merge-Tree for organizing data in indexes, which is a kind of disk-based data structure. It was proposed to efficiently handle frequent update queries (also called update intensive workloads) databases. In recent years, LSM-Tree has gained popularity and has been adopted by a number of NoSql databases, and key-value stores. Since LSM-Tree…

    Submitted 27 February, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  41. arXiv:2402.05403  [pdf, other]

    cs.CL cs.AI

    In-Context Principle Learning from Mistakes

    Authors: Tianjun Zhang, Aman Madaan, Luyu Gao, Steven Zheng, Swaroop Mishra, Yiming Yang, Niket Tandon, Uri Alon

    Abstract: In-context learning (ICL, also known as few-shot prompting) has been the standard method of adapting LLMs to downstream tasks, by learning from a few input-output examples. Nonetheless, all ICL-based approaches only learn from correct input-output pairs. In this paper, we revisit this paradigm, by learning more from the few given input-output examples. We introduce Learning Principles (LEAP): Firs…

    Submitted 9 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  42. arXiv:2402.04596  [pdf, other]

    cs.LG cs.AI cs.CV

    Towards Improved Imbalance Robustness in Continual Multi-Label Learning with Dual Output Spiking Architecture (DOSA)

    Authors: Sourav Mishra, Shirin Dora, Suresh Sundaram

    Abstract: Algorithms designed for addressing typical supervised classification problems can only learn from a fixed set of samples and labels, making them unsuitable for the real world, where data arrives as a stream of samples often associated with multiple labels over time. This motivates the study of task-agnostic continual multi-label learning problems. While algorithms using deep learning approaches fo…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 8 pages, 4 figures, 4 tables, 45 references. Submitted to IJCNN 2024

  43. arXiv:2402.03620  [pdf, other]

    cs.AI cs.CL

    Self-Discover: Large Language Models Self-Compose Reasoning Structures

    Authors: Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, Huaixiu Steven Zheng

    Abstract: We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasonin…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 17 pages, 11 figures, 5 tables

  44. arXiv:2402.02649  [pdf, other]

    cs.CV

    Densely Decoded Networks with Adaptive Deep Supervision for Medical Image Segmentation

    Authors: Suraj Mishra, Danny Z. Chen

    Abstract: Medical image segmentation using deep neural networks has been highly successful. However, the effectiveness of these networks is often limited by inadequate dense prediction and inability to extract robust features. To achieve refined dense prediction, we propose densely decoded networks (ddn), by selectively introducing 'crutch' network connections. Such 'crutch' connections in each upsampling s…

    Submitted 4 March, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  45. arXiv:2402.00355  [pdf, other]

    cs.LG cs.AI math.OC

    Adaptive Primal-Dual Method for Safe Reinforcement Learning

    Authors: Weiqin Chen, James Onyejizu, Long Vu, Lan Hoang, Dharmashankar Subramanian, Koushik Kar, Sandipan Mishra, Santiago Paternain

    Abstract: Primal-dual methods have a natural application in Safe Reinforcement Learning (SRL), posed as a constrained policy optimization problem. In practice, however, applying primal-dual methods to SRL is challenging, due to the inter-dependency of the learning rate (LR) and Lagrangian multipliers (dual variables) each time an embedded unconstrained RL problem is solved. In this paper, we propose, analyze…

    Submitted 1 February, 2024; originally announced February 2024.

  46. arXiv:2401.13722  [pdf, other]

    cs.HC cs.AI

    Proactive Emotion Tracker: AI-Driven Continuous Mood and Emotion Monitoring

    Authors: Mohammad Asif, Sudhakar Mishra, Ankush Sonker, Sanidhya Gupta, Somesh Kumar Maurya, Uma Shanker Tiwary

    Abstract: This research project aims to tackle the growing mental health challenges in today's digital age. It employs a modified pre-trained BERT model to detect depressive text within social media and users' web browsing data, achieving an impressive 93% test accuracy. Simultaneously, the project aims to incorporate physiological signals from wearable devices, such as smartwatches and EEG sensors, to prov…

    Submitted 24 January, 2024; originally announced January 2024.

  47. arXiv:2401.07892  [pdf, other]

    cs.HC

    Deep Fuzzy Framework for Emotion Recognition using EEG Signals and Emotion Representation in Type-2 Fuzzy VAD Space

    Authors: Mohammad Asif, Noman Ali, Sudhakar Mishra, Anushka Dandawate, Uma Shanker Tiwary

    Abstract: Recently, the representation of emotions in the Valence, Arousal and Dominance (VAD) space has drawn considerable attention. However, the complex nature of emotions and the subjective biases in self-reported values of VAD make the emotion model too specific to a particular experiment. This study aims to develop a generic model representing emotions using a fuzzy VAD space and improve emotion recognition…

    Submitted 15 January, 2024; originally announced January 2024.

  48. arXiv:2401.02794  [pdf, other]

    eess.IV cs.CV

    Subjective and Objective Analysis of Indian Social Media Video Quality

    Authors: Sandeep Mishra, Mukul Jha, Alan C. Bovik

    Abstract: We conducted a large-scale subjective study of the perceptual quality of User-Generated Mobile Video Content on a set of mobile-originated videos obtained from the Indian social media platform ShareChat. The content viewed by volunteer human subjects under controlled laboratory conditions has the benefit of culturally diversifying the existing corpus of User-Generated Content (UGC) video quality d…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: Submitted to the IEEE Transactions on Image Processing

  49. arXiv:2401.00420  [pdf, other]

    cs.CV cs.AI

    SynCDR : Training Cross Domain Retrieval Models with Synthetic Data

    Authors: Samarth Mishra, Carlos D. Castillo, Hongcheng Wang, Kate Saenko, Venkatesh Saligrama

    Abstract: In cross-domain retrieval, a model is required to identify images from the same semantic category across two visual domains. For instance, given a sketch of an object, a model needs to retrieve a real image of it from an online store's catalog. A standard approach for such a problem is learning a feature space of images where Euclidean distances reflect similarity. Even without human annotations,…

    Submitted 19 March, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: Pre-print

  50. arXiv:2312.15063  [pdf, other]

    cs.LG cond-mat.dis-nn

    A universal approximation theorem for nonlinear resistive networks

    Authors: Benjamin Scellier, Siddhartha Mishra

    Abstract: Resistor networks have recently attracted a surge of interest as substrates for energy-efficient self-learning machines. This work studies the computational capabilities of these resistor networks. We show that electrical networks composed of voltage sources, linear resistors, diodes and voltage-controlled voltage sources (VCVS) can implement any continuous function. To prove it, we assume that the cir…

    Submitted 22 December, 2023; originally announced December 2023.