Showing 1–50 of 80 results for author: Rastogi, A

  1. arXiv:2406.06592  [pdf, other]

    cs.CL cs.LG

    Improve Mathematical Reasoning in Language Models by Automated Process Supervision

    Authors: Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi

    Abstract: Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a standard inference-time technique aimed at enhancing the reasoning performance of LLMs. However, this still proves insufficient for reasoning tasks with a leng…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 18 pages, 5 figures, 1 table

  2. arXiv:2405.18368  [pdf, other]

    cs.CV

    The 2024 Brain Tumor Segmentation (BraTS) Challenge: Glioma Segmentation on Post-treatment MRI

    Authors: Maria Correia de Verdier, Rachit Saluja, Louis Gagnon, Dominic LaBella, Ujjwall Baid, Nourel Hoda Tahon, Martha Foltyn-Dumitru, Jikai Zhang, Maram Alafif, Saif Baig, Ken Chang, Gennaro D'Anna, Lisa Deptula, Diviya Gupta, Muhammad Ammar Haider, Ali Hussain, Michael Iv, Marinos Kontzialis, Paul Manning, Farzan Moodi, Teresa Nunes, Aaron Simon, Nico Sollmann, David Vu, Maruf Adewole, et al. (60 additional authors not shown)

    Abstract: Gliomas are the most common malignant primary brain tumors in adults and one of the deadliest types of cancer. There are many challenges in treatment and monitoring due to the genetic diversity and high intrinsic heterogeneity in appearance, shape, histology, and treatment response. Treatments include surgery, radiation, and systemic therapies, with magnetic resonance imaging (MRI) playing a key r…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures, 1 table

  3. arXiv:2405.16981  [pdf, other]

    cs.SE

    Characterising Developer Sentiment in Software Components: An Exploratory Study of Gentoo

    Authors: Tien Rahayu Tulili, Ayushi Rastogi, Andrea Capiluppi

    Abstract: Collaborative software development happens in teams that cooperate on shared artefacts and discuss development on online platforms. Due to the complexity of development and the variety of teams, software components often act as effective containers for parallel work and teams. Past research has shown how communication between team members, especially in an open-source environment, can become e…

    Submitted 27 May, 2024; originally announced May 2024.

  4. arXiv:2404.01096  [pdf, other]

    cs.SE cs.PL

    Enabling Memory Safety of C Programs using LLMs

    Authors: Nausheen Mohammed, Akash Lal, Aseem Rastogi, Subhajit Roy, Rahul Sharma

    Abstract: Memory safety violations in low-level code, written in languages like C, continue to be one of the major sources of software vulnerabilities. One method of removing such violations by construction is to port C code to a safe C dialect. Such dialects rely on programmer-supplied annotations to guarantee safety with minimal runtime overhead. This porting, however, is a manual process that impose…

    Submitted 1 April, 2024; originally announced April 2024.

  5. arXiv:2403.20120  [pdf, ps, other]

    cs.CR

    Privacy-Preserving Data Aggregation Techniques for Enhanced Efficiency and Security in Wireless Sensor Networks: A Comprehensive Analysis and Evaluation

    Authors: Ayush Rastogi, Harsh Rastogi, Yash Rastogi, Divyansh Dubey

    Abstract: In this paper, we present a multidimensional, highly effective method for aggregating data for wireless sensor networks while maintaining privacy. The suggested system is resistant to data loss and secure against both active and passive privacy compromising attacks, such as the coalition attack from a rogue base station and kidnapped sensor nodes. With regard to cluster size, it achieves consisten…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 4 pages

  6. arXiv:2403.10704  [pdf, other]

    cs.LG cs.AI cs.CL

    PERL: Parameter Efficient Reinforcement Learning from Human Feedback

    Authors: Hakim Sidahmed, Samrat Phatale, Alex Hutcheson, Zhuonan Lin, Zhang Chen, Zac Yu, Jarvis Jin, Roman Komarytsia, Christiane Ahlheim, Yonghao Zhu, Simral Chaudhary, Bowen Li, Saravanan Ganesh, Bill Byrne, Jessica Hoffmann, Hassan Mansoor, Wei Li, Abhinav Rastogi, Lucas Dixon

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has proven to be a strong method to align Pretrained Large Language Models (LLMs) with human preferences. But training models with RLHF is computationally expensive, and an overall complex process. In this work, we study RLHF where the underlying models are trained using the parameter efficient method of Low-Rank Adaptation (LoRA) introduced by Hu…

    Submitted 15 March, 2024; originally announced March 2024.

  7. arXiv:2402.19038  [pdf, other]

    cs.SE

    Understanding Fairness in Software Engineering: Insights from Stack Exchange

    Authors: Emeralda Sesari, Federica Sarro, Ayushi Rastogi

    Abstract: Software practitioners discuss problems at work with peers, in-person and online. These discussions can be technical (e.g., how to fix a bug?) and social (e.g., how to assign work fairly?). While there is a growing body of knowledge exploring fairness problems and solutions in the human and social factors of software engineering, most focus has been on specific problems. This study provides fairne…

    Submitted 21 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: To be published in 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) 2024

  8. The Devil Is in the Command Line: Associating the Compiler Flags With the Binary and Build Metadata

    Authors: Gunnar Kudrjavets, Aditya Kumar, Jeff Thomas, Ayushi Rastogi

    Abstract: Engineers build large software systems for multiple architectures, operating systems, and configurations. A set of inconsistent or missing compiler flags generates code that catastrophically impacts the system's behavior. In the authors' industry experience, defects caused by an undesired combination of compiler flags are common in nontrivial software projects. We are unaware of any build and CI/C…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 3 pages. To be published in the 46th International Conference on Software Engineering (ICSE 2024), April 14 - April 20 2024, Lisbon, Portugal

  9. What Do You Mean by Memory? When Engineers Are Lost in the Maze of Complexity

    Authors: Gunnar Kudrjavets, Aditya Kumar, Jeff Thomas, Ayushi Rastogi

    Abstract: An accepted practice to decrease applications' memory usage is to reduce the amount and frequency of memory allocations. Factors such as (a) the prevalence of out-of-memory (OOM) killers, (b) memory allocations in modern programming languages done implicitly, (c) overcommitting being a default strategy in the Linux kernel, and (d) the rise in complexity and terminology related to memory management…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 3 pages. To be published in the 46th International Conference on Software Engineering (ICSE 2024), April 14 - April 20 2024, Lisbon, Portugal

  10. arXiv:2311.07948  [pdf, other]

    cs.PL cs.LG

    Finding Inductive Loop Invariants using Large Language Models

    Authors: Adharsh Kamath, Aditya Senthilnathan, Saikat Chakraborty, Pantazis Deligiannis, Shuvendu K. Lahiri, Akash Lal, Aseem Rastogi, Subhajit Roy, Rahul Sharma

    Abstract: Loop invariants are fundamental to reasoning about programs with loops. They establish properties about a given loop's behavior. When they additionally are inductive, they become useful for the task of formal verification that seeks to establish strong mathematical guarantees about a program's runtime behavior. The inductiveness ensures that the invariants can be checked locally without consulting t…

    Submitted 14 November, 2023; originally announced November 2023.

  11. Does Code Review Speed Matter for Practitioners?

    Authors: Gunnar Kudrjavets, Ayushi Rastogi

    Abstract: Increasing code velocity is a common goal for a variety of software projects. The efficiency of the code review process significantly impacts how fast the code gets merged into the final product and reaches the customers. We conducted a survey to study the code velocity-related beliefs and practices in place. We analyzed 75 completed surveys from 39 participants from the industry and 36 from the o…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: 29 pages, 7 figures. To be published in Empirical Software Engineering An International Journal

  12. arXiv:2310.09342  [pdf, other]

    cs.PL cs.AI cs.CL cs.SE

    Ranking LLM-Generated Loop Invariants for Program Verification

    Authors: Saikat Chakraborty, Shuvendu K. Lahiri, Sarah Fakhoury, Madanlal Musuvathi, Akash Lal, Aseem Rastogi, Aditya Senthilnathan, Rahul Sharma, Nikhil Swamy

    Abstract: Synthesizing inductive loop invariants is fundamental to automating program verification. In this work, we observe that Large Language Models (such as gpt-3.5 or gpt-4) are capable of synthesizing loop invariants for a class of programs in a 0-shot setting, yet require several samples to generate the correct invariants. This can lead to a large number of calls to a program verifier to establish an…

    Submitted 12 February, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Findings of The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP-findings 2023)

  13. arXiv:2309.00267  [pdf, other]

    cs.CL cs.AI cs.LG

    RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

    Authors: Harrison Lee, Samrat Phatale, Hassan Mansoor, Thomas Mesnard, Johan Ferret, Kellie Lu, Colton Bishop, Ethan Hall, Victor Carbune, Abhinav Rastogi, Sushant Prakash

    Abstract: Reinforcement learning from human feedback (RLHF) has proven effective in aligning large language models (LLMs) with human preferences. However, gathering high-quality human preference labels can be a time-consuming and expensive endeavor. RL from AI Feedback (RLAIF), introduced by Bai et al., offers a promising alternative that leverages a powerful off-the-shelf LLM to generate preferences in lie…

    Submitted 30 November, 2023; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: Added two more tasks and many more experiments and analyses (e.g. same-size RLAIF, direct RLAIF, cost analysis)

  14. arXiv:2308.05177  [pdf, other]

    cs.SE cs.PL

    Fixing Rust Compilation Errors using LLMs

    Authors: Pantazis Deligiannis, Akash Lal, Nikita Mehrotra, Aseem Rastogi

    Abstract: The Rust programming language, with its safety guarantees, has established itself as a viable choice for low-level systems programming over the traditional, unsafe alternatives like C/C++. These guarantees come from a strong ownership-based type system, as well as primitive support for features like closures, pattern matching, etc., that make the code more concise and amenable to reasonin…

    Submitted 9 August, 2023; originally announced August 2023.

  15. arXiv:2305.13725  [pdf, other]

    cs.CL cs.IR

    Conversational Recommendation as Retrieval: A Simple, Strong Baseline

    Authors: Raghav Gupta, Renat Aksitov, Samrat Phatale, Simral Chaudhary, Harrison Lee, Abhinav Rastogi

    Abstract: Conversational recommendation systems (CRS) aim to recommend suitable items to users through natural language conversation. However, most CRS approaches do not effectively utilize the signal provided by these conversations. They rely heavily on explicit external knowledge (e.g., knowledge graphs) to augment the models' understanding of the items and attributes, which is quite hard to scale. To allev…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: To appear at the 5th NLP4ConvAI workshop

  16. Are We Speeding Up or Slowing Down? On Temporal Aspects of Code Velocity

    Authors: Gunnar Kudrjavets, Nachiappan Nagappan, Ayushi Rastogi

    Abstract: This paper investigates how the duration of various code review periods changes over a project's lifetime. We study four open-source software (OSS) projects: Blender, FreeBSD, LLVM, and Mozilla. We mine and analyze the characteristics of 283,235 code reviews that cover, on average, seven years' worth of development. Our main conclusion is that neither the passage of time nor the project's size impa…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: 5 pages. To be published in Proceedings of MSR '23: Proceedings of the 20th International Conference on Mining Software Repositories (MSR 2023). May 15-16, 2023, Melbourne, Australia

  17. arXiv:2303.01954  [pdf, other]

    stat.ML cs.AI cs.LG

    Synthetic Data Generator for Adaptive Interventions in Global Health

    Authors: Aditya Rastogi, Juan Francisco Garamendi, Ana Fernández del Río, Anna Guitart, Moiz Hassan Khan, Dexian Tang, África Periáñez

    Abstract: Artificial Intelligence and digital health have the potential to transform global health. However, having access to representative data to test and validate algorithms in realistic production environments is essential. We introduce HealthSyn, an open-source synthetic data generator of user behavior for testing reinforcement learning algorithms in the context of mobile health interventions. The gen…

    Submitted 27 April, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

  18. Who Ate My Memory? Towards Attribution in Memory Management

    Authors: Gunnar Kudrjavets, Ayushi Rastogi, Jeff Thomas, Nachiappan Nagappan

    Abstract: To understand applications' memory usage details, engineers use instrumented builds and profiling tools. Both approaches are impractical for use in production environments or deployed mobile applications. As a result, developers can gather only high-level memory-related statistics for deployed software. In our experience, the lack of granular field data makes fixing performance and reliability-rel…

    Submitted 22 December, 2022; originally announced December 2022.

    Comments: 3 pages. To be published in the 45th International Conference on Software Engineering (ICSE 2023), May 14 - May 20 2023, Melbourne, Australia

  19. arXiv:2212.09939  [pdf, other]

    cs.CL

    AnyTOD: A Programmable Task-Oriented Dialog System

    Authors: Jeffrey Zhao, Yuan Cao, Raghav Gupta, Harrison Lee, Abhinav Rastogi, Mingqiu Wang, Hagen Soltau, Izhak Shafran, Yonghui Wu

    Abstract: We propose AnyTOD, an end-to-end, zero-shot task-oriented dialog (TOD) system capable of handling unseen tasks without task-specific training. We view TOD as a program executed by a language model (LM), where program logic and ontology are provided by a designer as a schema. To enable generalization to unseen schemas and programs without prior training, AnyTOD adopts a neuro-symbolic approach. A ne…

    Submitted 13 February, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: v2, update with Multiwoz, SGD results

  20. arXiv:2212.08704  [pdf, other]

    cs.AI

    Speech Aware Dialog System Technology Challenge (DSTC11)

    Authors: Hagen Soltau, Izhak Shafran, Mingqiu Wang, Abhinav Rastogi, Jeffrey Zhao, Ye Jia, Wei Han, Yuan Cao, Aramys Miranda

    Abstract: Most research on task oriented dialog modeling is based on written text input. However, users interact with practical dialog systems often using speech as input. Typically, systems convert speech into text using an Automatic Speech Recognition (ASR) system, introducing errors. Furthermore, these systems do not address the differences in written and spoken language. The research on this topic is st…

    Submitted 16 December, 2022; originally announced December 2022.

  21. arXiv:2208.13289  [pdf, other]

    math.ST cs.LG stat.ML

    Statistical Inverse Problems in Hilbert Scales

    Authors: Abhishake Rastogi

    Abstract: In this paper, we study the Tikhonov regularization scheme in Hilbert scales for the nonlinear statistical inverse problem with a general noise. The regularizing norm in this scheme is stronger than the norm in Hilbert space. We focus on developing a theoretical analysis for this scheme based on the conditional stability estimates. We utilize the concept of the distance function to establish the h…

    Submitted 28 August, 2022; originally announced August 2022.

    Journal ref: Journal of Complexity 82 (2024) 101824

  22. arXiv:2208.09628  [pdf, other]

    cs.LG cs.AI cs.CY

    Are You Comfortable Now: Deep Learning the Temporal Variation in Thermal Comfort in Winters

    Authors: Betty Lala, Srikant Manas Kala, Anmol Rastogi, Kunal Dahiya, Aya Hagishima

    Abstract: Indoor thermal comfort in smart buildings has a significant impact on the health and performance of occupants. Consequently, machine learning (ML) is increasingly used to solve challenges related to indoor thermal comfort. Temporal variability of thermal comfort perception is an important problem that regulates occupant well-being and energy consumption. However, in most ML-based thermal comfort s…

    Submitted 20 August, 2022; originally announced August 2022.

    Comments: Accepted for publication in IEEE SMC 2022

  23. When malloc() Never Returns NULL -- Reliability as an Illusion

    Authors: Gunnar Kudrjavets, Jeff Thomas, Aditya Kumar, Nachiappan Nagappan, Ayushi Rastogi

    Abstract: For decades, the guidance given to software engineers has been to check the memory allocation results. This validation step is necessary to avoid crashes. However, in user mode, in modern operating systems (OS), such as Android, FreeBSD, iOS, and macOS, the caller does not have an opportunity to handle the memory allocation failures. This behavioral trait results from the actions of a system compo…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: 6 pages. To be published in the 33rd IEEE International Symposium on Software Reliability Engineering (ISSRE 2022), Oct 31 - Nov 3 2022, Charlotte, North Carolina, USA

  24. arXiv:2206.14202  [pdf, other]

    cs.LG

    Building Matters: Spatial Variability in Machine Learning Based Thermal Comfort Prediction in Winters

    Authors: Betty Lala, Srikant Manas Kala, Anmol Rastogi, Kunal Dahiya, Hirozumi Yamaguchi, Aya Hagishima

    Abstract: Thermal comfort in indoor environments has an enormous impact on the health, well-being, and performance of occupants. Given the focus on energy efficiency and Internet-of-Things enabled smart buildings, machine learning (ML) is being increasingly used for data-driven thermal comfort (TC) prediction. Generally, ML-based solutions are proposed for air-conditioned or HVAC ventilated buildings and th…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: Accepted in SmartSys SMARTCOMP 2022

  25. There Ain't No Such Thing as a Free Custom Memory Allocator

    Authors: Gunnar Kudrjavets, Jeff Thomas, Aditya Kumar, Nachiappan Nagappan, Ayushi Rastogi

    Abstract: Using custom memory allocators is an efficient performance optimization technique. However, dependency on a custom allocator can introduce several maintenance-related issues. We present lessons learned from the industry and provide critical guidance for using custom memory allocators and enumerate various challenges associated with integrating them. These recommendations are based on years of expe…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: 4 pages. To be published in 38th IEEE International Conference on Software Maintenance and Evolution (ICSME 2022), Oct 3-7, 2022, Limassol, Cyprus

  26. Is Kernel Code Different From Non-Kernel Code? A Case Study of BSD Family Operating Systems

    Authors: Gunnar Kudrjavets, Jeff Thomas, Nachiappan Nagappan, Ayushi Rastogi

    Abstract: Code churn and code velocity describe the evolution of a code base. Current research quantifies and studies code churn and velocity at a high level of abstraction, often at the overall project level or even at the level of an entire company. We argue that such an approach ignores noticeable differences among the subsystems of large projects. We conducted an exploratory study on four BSD family ope…

    Submitted 11 June, 2022; originally announced June 2022.

    Comments: 13 pages. To be published in 38th IEEE International Conference on Software Maintenance and Evolution (ICSME 2022), Oct 3-7, 2022, Limassol, Cyprus

  27. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  28. Show, Don't Tell: Demonstrations Outperform Descriptions for Schema-Guided Task-Oriented Dialogue

    Authors: Raghav Gupta, Harrison Lee, Jeffrey Zhao, Abhinav Rastogi, Yuan Cao, Yonghui Wu

    Abstract: Building universal dialogue systems that operate across multiple domains/APIs and generalize to new ones with minimal overhead is a critical challenge. Recent works have leveraged natural language descriptions of schema elements to enable such systems; however, descriptions only indirectly convey schema semantics. In this work, we propose Show, Don't Tell, which prompts seq2seq models with a label…

    Submitted 17 October, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: NAACL 2022

    Journal ref: In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4541-4549, Seattle, United States. Association for Computational Linguistics

  29. The Unexplored Treasure Trove of Phabricator Code Review

    Authors: Gunnar Kudrjavets, Nachiappan Nagappan, Ayushi Rastogi

    Abstract: Phabricator is a modern code collaboration tool used by popular projects like FreeBSD and Mozilla. However, unlike the other well-known code review environments, such as Gerrit or GitHub, there is no readily accessible public code review dataset for Phabricator. This paper describes our experience mining code reviews from five different projects that use Phabricator (Blender, FreeBSD, KDE, LLVM, a…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: 5 pages. To be published in Proceedings of MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories (MSR 2022). ACM, New York, NY, USA

  30. Mining Code Review Data to Understand Waiting Times Between Acceptance and Merging: An Empirical Analysis

    Authors: Gunnar Kudrjavets, Aditya Kumar, Nachiappan Nagappan, Ayushi Rastogi

    Abstract: Increasing code velocity (or the speed with which code changes are reviewed and merged) is integral to speeding up development and contributes to the work satisfaction of engineers. While factors affecting code change acceptance have been investigated in the past, solutions to decrease the code review lifetime are less understood. This study investigates the code review process to quantify delays…

    Submitted 9 March, 2022; originally announced March 2022.

    Comments: 12 pages. To be published in Proceedings of MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories (MSR 2022). ACM, New York, NY, USA

  31. Do Small Code Changes Merge Faster? A Multi-Language Empirical Investigation

    Authors: Gunnar Kudrjavets, Nachiappan Nagappan, Ayushi Rastogi

    Abstract: Code velocity, or the speed with which code changes are integrated into a production environment, plays a crucial role in Continuous Integration and Continuous Deployment. Many studies report factors influencing code velocity. However, solutions to increase code velocity are unclear. Meanwhile, the industry continues to issue guidelines on "ideal" code change size, believing it increases code velo…

    Submitted 9 March, 2022; originally announced March 2022.

    Comments: 12 pages. To be published in Proceedings of MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories (MSR 2022). ACM, New York, NY, USA

  32. Quantifying Daily Evolution of Mobile Software Based on Memory Allocator Churn

    Authors: Gunnar Kudrjavets, Jeff Thomas, Aditya Kumar, Nachiappan Nagappan, Ayushi Rastogi

    Abstract: The pace and volume of code churn necessary to evolve modern software systems present challenges for analyzing the performance impact of any set of code changes. Traditional methods used in performance analysis rely on extensive data collection and profiling, which often takes days. For large organizations utilizing Continuous Integration (CI) and Continuous Deployment (CD), these traditional tech…

    Submitted 6 May, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: 5 pages. To be published in Proceedings of The 9th International Conference on Mobile Software Engineering and Systems (MobileSoft '22). ACM, New York, NY, USA

  33. arXiv:2201.12409  [pdf, other]

    cs.CL cs.AI

    A Unified Approach to Entity-Centric Context Tracking in Social Conversations

    Authors: Ulrich Rückert, Srinivas Sunkara, Abhinav Rastogi, Sushant Prakash, Pranav Khaitan

    Abstract: In human-human conversations, Context Tracking deals with identifying important entities and keeping track of their properties and relationships. This is a challenging problem that encompasses several subtasks such as slot tagging, coreference resolution, resolving plural mentions and entity linking. We approach this problem as an end-to-end modeling task where the conversational context is repres…

    Submitted 26 April, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: Published at LREC 2022

  34. The Unexplored Terrain of Compiler Warnings

    Authors: Gunnar Kudrjavets, Aditya Kumar, Nachiappan Nagappan, Ayushi Rastogi

    Abstract: The authors' industry experiences suggest that compiler warnings, a lightweight version of program analysis, are valuable early bug detection tools. Significant costs are associated with patches and security bulletins for issues that could have been avoided if compiler warnings were addressed. Yet, the industry's attitude towards compiler warnings is mixed. Practices range from silencing all compi…

    Submitted 25 January, 2022; originally announced January 2022.

    Comments: 2 pages. To be published in 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP '22), May 21-29, 2022, Pittsburgh, PA, USA

  35. arXiv:2201.08904  [pdf, other]

    cs.CL cs.AI

    Description-Driven Task-Oriented Dialog Modeling

    Authors: Jeffrey Zhao, Raghav Gupta, Yuan Cao, Dian Yu, Mingqiu Wang, Harrison Lee, Abhinav Rastogi, Izhak Shafran, Yonghui Wu

    Abstract: Task-oriented dialogue (TOD) systems are required to identify key information from conversations for the completion of given tasks. Such information is conventionally specified in terms of intents and slots contained in task-specific ontology or schemata. Since these schemata are designed by system developers, the naming convention for slots and intents is not uniform across tasks, and may not con…

    Submitted 21 January, 2022; originally announced January 2022.

  36. SteelCore: An Extensible Concurrent Separation Logic for Effectful Dependently Typed Programs

    Authors: Nikhil Swamy, Aseem Rastogi, Aymeric Fromherz, Denis Merigoux, Danel Ahman, Guido Martínez

    Abstract: Much recent research has been devoted to modeling effects within type theory. Building on this work, we observe that effectful type theories can provide a foundation on which to build semantics for more complex programming constructs and program logics, extending the reasoning principles that apply within the host effectful type theory itself. Concretely, our main contribution is a semantics for c…

    Submitted 30 November, 2021; originally announced November 2021.

    Comments: ICFP 2020 camera-ready version

  37. SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue Systems

    Authors: Harrison Lee, Raghav Gupta, Abhinav Rastogi, Yuan Cao, Bin Zhang, Yonghui Wu

    Abstract: Zero/few-shot transfer to unseen services is a critical challenge in task-oriented dialogue research. The Schema-Guided Dialogue (SGD) dataset introduced a paradigm for enabling models to support any service in zero-shot through schemas, which describe service APIs to models in natural language. We explore the robustness of dialogue systems to linguistic variations in schemas by designing SGD-X -…

    Submitted 23 August, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: AAAI 2022

    Journal ref: Lee, H., Gupta, R., Rastogi, A., Cao, Y., Zhang, B., & Wu, Y. (2022). SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue Systems. Proceedings of the AAAI Conference on Artificial Intelligence, 36(10), 10938-10946

  38. arXiv:2108.09946  [pdf, other]

    cs.SE

    Pull Request Latency Explained: An Empirical Overview

    Authors: Xunhui Zhang, Yue Yu, Tao Wang, Ayushi Rastogi, Huaimin Wang

    Abstract: Pull request latency evaluation is an essential application of effort evaluation in the pull-based development scenario. It can help the reviewers sort the pull request queue, remind developers about the review processing time, speed up the review process and accelerate software development. There is a lack of work that systematically organizes the factors that affect pull request latency. Also, t…

    Submitted 23 August, 2021; originally announced August 2021.

  39. arXiv:2107.13731  [pdf, other]

    cs.CV cs.AI

    UIBert: Learning Generic Multimodal Representations for UI Understanding

    Authors: Chongyang Bai, Xiaoxue Zang, Ying Xu, Srinivas Sunkara, Abhinav Rastogi, Jindong Chen, Blaise Aguera y Arcas

    Abstract: To improve the accessibility of smart devices and to simplify their usage, building models which understand user interfaces (UIs) and assist users to complete their tasks is critical. However, unique challenges are posed by UI-specific characteristics, such as how to effectively leverage multimodal UI features that involve image, text, and structural metadata and how to achieve good performance…

    Submitted 10 August, 2021; v1 submitted 28 July, 2021; originally announced July 2021.

    Comments: 8 pages, IJCAI 2021

  40. arXiv:2107.05829  [pdf, other]

    cs.SE

    Promises and Perils of Inferring Personality on GitHub

    Authors: Frenk van Mil, Ayushi Rastogi, Andy Zaidman

    Abstract: Personality plays a pivotal role in our understanding of human actions and behavior. Today, the applications of personality are widespread, built on the solutions from psychology to infer personality. In software engineering, for instance, one widely used solution to infer personality uses textual communication data. As studies on personality in software engineering continue to grow, it is imperat…

    Submitted 15 July, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

  41. arXiv:2106.01885  [pdf, other]

    cs.SE

    How does Software Change?

    Authors: Ayushi Rastogi, Georgios Gousios

    Abstract: Software evolves with changes to its codebase over time. Internally, software changes in response to decisions to include some code change into the codebase and discard others. Explaining the mechanism of software evolution, this paper presents a theory of software change. Our theory is grounded in multiple evidence sources (e.g., GitHub documentation and relevant scientific literature) relating t…

    Submitted 3 June, 2021; originally announced June 2021.

  42. arXiv:2105.13970  [pdf, other]

    cs.SE

    Pull Request Decision Explained: An Empirical Overview

    Authors: Xunhui Zhang, Yue Yu, Georgios Gousios, Ayushi Rastogi

    Abstract: Context: The pull-based development model is widely used in open source, leading the trends in distributed software development. One aspect which has garnered significant attention is studies on pull request decision - identifying factors for explanation. Objective: This study builds on a decade of research on pull request decision to explain it. We empirically investigate how factors influence pull…

    Submitted 28 May, 2021; originally announced May 2021.

  43. arXiv:2105.04236  [pdf, other]

    cs.CR cs.LG cs.MS

    SIRNN: A Math Library for Secure RNN Inference

    Authors: Deevashwer Rathee, Mayank Rathee, Rahul Kranti Kiran Goli, Divya Gupta, Rahul Sharma, Nishanth Chandran, Aseem Rastogi

    Abstract: Complex machine learning (ML) inference algorithms like recurrent neural networks (RNNs) use standard functions from math libraries like exponentiation, sigmoid, tanh, and reciprocal of square root. Although prior work on secure 2-party inference provides specialized protocols for convolutional neural networks (CNNs), existing secure implementations of these math operators rely on generic 2-party…

    Submitted 10 May, 2021; originally announced May 2021.

    Comments: IEEE Security and Privacy 2021

  44. Colonoscopy Polyp Detection and Classification: Dataset Creation and Comparative Evaluations

    Authors: Kaidong Li, Mohammad I. Fathan, Krushi Patel, Tianxiao Zhang, Cuncong Zhong, Ajay Bansal, Amit Rastogi, Jean S. Wang, Guanghui Wang

    Abstract: Colorectal cancer (CRC) is one of the most common types of cancer with a high mortality rate. Colonoscopy is the preferred procedure for CRC screening and has proven to be effective in reducing CRC mortality. Thus, a reliable computer-aided polyp detection and classification system can significantly increase the effectiveness of colonoscopy. In this paper, we create an endoscopic dataset collected…

    Submitted 5 August, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

  45. arXiv:2012.05064  [pdf, other]

    cs.CR

    Secure Medical Image Analysis with CrypTFlow

    Authors: Javier Alvarez-Valle, Pratik Bhatu, Nishanth Chandran, Divya Gupta, Aditya Nori, Aseem Rastogi, Mayank Rathee, Rahul Sharma, Shubham Ugare

    Abstract: We present CRYPTFLOW, a system that converts TensorFlow inference code into Secure Multi-party Computation (MPC) protocols at the push of a button. To do this, we build two components. Our first component is an end-to-end compiler from TensorFlow to a variety of MPC protocols. The second component is an improved semi-honest 3-party protocol that provides significant speedups for inference. We empi…

    Submitted 9 December, 2020; originally announced December 2020.

    Comments: 6 pages. PPML NeurIPS 2020 Workshop, Vancouver, Canada. arXiv admin note: substantial text overlap with arXiv:1909.07814

  46. arXiv:2011.06486  [pdf, ps, other]

    cs.CL

    Overview of the Ninth Dialog System Technology Challenge: DSTC9

    Authors: Chulaka Gunasekara, Seokhwan Kim, Luis Fernando D'Haro, Abhinav Rastogi, Yun-Nung Chen, Mihail Eric, Behnam Hedayatnia, Karthik Gopalakrishnan, Yang Liu, Chao-Wei Huang, Dilek Hakkani-Tür, Jinchao Li, Qi Zhu, Lingxiao Luo, Lars Liden, Kaili Huang, Shahin Shayandeh, Runze Liang, Baolin Peng, Zheng Zhang, Swadheen Shukla, Minlie Huang, Jianfeng Gao, Shikib Mehri, Yulan Feng, et al. (14 additional authors not shown)

    Abstract: This paper introduces the Ninth Dialog System Technology Challenge (DSTC-9). This edition of the DSTC focuses on applying end-to-end dialog technologies for four distinct tasks in dialog systems, namely, 1. Task-oriented dialog modeling with unstructured knowledge access, 2. Multi-domain task-oriented dialog, 3. Interactive evaluation of dialog, and 4. Situated interactive multi-modal dialog. This…

    Submitted 12 November, 2020; originally announced November 2020.

  47. CrypTFlow2: Practical 2-Party Secure Inference

    Authors: Deevashwer Rathee, Mayank Rathee, Nishant Kumar, Nishanth Chandran, Divya Gupta, Aseem Rastogi, Rahul Sharma

    Abstract: We present CrypTFlow2, a cryptographic framework for secure inference over realistic Deep Neural Networks (DNNs) using secure 2-party computation. CrypTFlow2 protocols are both correct -- i.e., their outputs are bitwise equivalent to the cleartext execution -- and efficient -- they outperform the state-of-the-art protocols in both latency and scale. At the core of CrypTFlow2, we have new 2PC proto…

    Submitted 13 October, 2020; originally announced October 2020.

    Comments: To appear at ACM CCS 2020. Code available at: https://github.com/mpc-msri/EzPC

  48. Questions for Data Scientists in Software Engineering: A Replication

    Authors: Hennie Huijgens, Ayushi Rastogi, Ernst Mulders, Georgios Gousios, Arie van Deursen

    Abstract: In 2014, a Microsoft study investigated the sort of questions that data science applied to software engineering should answer. This resulted in 145 questions that developers considered relevant for data scientists to answer, thus providing a research agenda to the community. Fast forward five years, no further studies have investigated whether the questions from the software engineers at Microsoft h…

    Submitted 4 January, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

  49. Including Everyone, Everywhere: Understanding Opportunities and Challenges of Geographic Gender-Inclusion in OSS

    Authors: Gede Artha Azriadi Prana, Denae Ford, Ayushi Rastogi, David Lo, Rahul Purandare, Nachiappan Nagappan

    Abstract: The gender gap is a significant concern facing the software industry as the development becomes more geographically distributed. Widely shared reports indicate that gender differences may be specific to each region. However, how complete can these reports be with little to no research reflective of the Open Source Software (OSS) process and communities software is now commonly developed in? Our st…

    Submitted 15 September, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: 19 pages, 16 tables, 3 figures, Includes appendices

    Journal ref: IEEE Transactions on Software Engineering 2021

  50. arXiv:2007.12720  [pdf, other]

    cs.CL cs.AI

    MultiWOZ 2.2: A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines

    Authors: Xiaoxue Zang, Abhinav Rastogi, Srinivas Sunkara, Raghav Gupta, Jianguo Zhang, Jindong Chen

    Abstract: MultiWOZ is a well-known task-oriented dialogue dataset containing over 10,000 annotated dialogues spanning 8 domains. It is extensively used as a benchmark for dialogue state tracking. However, recent works have reported the presence of substantial noise in the dialogue state annotations. MultiWOZ 2.1 identified and fixed many of these erroneous annotations and user utterances, resulting in an improv…

    Submitted 10 July, 2020; originally announced July 2020.

    Journal ref: Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI (2020) 109-117