
Active Human Pose Estimation via an Autonomous UAV Agent

Authors: Jingxi Chen, Botao He, Chahat Deep Singh, Cornelia Fermuller, Yiannis Aloimonos

Abstract: One of the core activities of an active observer involves moving to secure a "better" view of the scene, where the definition of "better" is task-dependent. This paper focuses on the task of human pose estimation from videos capturing a person's activity. Self-occlusions within the scene can complicate or even prevent accurate human pose estimation. To address this, relocating the camera to a new vantage point is necessary to clarify the view, thereby improving 2D human pose estimation. This paper formalizes the process of achieving an improved viewpoint. Our proposed solution to this challenge comprises three main components: a NeRF-based Drone-View Data Generation Framework, an On-Drone Network for Camera View Error Estimation, and a Combined Planner for devising a feasible motion plan to reposition the camera based on the predicted errors for camera views. The Data Generation Framework utilizes NeRF-based methods to generate a comprehensive dataset of human poses and activities, enhancing the drone's adaptability in various scenarios. The Camera View Error Estimation Network is designed to evaluate the current human pose and identify the most promising next viewing angles for the drone, ensuring a reliable and precise pose estimation from those angles. Finally, the combined planner incorporates these angles while considering the drone's physical and environmental limitations, employing efficient algorithms to navigate safe and effective flight paths.
This system represents a significant advancement in active 2D human pose estimation for an autonomous UAV agent, offering substantial potential for applications in aerial cinematography by improving the performance of autonomous human pose estimation and maintaining the operational safety and efficiency of UAVs.

Submitted 1 July, 2024; originally announced July 2024.
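The abstract describes view selection and planning only at a high level. As a rough sketch of how predicted per-view errors and feasibility might be combined, assume candidate viewpoints on a circle around the person, a stand-in `predicted_error` function in place of the paper's on-drone network, and a stand-in `is_feasible` check in place of its planner. All names and the `travel_weight` penalty are hypothetical:

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]

def candidate_views(center: Point, radius: float, n: int) -> List[Point]:
    """Evenly spaced camera positions on a circle around the person (hypothetical discretization)."""
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def choose_next_view(
    drone: Point,
    views: List[Point],
    predicted_error: Callable[[Point], float],  # stand-in for a learned view-error predictor
    is_feasible: Callable[[Point], bool],       # stand-in for obstacle / flight-envelope checks
    travel_weight: float = 0.1,                 # hypothetical trade-off between error and distance
) -> Optional[Point]:
    """Pick the feasible view with the lowest predicted pose error plus a travel-distance penalty."""
    best, best_score = None, math.inf
    for v in views:
        if not is_feasible(v):
            continue  # skip views the drone cannot safely reach
        score = predicted_error(v) + travel_weight * math.dist(drone, v)
        if score < best_score:
            best, best_score = v, score
    return best  # None if no view is feasible
```

For example, with four candidate views around the origin and an error stub that favours the view at (0, 2), `choose_next_view` returns that view unless `is_feasible` rules it out, in which case it falls back to the cheapest remaining feasible view. A real planner would also have to produce a dynamically feasible trajectory to the chosen view, which this sketch does not attempt.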

arXiv:2406.15494 [pdf]

Simple Cracking of (Noise-Based) Dynamic Watermarking in Smart Grids

Authors: Mehmet Yildirim, Nasir Kenarangui, Robert Balog, Laszlo B. Kish, Chanan Singh

Abstract: Previous research employing a conceptual approach with a digital twin has demonstrated that (noise-based) dynamic watermarking is incapable of providing unconditional security in smart electrical grid systems. However, the implementation of digital twins can be prohibitively costly or infeasible due to limited available data on critical infrastructure. In this study, we first analyze the spectral properties of dynamic watermarking and its associated protocol. Subsequently, we present a straightforward attack inspired by the digital twin method, which extracts and utilizes the grid noises and completely breaches the security of dynamic watermarking without requiring knowledge of the private watermarking signal. The attacker can fully expose the grid while evading detection by the controller. Our findings indicate that in the absence of secure and authenticated communications, dynamic watermarking offers neither conditional nor unconditional security. Conversely, when communication lines, sensors, and communicators are equipped with tamper-resistant and secure/authenticated links, dynamic watermarking becomes redundant for grid security.

Submitted 27 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted for publication in Fluctuation and Noise Letters

arXiv:2406.09409 [pdf, other]

CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras

Authors: Sachin Shah, Matthew Albert Chan, Haoming Cai, Jingxi Chen, Sakshum Kulshrestha, Chahat Deep Singh, Yiannis Aloimonos, Christopher Metzler

Abstract: Point-spread-function (PSF) engineering is a well-established computational imaging technique that uses phase masks and other optical elements to embed extra information (e.g., depth) into the images captured by conventional CMOS image sensors. To date, however, PSF engineering has not been applied to neuromorphic event cameras, a powerful new image sensing technology that responds to changes in the log-intensity of light. This paper establishes theoretical limits (Cramér-Rao bounds) on 3D point localization and tracking with PSF-engineered event cameras. Using these bounds, we first demonstrate that existing Fisher phase masks are already near-optimal for localizing static flashing point sources (e.g., blinking fluorescent molecules). We then demonstrate that existing designs are sub-optimal for tracking moving point sources and proceed to use our theory to design optimal phase masks and binary amplitude masks for this task. To overcome the non-convexity of the design problem, we leverage novel parameterizations of the phase and amplitude masks based on implicit neural representations. We demonstrate the efficacy of our designs through extensive simulations. We also validate our method with a simple prototype.

Submitted 13 June, 2024; originally announced June 2024.
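The theoretical limits in the CodedEvents abstract are Cramér-Rao bounds. For reference, the standard bound for an unbiased estimator $\hat{\boldsymbol\theta}$ of a point position $\boldsymbol\theta = (x, y, z)$ reads as follows; the event-camera measurement model $p(\mathbf{y};\boldsymbol\theta)$ that the paper analyzes is not given in the abstract and is left generic here:

$$
\operatorname{Cov}(\hat{\boldsymbol\theta}) \succeq \mathbf{I}(\boldsymbol\theta)^{-1},
\qquad
[\mathbf{I}(\boldsymbol\theta)]_{ij} = \mathbb{E}\!\left[\frac{\partial \log p(\mathbf{y};\boldsymbol\theta)}{\partial \theta_i}\,\frac{\partial \log p(\mathbf{y};\boldsymbol\theta)}{\partial \theta_j}\right],
\qquad
\operatorname{Var}(\hat\theta_i) \ge \left[\mathbf{I}(\boldsymbol\theta)^{-1}\right]_{ii}.
$$

Roughly, a mask can then be judged by how small it makes these per-coordinate lower bounds, which is how the abstract uses the bounds to compare existing and new designs.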

arXiv:2405.17769 [pdf, other]

Microsaccade-inspired Event Camera for Robotics

Authors: Botao He, Ze Wang, Yuan Zhou, Jingxi Chen, Chahat Deep Singh, Haojia Li, Yuman Gao, Shaojie Shen, Kaiwei Wang, Yanjun Cao, Chao Xu, Yiannis Aloimonos, Fei Gao, Cornelia Fermuller

Abstract: Neuromorphic vision sensors, or event cameras, have made visual perception with extremely low reaction time possible, opening new avenues for highly dynamic robotics applications. The output of these event cameras depends on both motion and texture. However, an event camera fails to capture object edges that are parallel to the camera motion. This problem is intrinsic to the sensor and therefore challenging to solve algorithmically. Human vision deals with perceptual fading using the active mechanism of small involuntary eye movements, the most prominent of which are called microsaccades. By moving the eyes constantly and slightly during fixation, microsaccades can substantially maintain texture stability and persistence. Inspired by microsaccades, we designed an event-based perception system capable of simultaneously maintaining low reaction time and stable texture. In this design, a rotating wedge prism was mounted in front of the aperture of an event camera to redirect light and trigger events. The geometrical optics of the rotating wedge prism allows for algorithmic compensation of the additional rotational motion, resulting in a stable texture appearance and high informational output independent of external motion. The hardware device and software solution are integrated into a system, which we call Artificial MIcrosaccade-enhanced EVent camera (AMI-EV). Benchmark comparisons validate the superior data quality of AMI-EV recordings in scenarios where both standard cameras and event cameras fail to deliver.
Various real-world experiments demonstrate the potential of the system to facilitate robotics perception for both low-level and high-level vision tasks.

Submitted 27 May, 2024; originally announced May 2024.

Comments: Published in the June 2024 issue of Science Robotics

arXiv:2405.16714 [pdf, other]

Crafting Interpretable Embeddings by Asking LLMs Questions

Authors: Vinamra Benara, Chandan Singh, John X. Morris, Richard Antonello, Ion Stoica, Alexander G. Huth, Jianfeng Gao

Abstract: Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks. However, their opaqueness and proliferation into scientific domains such as neuroscience have created a growing need for interpretability. Here, we ask whether we can obtain interpretable embeddings through LLM prompting. We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM. Training QA-Emb reduces to selecting a set of underlying questions rather than learning model weights. We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli. QA-Emb significantly outperforms an established interpretable baseline, and does so while requiring very few questions. This paves the way towards building flexible feature spaces that can concretize and evaluate our understanding of semantic brain representations. We additionally find that QA-Emb can be effectively approximated with an efficient model, and we explore broader applications in simple NLP tasks.

Submitted 26 May, 2024; originally announced May 2024.
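Per the QA-Emb abstract, each feature is the answer to one yes/no question posed to an LLM, so an embedding is simply a vector of answers. A minimal sketch, in which `toy_llm` is a hypothetical keyword-matching stand-in for a real LLM call and the three questions are made up; question selection and the fMRI regression are not shown:

```python
from typing import Callable, Dict, List, Sequence, Tuple

def qa_embed(text: str, questions: Sequence[str],
             ask_llm: Callable[[str, str], bool]) -> List[int]:
    """Embed text as a binary vector: feature i is 1 if the LLM answers "yes" to question i."""
    return [1 if ask_llm(text, q) else 0 for q in questions]

# Hypothetical questions, each paired with keywords used by the toy answerer below.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "Does the text mention a number?": ("one", "two", "three"),
    "Does the text mention a place?": ("city", "house", "park"),
    "Does the text describe an emotion?": ("happy", "sad", "angry"),
}

def toy_llm(text: str, question: str) -> bool:
    """Stand-in for an LLM call: answers "yes" if any keyword for the question appears in the text."""
    words = text.lower().split()
    return any(k in words for k in KEYWORDS[question])
```

With these stubs, `qa_embed("Two kids were happy in the park", list(KEYWORDS), toy_llm)` gives `[1, 1, 1]` and `"The house was quiet"` gives `[0, 1, 0]`. Because each coordinate is tied to a readable question, a linear model fitted on these vectors can be read feature by feature.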

arXiv:2405.00080 [pdf, other]

Recommendation aided Caching using Combinatorial Multi-armed Bandits

Authors: Pavamana K J, Chandramani Kishore Singh

Abstract: We study content caching with recommendations in a wireless network where the users are connected through a base station equipped with a finite-capacity cache. We assume a fixed set of contents with unknown user preferences and content popularities. We can recommend a subset of the contents to the users, which encourages them to request these contents. Recommendation can thus be used to increase cache hits. We formulate the cache hit optimization problem as a combinatorial multi-armed bandit (CMAB). We propose a UCB-based algorithm to decide which contents to cache and recommend. We provide an upper bound on the regret of our algorithm. We numerically demonstrate the performance of our algorithm and compare it to state-of-the-art algorithms.

Submitted 3 May, 2024; v1 submitted 30 April, 2024; originally announced May 2024.
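The caching paper's algorithm is UCB-based, but the abstract does not give its index or how recommendations enter it. The sketch below is therefore only a generic combinatorial-UCB baseline for the caching half of the problem: each round it caches the `k` contents with the highest upper confidence bound and updates each cached item's empirical hit rate. The exploration constant 1.5, the hit probabilities, and the class and function names are assumptions for illustration; the recommendation decision is omitted:

```python
import math
import random
from typing import List

class TopKUCB:
    """Generic combinatorial UCB: each round, cache the k items with the highest UCB index."""

    def __init__(self, n_items: int, k: int):
        self.k = k
        self.counts = [0] * n_items   # times each item was cached (and its hit/miss observed)
        self.means = [0.0] * n_items  # empirical hit rate per item
        self.t = 0                    # number of rounds so far

    def select(self) -> List[int]:
        self.t += 1
        def index(i: int) -> float:
            if self.counts[i] == 0:
                return math.inf  # force every item to be tried at least once
            return self.means[i] + math.sqrt(1.5 * math.log(self.t) / self.counts[i])
        return sorted(range(len(self.counts)), key=index, reverse=True)[: self.k]

    def update(self, item: int, reward: float) -> None:
        self.counts[item] += 1
        self.means[item] += (reward - self.means[item]) / self.counts[item]  # running mean

def simulate(p: List[float], k: int, rounds: int, seed: int = 0) -> List[int]:
    """Run the bandit against hypothetical per-item hit probabilities p; return per-item cache counts."""
    rng = random.Random(seed)
    bandit = TopKUCB(len(p), k)
    for _ in range(rounds):
        for i in bandit.select():
            bandit.update(i, 1.0 if rng.random() < p[i] else 0.0)  # semi-bandit feedback
    return bandit.counts
```

Running `simulate([0.9, 0.1, 0.8, 0.2, 0.05], k=2, rounds=2000)` should concentrate almost all cache slots on items 0 and 2, the two most popular contents, with the remaining slots spent on occasional exploration.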

arXiv:2404.16849 [pdf]

doi 10.1142/S0219477524500433

Smart Grids Secured By Dynamic Watermarking: How Secure?

Authors: Kate Davis, Laszlo B. Kish, Chanan Singh

Abstract: Unconditional security for smart grids is defined. Cryptanalyses of the watermarked security of smart grids indicate that watermarking cannot guarantee unconditional security unless the communication within the grid system is unconditionally secure. The successful attack against the dynamically watermarked smart grid remains valid even in the presence of internal noise from the grid. An open question arises: whether unconditionally authenticated secure communications within the grid, together with tamper resistance of the critical elements, are sufficient conditions to provide unconditional security for the grid operation.

Submitted 5 March, 2024; originally announced April 2024.

Comments: Accepted for publication in Fluct. Noise Lett.

arXiv:2404.12468 [pdf, other]

Fresh Caching of Dynamic Contents using Restless Multi-armed Bandits

Authors: Ankita Koley, Chandramani Singh

Abstract: We consider a dynamic content caching framework in which contents are updated at the central server and a subset of contents is cached at the local cache associated with a base station (BS). When a request arrives, the BS can decide, based on whether the content is in the local cache, whether to fetch the content from the central server or serve the cached version from the local cache. Fetching a content incurs a fixed fetching cost, and serving the cached version incurs an ageing cost proportional to the age-of-version (AoV) of the content. AoV is a freshness metric that counts the number of updates at the central server since the content was last fetched. We aim to minimize the average costs (fetching cost and ageing cost) subject to cache capacity constraints. This cost minimization problem is a continuous-time restless multi-armed bandit (RMAB) process. The single-content problem of the corresponding RMAB is a partially observable Markov decision process (POMDP), since the BS can only see the AoV of a cached content if it fetches the content. We reformulate the POMDP as a semi-Markov decision process and provide a Whittle-index-based solution to this problem. Finally, we compare the performance with recent work and show via simulations that our proposed policy is optimal.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 14 pages, 7 figures
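The cost structure in the fresh-caching abstract (a fixed fetching cost, or an ageing cost proportional to the AoV, where AoV counts server-side updates since the last fetch) can be made concrete with a toy single-content simulator. This sketch assumes, unlike the paper's POMDP setting, that the BS knows the AoV, and it evaluates a simple hypothetical threshold policy rather than the paper's Whittle-index policy:

```python
from typing import Sequence

def serve_cost(aov: int, fetch: bool, fetch_cost: float, age_cost_per_version: float) -> float:
    """Per-request cost: a fixed fetch cost, or an ageing cost proportional to the AoV."""
    return fetch_cost if fetch else age_cost_per_version * aov

def threshold_policy_cost(updates_before_request: Sequence[int], threshold: int,
                          fetch_cost: float, age_cost_per_version: float) -> float:
    """Total cost of a hypothetical policy that refetches once the AoV reaches `threshold`.

    updates_before_request[r] is the number of server-side updates between request r-1 and r.
    The AoV is assumed observable here, which the paper's POMDP formulation does not assume.
    """
    aov, total = 0, 0.0
    for new_updates in updates_before_request:
        aov += new_updates                  # the cached copy falls further behind the server
        fetch = aov >= threshold
        total += serve_cost(aov, fetch, fetch_cost, age_cost_per_version)
        if fetch:
            aov = 0                         # a freshly fetched copy has AoV 0
    return total
```

With updates `[1, 0, 2, 1]`, a fetch cost of 5, and an ageing cost of 1 per version, a threshold of 2 costs 1 + 1 + 5 + 1 = 8, refetching on every stale request (threshold 1) costs 15, and a threshold of 100 (never refetching here) costs 1 + 1 + 3 + 4 = 9. This trade-off between fetching and serving stale copies is what a per-content index policy has to balance.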

arXiv:2403.01002 [pdf, other]

Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries

Authors: Zelalem Gero, Chandan Singh, Yiqing Xie, Sheng Zhang, Tristan Naumann, Jianfeng Gao, Hoifung Poon

Abstract: Summarizing clinical text is crucial in health decision-support and clinical research. Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation, especially in safety-critical domains such as health. Holistically evaluating text summaries is challenging because they may contain unsubstantiated information. Here, we explore a general mitigation framework using Attribute Structuring (AS), which structures the summary evaluation process. It decomposes the evaluation process into a grounded procedure that uses an LLM for relatively simple structuring and scoring tasks, rather than the full task of holistic summary evaluation. Experiments show that AS consistently improves the correspondence between human annotations and automated metrics in clinical text summarization. Additionally, AS yields interpretations in the form of a short text span corresponding to each output, which enables efficient human auditing, paving the way towards trustworthy evaluation of clinical information in resource-constrained scenarios. We release our code, prompts, and an open-source benchmark at https://github.com/microsoft/attribute-structuring.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 4 pages

arXiv:2402.03774 [pdf, other]

Learning a Decision Tree Algorithm with Transformers

Authors: Yufan Zhuang, Liyuan Liu, Chandan Singh, Jingbo Shang, Jianfeng Gao

Abstract: Decision trees are renowned for their interpretability and their capability to achieve high predictive performance, especially on tabular data. Traditionally, they are constructed through recursive algorithms that partition the data at every node in a tree. However, identifying the best partition is challenging, as decision trees optimized for local segments may not generalize globally. To address this, we introduce MetaTree, which trains a transformer-based model on filtered outputs from classical algorithms to produce strong decision trees for classification. Specifically, we fit both greedy decision trees and optimized decision trees on a large number of datasets. We then train MetaTree to produce the trees that achieve strong generalization performance. This training enables MetaTree not only to emulate these algorithms, but also to intelligently adapt its strategy according to the context, thereby achieving superior generalization performance.

Submitted 6 February, 2024; originally announced February 2024.
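MetaTree is trained on trees produced by classical algorithms, including greedy ones. For context, the greedy step those algorithms repeat at every node is a search for the split that most reduces impurity. A minimal CART-style sketch using Gini impurity (a classical baseline, not MetaTree itself):

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X: Sequence[Sequence[float]], y: Sequence[int]) -> Optional[Tuple[int, float]]:
    """Greedy search for the (feature, threshold) pair minimizing weighted child impurity.

    Returns None if no split reduces impurity below that of the parent node.
    """
    n = len(y)
    best, best_score = None, gini(y)  # only accept splits that improve on the parent
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= thr]
            right = [y[i] for i in range(n) if X[i][f] > thr]
            if not left or not right:
                continue  # a split must produce two non-empty children
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, thr), score
    return best
```

On `X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]]` with labels `[0, 0, 1, 1]`, `best_split` returns `(0, 2.0)`: thresholding feature 0 at 2.0 separates the classes perfectly. Applying it recursively to each child yields a greedy tree, whose locally optimal choices are the limitation the abstract points to.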

arXiv:2402.01761 [pdf, other]

Rethinking Interpretability in the Era of Large Language Models

Authors: Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, Jianfeng Gao

Abstract: Interpretable machine learning has exploded as an area of interest over the last decade, sparked by the rise of increasingly large datasets and deep neural networks. Simultaneously, large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks, offering a chance to rethink opportunities in interpretable machine learning. Notably, the capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be given to a human. However, these new capabilities raise new challenges, such as hallucinated explanations and immense computational costs. In this position paper, we start by reviewing existing methods to evaluate the emerging field of LLM interpretation (both interpreting LLMs and using LLMs for explanation). We contend that, despite their limitations, LLMs hold the opportunity to redefine interpretability with a more ambitious scope across many applications, including in auditing LLMs themselves. We highlight two emerging research priorities for LLM interpretation: using LLMs to directly analyze new datasets and to generate interactive explanations.

Submitted 30 January, 2024; originally announced February 2024.

Comments: 7 pages

arXiv:2401.13986 [pdf, other]

Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning

Authors: Yanda Chen, Chandan Singh, Xiaodong Liu, Simiao Zuo, Bin Yu, He He, Jianfeng Gao

Abstract: Large language models (LLMs) often generate convincing, fluent explanations. However, unlike humans, they often generate inconsistent explanations on different inputs. For example, an LLM may generate the explanation "all birds can fly" when answering the question "Can sparrows fly?" but meanwhile answer "no" to the related question "Can penguins fly?". Explanations should be consistent across related examples so that they allow a human to simulate the LLM's decision process on multiple examples. We propose explanation-consistency finetuning (EC-finetuning), a method that adapts LLMs to generate more consistent natural-language explanations on related examples. EC-finetuning involves finetuning LLMs on synthetic data that is carefully constructed to contain consistent explanations. Across a variety of question-answering datasets in various domains, EC-finetuning yields a 10.0% relative explanation consistency improvement on four finetuning datasets, and generalizes to seven out-of-distribution datasets not seen during finetuning (+4.5% relative). Code is available at https://github.com/yandachen/explanation-consistency-finetuning.

Submitted 25 January, 2024; originally announced January 2024.

Comments: arXiv admin note: text overlap with arXiv:2307.08678

arXiv:2401.01001 [pdf]

Metalearning-Informed Competence in Children: Implications for Responsible Brain-Inspired Artificial Intelligence

Authors: Chaitanya Singh

Abstract: This paper offers a novel conceptual framework comprising four essential cognitive mechanisms that operate concurrently and collaboratively to enable metalearning (knowledge and regulation of learning) strategy implementation in young children. A roadmap incorporating the core mechanisms and the associated strategies is presented as an explanation of the developing brain's remarkable cross-context learning competence. The tetrad of fundamental complementary processes is chosen to collectively represent the bare-bones metalearning architecture that can be extended to artificial intelligence (AI) systems emulating brain-like learning and problem-solving skills. Utilizing the metalearning-enabled young mind as a model for brain-inspired computing, this work further discusses important implications for morally grounded AI.

Submitted 5 September, 2023; originally announced January 2024.

Comments: 27 pages, 3 figures

arXiv:2311.04579 [pdf]

Text Finder Application for Android

Authors: Dr. Milind Godase, Dr. Chandrani Singh, Kunal Dhongadi

Abstract: Text Finder is an Android application that uses Optical Character Recognition (OCR) technology, with the help of the Google Cloud Vision API, to extract text from images taken with the device camera or from existing images on the user's phone. The extracted text can be saved to the device storage, where all previous extracts can be easily accessed through a user-friendly interface. The application also features editing, deletion, and sharing options for the extracted text. The user-friendly interface makes the application accessible to students, professionals, and organizations for a variety of purposes, including document scanning, data entry, and information retrieval. Manually extracting text from images by typing or writing can be very time-consuming and prone to errors. This application is an efficient and simple solution for extracting text and organizing important information from photos. This paper describes the technical details of the OCR technology and Google's ML Kit Text Recognition API used in the application, as well as the design, implementation, and evaluation of the application in terms of performance and accuracy. The research also explores the key objectives and benefits of Text Finder, such as reducing the time and effort required and increasing the efficiency of document-based tasks.

Submitted 8 November, 2023; originally announced November 2023.

Comments: 9 pages

MSC Class: sinhgad.org
ACM Class: I.2.7

arXiv:2311.02262 [pdf, other]

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

Authors: Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao

Abstract: In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers. These textual emphases are vital for the readers to grasp the conveyed information. When interacting with large language models (LLMs), we have a similar need: steering the model to pay closer attention to user-specified information, e.g., an instruction. Existing methods, however, are constrained to process plain text and do not support such a mechanism. This motivates us to introduce PASTA (Post-hoc Attention STeering Approach), a method that allows LLMs to read text with user-specified emphasis marks. To this end, PASTA identifies a small subset of attention heads and applies precise attention reweighting on them, directing the model attention to user-specified parts. Like prompting, PASTA is applied at inference time and does not require changing any model parameters. Experiments demonstrate that PASTA can substantially enhance an LLM's ability to follow user instructions or integrate new knowledge from user inputs, leading to a significant performance improvement on a variety of tasks, e.g., an average accuracy improvement of 22% for LLaMA-7B. Our code is publicly available at https://github.com/QingruZhang/PASTA.

Submitted 3 November, 2023; originally announced November 2023.

Comments: 16 pages
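The PASTA abstract says the method reweights attention on a small subset of heads so that the model attends more to user-specified spans. One simple form of such reweighting, assumed here for illustration, is to scale the attention paid to non-emphasized tokens by a factor `alpha < 1` and renormalize each row. The choice of heads, which the abstract says PASTA identifies, is not shown, and `alpha = 0.01` is an arbitrary default:

```python
import numpy as np

def steer_attention(attn: np.ndarray, emphasized: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    """Down-weight attention to non-emphasized key positions, then renormalize each row.

    attn: (num_queries, num_keys) row-stochastic attention matrix for one head.
    emphasized: boolean mask over key positions marking the user-highlighted span.
    """
    scale = np.where(emphasized, 1.0, alpha)  # keep emphasized keys, shrink all others
    steered = attn * scale[None, :]           # broadcast the per-key scale over all query rows
    return steered / steered.sum(axis=1, keepdims=True)
```

For a uniform row `[0.25, 0.25, 0.25, 0.25]` with the middle two tokens emphasized and `alpha = 0.5`, the result is `[1/6, 1/3, 1/3, 1/6]`: the emphasized tokens gain attention mass while each row still sums to 1. Because the change is applied at inference time, no model weights are modified.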

arXiv:2310.14034 [pdf, other]

Tree Prompting: Efficient Task Adaptation without Fine-Tuning

Authors: John X. Morris, Chandan Singh, Alexander M. Rush, Jianfeng Gao, Yuntian Deng

Abstract: Prompting language models (LMs) is the main interface for applying them to new tasks. However, for smaller LMs, prompting provides low accuracy compared to gradient-based finetuning. Tree Prompting is an approach to prompting which builds a decision tree of prompts, linking multiple LM calls together to solve a task. At inference time, each call to the LM is determined by efficiently routing the outcome of the previous call using the tree. Experiments on classification datasets show that Tree Prompting improves accuracy over competing methods and is competitive with fine-tuning. We also show that variants of Tree Prompting allow inspection of a model's decision-making process.

Submitted 21 October, 2023; originally announced October 2023.

Comments: Both first authors contributed equally; accepted to EMNLP 2023
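Tree Prompting routes an input through a decision tree whose nodes are prompts, so each LM answer selects the next call. A minimal sketch of that inference-time routing, assuming yes/no prompts at each node; `toy_lm` is a hypothetical keyword-matching stand-in for a real LM, and the example tree is hand-written rather than learned as in the paper:

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Leaf:
    label: str

@dataclass
class Node:
    prompt: str                 # yes/no question posed to the LM at this node
    yes: Union["Node", Leaf]    # subtree followed when the LM answers "yes"
    no: Union["Node", Leaf]     # subtree followed when the LM answers "no"

def tree_predict(tree: Union[Node, Leaf], text: str,
                 lm_says_yes: Callable[[str, str], bool]) -> str:
    """Route an input through the prompt tree; each LM answer picks the next branch."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if lm_says_yes(node.prompt, text) else node.no
    return node.label

def toy_lm(prompt: str, text: str) -> bool:
    """Hypothetical LM stand-in: checks whether the quoted keyword in the prompt occurs in the text."""
    keyword = prompt.split("'")[1]
    return keyword in text.lower()

SENTIMENT_TREE = Node(
    "Does the review contain 'great'?",
    yes=Leaf("positive"),
    no=Node("Does the review contain 'bad'?", yes=Leaf("negative"), no=Leaf("neutral")),
)
```

`tree_predict(SENTIMENT_TREE, "A great film", toy_lm)` returns `"positive"`, `"A bad plot"` returns `"negative"`, and `"It was fine"` returns `"neutral"`. Each prediction can be explained by the path of prompts it took, which is the kind of inspection the abstract mentions.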

arXiv:2310.08745 [pdf, other]

AcTExplore: Active Tactile Exploration of Unknown Objects

Authors: Amir-Hossein Shahidzadeh, Seong Jong Yoo, Pavan Mantripragada, Chahat Deep Singh, Cornelia Fermüller, Yiannis Aloimonos

Abstract: Tactile exploration plays a crucial role in understanding object structures for fundamental robotics tasks such as grasping and manipulation. However, efficiently exploring such objects using tactile sensors is challenging, primarily due to the large-scale unknown environments and limited sensing coverage of these sensors. To this end, we present AcTExplore, an active tactile exploration method driven by reinforcement learning for object reconstruction at scales that automatically explores the object surfaces in a limited number of steps. Through sufficient exploration, our algorithm incrementally collects tactile data and reconstructs 3D shapes of the objects as well, which can serve as a representation for higher-level downstream tasks. Our method achieves an average of 95.97% IoU coverage on unseen YCB objects while just being trained on primitive shapes. Project Webpage: https://prg.cs.umd.edu/AcTExplore

Submitted 20 June, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

Comments: 8 pages, 6 figures, Accepted to ICRA 2024
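The AcTExplore abstract reports reconstruction quality as IoU coverage but does not specify the shape representation. Assuming occupied-voxel sets (a hypothetical choice made only for this sketch), the metric is intersection over union:

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]  # integer grid coordinates of an occupied voxel

def voxel_iou(reconstructed: Set[Voxel], ground_truth: Set[Voxel]) -> float:
    """Intersection-over-union of two occupied-voxel sets."""
    union = reconstructed | ground_truth
    if not union:
        return 1.0  # two empty shapes are treated as identical
    return len(reconstructed & ground_truth) / len(union)
```

If 3 of 4 ground-truth voxels are recovered and 1 spurious voxel is added, the union has 5 voxels and the IoU is 3/5 = 0.6.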

arXiv:2310.07480 [pdf, other]

$μ$TAS: Design and implementation of Time Aware Shaper on SmartNICs to achieve bounded latency

Authors: Joydeep Pal, Deepak Choudhary, Nithish Krishnabharathi Gnani, Chandramani Singh, T. V. Prabhakar

Abstract: Time-Aware Shaper (TAS) is a time-triggered scheduling mechanism that ensures bounded latency for time-critical Scheduled Traffic (ST) flows. The Linux kernel implementation (a.k.a. TAPRIO) has limited capabilities due to varying CPU workloads and thus does not offer tight latency bounds for the ST flows. Moreover, it currently supports only relatively long cycle times. Other software implementations are limited to simulation studies without physical implementation. In this paper, we present $μ$TAS, a MicroC-based hardware implementation of TAS on a programmable SmartNIC. $μ$TAS takes advantage of the parallel-processing architecture of the SmartNIC to configure the scheduling behaviour of its queues at runtime. To demonstrate the effectiveness of $μ$TAS, we built a Time-Sensitive Networking (TSN) testbed from scratch. It consists of multiple end-hosts capable of generating ST and Best Effort (BE) flows, and TSN switches equipped with SmartNICs running $μ$TAS. Time synchronization is maintained between the switches and hosts. Our experiments demonstrate that the ST flows experience a bounded latency on the order of tens of microseconds.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 6 pages, 9 figures
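
TAS (IEEE 802.1Qbv) opens and closes per-queue transmission gates according to a cyclic gate control list. The sketch below is a plain software model of that lookup, not the MicroC code running on the SmartNIC; the `GateEntry` type, the 8-queue assumption, and the example schedule are illustrative.

```python
# Software model of a Time-Aware Shaper gate control list (GCL).
# Hypothetical types and schedule; not the uTAS MicroC implementation.
from dataclasses import dataclass
from typing import List, Set

NUM_QUEUES = 8  # assumption: 8 traffic classes, as in IEEE 802.1Q


@dataclass
class GateEntry:
    duration_ns: int  # how long this entry is active within the cycle
    open_mask: int    # bit q set => queue q may transmit during this entry


def open_queues(gcl: List[GateEntry], base_time_ns: int, now_ns: int) -> Set[int]:
    """Return the queues whose gates are open at time now_ns."""
    cycle_ns = sum(e.duration_ns for e in gcl)
    offset = (now_ns - base_time_ns) % cycle_ns  # position inside the current cycle
    for entry in gcl:
        if offset < entry.duration_ns:
            return {q for q in range(NUM_QUEUES) if (entry.open_mask >> q) & 1}
        offset -= entry.duration_ns
    raise RuntimeError("unreachable: offset is always smaller than the cycle")


# 100 us cycle: queue 7 (ST) gets an exclusive 20 us window, queues 0-6 (BE) share the rest.
schedule = [GateEntry(20_000, 0b1000_0000), GateEntry(80_000, 0b0111_1111)]
print(open_queues(schedule, base_time_ns=0, now_ns=5_000))    # {7}
print(open_queues(schedule, base_time_ns=0, now_ns=150_000))  # {0, 1, 2, 3, 4, 5, 6}
```

Because ST frames leave only during their reserved window while BE queues are closed, an ST frame waits at most roughly one cycle for its gate, which is why shorter cycle times translate into tighter latency bounds.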

arXiv:2309.10383 [pdf, other]

EdgeP4: A P4-Programmable Edge Intelligent Ethernet Switch for Tactile Cyber-Physical Systems

Authors: Nithish Krishnabharathi Gnani, Joydeep Pal, Deepak Choudhary, Himanshu Verma, Soumya Kanta Rana, Kaushal Mhapsekar, T. V. Prabhakar, Chandramani Singh

Abstract: Tactile Internet-based operations, e.g., telesurgery, rely on end-to-end closed-loop control for accuracy and corrections. The feedback and control are subject to network latency and loss. We design two edge intelligence algorithms hosted at P4-programmable end switches. These algorithms locally compute and command corrective signals, thereby sparing the feedback signals from traversing the network to the other ends and saving on control loop latency and network load. We implement these algorithms entirely in the data plane on Netronome Agilio SmartNICs using P4. Our first algorithm, $\textit{pose correction}$, is placed at the edge switch connected to an industrial robot gripping a tool. The round trip between transmitting force sensor array readings to the edge switch and receiving corrected tip coordinates at the robot is shown to be less than $100~μs$. The second algorithm, $\textit{tremor suppression}$, is placed at the edge switch connected to the human operator. It suppresses physiological tremors of amplitudes smaller than $100~μm$, which not only improves the application's performance but also reduces the network load by up to $99.9\%$. Our solution allows edge intelligence modules to seamlessly switch between the algorithms based on the tasks being executed at the end hosts.

Submitted 19 September, 2023; originally announced September 2023.
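
The abstract does not spell out the tremor-suppression rule. One simple scheme consistent with both stated effects (hiding sub-100 μm motion and cutting network load) is a deadband filter that forwards an operator position update only when it moves at least 100 μm from the last forwarded position. The sketch below assumes that scheme; `deadband_filter` is hypothetical and is not the P4 data-plane logic.

```python
# Deadband filter: an assumed, illustrative form of tremor suppression.
import math
from typing import Iterable, Iterator, Sequence, Tuple


def deadband_filter(samples_um: Iterable[Sequence[float]],
                    threshold_um: float = 100.0) -> Iterator[Tuple[int, Sequence[float]]]:
    """Yield (index, position) for samples that move at least threshold_um
    (Euclidean distance) away from the last forwarded position."""
    last = None
    for i, pos in enumerate(samples_um):
        if last is None or math.dist(pos, last) >= threshold_um:
            last = pos
            yield i, pos


# Small tremor around the origin, then a deliberate 500 um move.
stream = [(0, 0, 0), (30, 0, 0), (-40, 0, 0), (20, 0, 0), (500, 0, 0), (530, 0, 0)]
forwarded = list(deadband_filter(stream))
print(forwarded)  # [(0, (0, 0, 0)), (4, (500, 0, 0))]
print(f"dropped {1 - len(forwarded) / len(stream):.0%} of packets")  # dropped 67% of packets
```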

arXiv:2306.00024 [pdf, other]

Self-Verification Improves Few-Shot Clinical Information Extraction

Authors: Zelalem Gero, Chandan Singh, Hao Cheng, Tristan Naumann, Michel Galley, Jianfeng Gao, Hoifung Poon

Abstract: Extracting patient information from unstructured text is a critical task in health decision-support and clinical research. Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning, in contrast to supervised learning, which requires much more costly human annotations. However, despite drastic advances in modern LLMs such as GPT-4, they still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health. Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs. This is made possible by the asymmetry between verification and generation, where verification is often much easier than generation. Experimental results show that our method consistently improves accuracy for various LLMs in standard clinical information extraction tasks. Additionally, self-verification yields interpretations in the form of a short text span corresponding to each output, which makes it very efficient for human experts to audit the results, paving the way towards trustworthy extraction of clinical information in resource-constrained scenarios. To facilitate future research in this direction, we release our code and prompts.

Submitted 30 May, 2023; originally announced June 2023.

Journal ref: IMLH 2023

arXiv:2305.09863 [pdf, other]

Explaining black box text modules in natural language with language models

Authors: Chandan Singh, Aliyah R. Hsu, Richard Antonello, Shailee Jain, Alexander G. Huth, Bin Yu, Jianfeng Gao

Abstract: Large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their rapid proliferation and increasing opaqueness have created a growing need for interpretability. Here, we ask whether we can automatically obtain natural language explanations for black box text modules. A "text module" is any function that maps text to a scalar continuous value, such as a submodule within an LLM or a fitted model of a brain region. "Black box" indicates that we only have access to the module's inputs/outputs. We introduce Summarize and Score (SASC), a method that takes in a text module and returns a natural language explanation of the module's selectivity along with a score for how reliable the explanation is. We study SASC in 3 contexts. First, we evaluate SASC on synthetic modules and find that it often recovers ground truth explanations. Second, we use SASC to explain modules found within a pre-trained BERT model, enabling inspection of the model's internals. Finally, we show that SASC can generate explanations for the response of individual fMRI voxels to language stimuli, with potential applications to fine-grained brain mapping. All code for using SASC and reproducing results is made available on Github.

Submitted 15 November, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

arXiv:2304.12227 [pdf, ps, other]

Caching Contents with Varying Popularity using Restless Bandits

Authors: Pavamana K J, Chandramani Singh

Abstract: We study content caching in a wireless network in which the users are connected through a base station that is equipped with a finite-capacity cache. We assume a fixed set of contents whose popularity varies with time. Users' requests for the content depend on their instantaneous popularity levels. Proactively caching contents at the base station incurs a cost but not having requested contents at the base station also incurs a cost. We propose to proactively cache contents at the base station so as to minimize content missing and caching costs. We formulate the problem as a discounted cost Markov decision problem that is a restless multi-armed bandit problem. We provide conditions under which the problem is indexable and also propose a novel approach to maneuver a few parameters to render the problem indexable. We demonstrate the efficacy of the Whittle index policy via numerical evaluation.

Submitted 17 September, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2212.03291

arXiv:2304.05934 [pdf, other]

ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition

Authors: Aashaka Desai, Lauren Berger, Fyodor O. Minakov, Vanessa Milan, Chinmay Singh, Kriston Pumphrey, Richard E. Ladner, Hal Daumé III, Alex X. Lu, Naomi Caselli, Danielle Bragg

Abstract: Sign languages are used as a primary language by approximately 70 million D/deaf people world-wide. However, most communication technologies operate in spoken and written languages, creating inequities in access. To help tackle this problem, we release ASL Citizen, the first crowdsourced Isolated Sign Language Recognition (ISLR) dataset, collected with consent and containing 83,399 videos for 2,731 distinct signs filmed by 52 signers in a variety of environments. We propose that this dataset be used for sign language dictionary retrieval for American Sign Language (ASL), where a user demonstrates a sign to their webcam to retrieve matching signs from a dictionary. We show that training supervised machine learning classifiers with our dataset advances the state-of-the-art on metrics relevant for dictionary retrieval, achieving 63% accuracy and a recall-at-10 of 91%, evaluated entirely on videos of users who are not present in the training or validation sets. An accessible PDF of this article is available at the following link: https://aashakadesai.github.io/research/ASLCitizen_arxiv_updated.pdf

Submitted 19 June, 2023; v1 submitted 12 April, 2023; originally announced April 2023.
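
For dictionary retrieval, a model returns a ranked list of candidate signs for each query video. Assuming the usual definitions, the reported accuracy corresponds to top-1 retrieval, and recall-at-10 is the fraction of queries whose correct sign appears among the top 10 candidates. A minimal sketch of that metric (the function name and toy data are hypothetical):

```python
from typing import Hashable, Sequence


def recall_at_k(ranked: Sequence[Sequence[Hashable]],
                truths: Sequence[Hashable], k: int) -> float:
    """Fraction of queries whose true label is among the first k ranked candidates."""
    if len(ranked) != len(truths):
        raise ValueError("one ranked list is needed per query")
    hits = sum(truth in candidates[:k] for candidates, truth in zip(ranked, truths))
    return hits / len(truths)


# Three query videos, each with a ranked list of dictionary entries.
ranked = [["HELLO", "THANKS", "HOUSE"],
          ["THANKS", "HELLO", "HOUSE"],
          ["HOUSE", "THANKS", "HELLO"]]
truths = ["HELLO", "HELLO", "THANKS"]
print(recall_at_k(ranked, truths, k=1))  # 0.3333333333333333 (top-1 accuracy)
print(recall_at_k(ranked, truths, k=2))  # 1.0
```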

arXiv:2302.13054 [pdf, other]

Charting mobility patterns in the scientific knowledge landscape

Authors: Chakresh Kumar Singh, Liubov Tupikina, Fabrice Lécuyer, Michele Starnini, Marc Santolini

Abstract: From small steps to great leaps, metaphors of spatial mobility abound to describe discovery processes. Here, we ground these ideas in formal terms by systematically studying scientific knowledge mobility patterns. We use low-dimensional embedding techniques to create a knowledge space made up of 1.5 million articles from the fields of physics, computer science, and mathematics. By analyzing the publication histories of individual researchers, we discover patterns of knowledge mobility that closely resemble physical mobility. In aggregate, the trajectories form mobility flows that can be described by a gravity model, with jumps more likely to occur in areas of high density and less likely to occur over longer distances. We identify two types of researchers from their individual mobility patterns: interdisciplinary explorers who pioneer new fields, and exploiters who are more likely to stay within their specific areas of expertise. Our results suggest that spatial mobility analysis is a valuable tool for understanding knowledge evolution.

Submitted 25 February, 2023; originally announced February 2023.

Comments: 15 pages, 5 figures, 10 Supplementary Figures
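
In its standard form, a gravity model makes the flow toward a location grow with that location's mass (here, the local density of articles) and decay as a power of distance. A minimal sketch of the resulting jump probabilities, assuming the form $p_j \propto m_j / d_{ij}^{\gamma}$ in a 2D embedding with a hypothetical exponent $\gamma = 2$ (the paper fits its own functional form and parameters):

```python
import math
from typing import List, Sequence, Tuple


def gravity_jump_probabilities(origin: Sequence[float],
                               destinations: List[Tuple[Sequence[float], float]],
                               gamma: float = 2.0) -> List[float]:
    """Probability of jumping from origin to each (position, mass) destination,
    proportional to mass / distance**gamma."""
    weights = []
    for position, mass in destinations:
        d = math.dist(origin, position)
        if d == 0:
            raise ValueError("destination coincides with the origin")
        weights.append(mass / d ** gamma)
    total = sum(weights)
    return [w / total for w in weights]


# Two equally dense regions: the nearer one attracts 4x more jumps when gamma = 2.
print(gravity_jump_probabilities((0, 0), [((1, 0), 1.0), ((2, 0), 1.0)]))  # [0.8, 0.2]
# A region 4x denser but twice as far away balances out exactly.
print(gravity_jump_probabilities((0, 0), [((1, 0), 1.0), ((2, 0), 4.0)]))  # [0.5, 0.5]
```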

arXiv:2212.14189 [pdf, other]

High Resolution Modeling and Analysis of Cryptocurrency Mining's Impact on Power Grids: Carbon Footprint, Reliability, and Electricity Price

Authors: Ali Menati, Xiangtian Zheng, Kiyeob Lee, Ranyu Shi, Pengwei Du, Chanan Singh, Le Xie

Abstract: Blockchain technologies are considered one of the most disruptive innovations of the last decade, enabling secure decentralized trust-building. However, in recent years, with the rapid increase in the energy consumption of blockchain-based computations for cryptocurrency mining, there have been growing concerns about their sustainable operation in electric grids. This paper investigates the tri-factor impact of such large loads on carbon footprint, grid reliability, and electricity market price in the Texas grid. We release open-source high-resolution data to enable high-resolution modeling of influencing factors such as location and flexibility. We reveal that the per-megawatt-hour carbon footprint of cryptocurrency mining loads across locations can vary by as much as 50% of the crude system average estimate. We show that the flexibility of mining loads can significantly mitigate power shortages and market disruptions that can result from the deployment of mining loads. These findings suggest that policymakers should facilitate the participation of large mining facilities in wholesale markets and require them to provide mandatory demand response.

Submitted 14 April, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

Comments: This paper has been accepted for publication in the journal of "Advances in Applied Energy"

arXiv:2212.03291

Caching Contents with Varying Popularity using Restless Bandits

Authors: Pavamana K J, Chandramani Kishore Singh

Abstract: Mobile networks are experiencing a prodigious increase in data volume and user density, which places a great burden on mobile core networks and backhaul links. An efficient technique to lessen this problem is caching, i.e., bringing the data closer to the users by making use of the caches of edge network nodes, such as fixed or mobile access points and even user devices. The performance of caching depends on which contents are cached. In this paper, we examine the problem of content caching at the wireless edge (i.e., base stations) to minimize the discounted cost incurred over an infinite horizon. We formulate this problem as a restless bandit problem, which is hard to solve. We begin by showing that an optimal policy is of threshold type. Using these structural results, we prove the indexability of the problem and use the Whittle index policy to minimize the discounted cost.

Submitted 20 June, 2023; v1 submitted 31 October, 2022; originally announced December 2022.

Comments: There were mistakes while submitting the updated version. I have submitted a fresh new submission, arXiv:2304.12227

arXiv:2210.01848 [pdf, other]

Explaining Patterns in Data with Language Models via Interpretable Autoprompting

Authors: Chandan Singh, John X. Morris, Jyoti Aneja, Alexander M. Rush, Jianfeng Gao

Abstract: Large language models (LLMs) have displayed an impressive ability to harness natural language to perform complex tasks. In this work, we explore whether we can leverage this learned ability to find and explain patterns in data. Specifically, given a pre-trained LLM and data examples, we introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explaining the data. iPrompt iteratively alternates between generating explanations with an LLM and reranking them based on their performance when used as a prompt. Experiments on a wide range of datasets, from synthetic mathematics to natural-language understanding, show that iPrompt can yield meaningful insights by accurately finding ground-truth dataset descriptions. Moreover, the prompts produced by iPrompt are simultaneously human-interpretable and highly effective for generalization: on real-world sentiment classification datasets, iPrompt produces prompts that match or even improve upon human-written prompts for GPT-3. Finally, experiments with an fMRI dataset show the potential for iPrompt to aid in scientific discovery. All code for using the methods and data here is made available on Github.

Submitted 26 January, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: The two first authors contributed equally

arXiv:2210.00715 [pdf, other]

WorldGen: A Large Scale Generative Simulator

Authors: Chahat Deep Singh, Riya Kumari, Cornelia Fermüller, Nitin J. Sanket, Yiannis Aloimonos

Abstract: In the era of deep learning, data is the critical determining factor in the performance of neural network models. Generating large datasets suffers from various difficulties such as scalability, cost efficiency and photorealism. To avoid expensive and strenuous dataset collection and annotation, researchers have turned towards computer-generated datasets. However, a lack of photorealism and the limited amount of computer-aided data have bounded the accuracy of network predictions. To this end, we present WorldGen, an open-source framework to autonomously generate countless structured and unstructured 3D photorealistic scenes such as city views, object collections, and object fragmentation, along with their rich ground-truth annotation data. Being a generative model, WorldGen gives the user full access to and control over features such as texture, object structure, motion, and camera and lens properties, improving generalizability by diminishing the data bias in the network. We demonstrate the effectiveness of WorldGen by presenting an evaluation on deep optical flow. We hope such a tool can open doors for future research in a myriad of domains related to robotics and computer vision by reducing manual labor and the cost of acquiring rich and high-quality data.

Submitted 3 October, 2022; originally announced October 2022.

Journal ref: Under review in ICRA 2023

arXiv:2209.11799 [pdf, other]

doi 10.1038/s41467-023-43713-1

Augmenting Interpretable Models with LLMs during Training

Authors: Chandan Singh, Armin Askari, Rich Caruana, Jianfeng Gao

Abstract: Recent large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their proliferation into high-stakes domains (e.g. medicine) and compute-limited settings has created a burgeoning need for interpretability and efficiency. We address this need by proposing Augmented Interpretable Models (Aug-imodels), a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable models. Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency and often a speed/memory improvement of greater than 1,000x for inference compared to LLMs. We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions. Across a variety of text-classification datasets, both outperform their non-augmented counterparts. Aug-GAM can even outperform much larger models (e.g. a 6-billion parameter GPT-J model), despite having 10,000x fewer parameters and being fully transparent. We further explore Aug-imodels in a natural-language fMRI study, where they generate interesting interpretations from scientific data. All code for using Aug-imodels and reproducing results is made available on Github.

Submitted 24 April, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

Journal ref: Nature Communications, 2023

arXiv:2209.10944 [pdf, other]

Learning Invariant Representations for Equivariant Neural Networks Using Orthogonal Moments

Authors: Jaspreet Singh, Chandan Singh

Abstract: The convolutional layers of standard convolutional neural networks (CNNs) are equivariant to translation. However, the convolution and fully-connected layers are not equivariant or invariant to other affine geometric transformations. Recently, a new class of CNNs has been proposed in which the conventional layers of CNNs are replaced with equivariant convolution, pooling, and batch-normalization layers. The final classification layer in equivariant neural networks is invariant to different affine geometric transformations such as rotation, reflection and translation, and the scalar value is obtained either by eliminating the spatial dimensions of filter responses using convolution and down-sampling throughout the network or by averaging over the filter responses. In this work, we propose to integrate orthogonal moments, which give the high-order statistics of a function, as an effective means for encoding global invariance with respect to rotation, reflection and translation in fully-connected layers. As a result, the intermediate layers of the network become equivariant while the classification layer becomes invariant. The most widely used Zernike, pseudo-Zernike and orthogonal Fourier-Mellin moments are considered for this purpose. The effectiveness of the proposed approach is evaluated by integrating the invariant transition and fully-connected layers into the architecture of group-equivariant CNNs (G-CNNs) on the rotated MNIST and CIFAR10 datasets.

Submitted 22 September, 2022; originally announced September 2022.

Comments: International Joint Conference on Neural Networks (IJCNN), 2022

arXiv:2208.14765 [pdf, other]

Recent Advances in Modeling and Control of Epidemics using a Mean Field Approach

Authors: Amal Roy, Chandramani Singh, Y. Narahari

Abstract: Modeling and control of epidemics such as the novel coronavirus have assumed paramount importance at a global level. A natural and powerful dynamical modeling framework to use in this context is a continuous-time Markov decision process (CTMDP) that encompasses classical compartmental paradigms such as the Susceptible-Infected-Recovered (SIR) model. The challenges with CTMDP-based models motivate the need for a more efficient approach, and the mean field approach offers an effective alternative. The mean field approach computes the collective behavior of a dynamical system comprising numerous interacting nodes (where nodes represent individuals in the population). This paper (a) presents an overview of the mean field approach to epidemic modeling and control and (b) provides a state-of-the-art update on recent advances on this topic. Our discussion in this paper proceeds along two specific threads. The first thread assumes that the individual nodes faithfully follow a socially optimal control policy prescribed by a regulatory authority. The second thread allows the individual nodes to exhibit independent, strategic behavior. In this case, the strategic interaction is modeled as a mean field game and the control is based on the associated mean field Nash equilibria. In this paper, we start with a discussion of epidemic modeling using an extended compartmental model, SIVR, and provide an illustrative example.
We next review the relevant literature on optimal control of epidemics using a mean field approach, which deals with how a regulatory authority may optimally contain epidemic spread in a population. Following this, we provide an update on the literature on the use of the mean field game based approach in the study of epidemic spread and control. We conclude the paper with relevant future research directions.

Submitted 12 April, 2023; v1 submitted 31 August, 2022; originally announced August 2022.
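
As a concrete reference point for the compartmental models discussed above, the classical SIR model in its mean-field form tracks population fractions $s$, $i$, $r$ with $\dot s = -\beta s i$, $\dot i = \beta s i - \gamma i$, $\dot r = \gamma i$. The sketch below integrates these equations with forward Euler; it covers plain SIR only (the SIVR extension adds a vaccinated compartment whose dynamics are not given in the abstract), and the parameter values are illustrative assumptions.

```python
from typing import List, Tuple


def simulate_sir(beta: float, gamma: float, s0: float, i0: float,
                 dt: float = 0.1, steps: int = 2000) -> List[Tuple[float, float, float]]:
    """Forward-Euler integration of the mean-field SIR equations.
    Returns the (s, i, r) population fractions at every time step."""
    s, i, r = s0, i0, 1.0 - s0 - i0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# beta / gamma = 3, i.e. a basic reproduction number of about 3.
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01)
peak_step, (_, peak_i, _) = max(enumerate(trajectory), key=lambda t: t[1][1])
print(f"peak infected fraction {peak_i:.2f} at t = {peak_step * 0.1:.0f}")
print(f"final recovered fraction {trajectory[-1][2]:.2f}")
```

A control policy, whether prescribed by a regulator or arising from a mean field game, would enter this sketch by making `beta` (for example, through distancing) or the flow out of `s` (for example, through vaccination) depend on time and state.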

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2205.15135 [pdf, other]

Group Probability-Weighted Tree Sums for Interpretable Modeling of Heterogeneous Data

Authors: Keyan Nasseri, Chandan Singh, James Duncan, Aaron Kornblith, Bin Yu

Abstract: Machine learning in high-stakes domains, such as healthcare, faces two critical challenges: (1) generalizing to diverse data distributions given limited training data while (2) maintaining interpretability. To address these challenges, we propose an instance-weighted tree-sum method that effectively pools data across diverse groups to output a concise, rule-based model. Given distinct groups of instances in a dataset (e.g., medical patients grouped by age or treatment site), our method first estimates group membership probabilities for each instance. Then, it uses these estimates as instance weights in FIGS (Tan et al. 2022), to grow a set of decision trees whose values sum to the final prediction. We call this new method Group Probability-Weighted Tree Sums (G-FIGS). G-FIGS achieves state-of-the-art prediction performance on important clinical datasets; e.g., holding the level of sensitivity fixed at 92%, G-FIGS increases specificity for identifying cervical spine injury by up to 10% over CART and up to 3% over FIGS alone, with larger gains at higher sensitivity levels. By keeping the total number of rules below 16 in FIGS, the final models remain interpretable, and we find that their rules match medical domain expertise. All code, data, and models are released on Github.

Submitted 30 May, 2022; originally announced May 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2201.11931

arXiv:2205.14792 [pdf]

End-to-End Topology-Aware Machine Learning for Power System Reliability Assessment

Authors: Yongli Zhu, Chanan Singh

Abstract: Conventional power system reliability assessment suffers from the long run time of Monte Carlo simulation and the curse of dimensionality of analytic enumeration methods. This paper presents a preliminary investigation of end-to-end machine learning for directly predicting a reliability index, e.g., the Loss of Load Probability (LOLP). By encoding the system admittance matrix into the input features, the proposed machine learning pipeline can account for the impact of specific topology changes due to regular maintenance of transmission lines. Two models (Support Vector Machine and Boosting Trees) are trained and compared. Details regarding the training data creation and preprocessing are also discussed. Finally, experiments are conducted on the IEEE RTS-79 system. Results demonstrate the applicability of the proposed end-to-end machine learning pipeline in reliability assessment.

Submitted 29 May, 2022; originally announced May 2022.

Comments: This paper has been accepted by PMAPS 2022 and will be officially presented on 14 June 2022

arXiv:2202.00858 [pdf, other]

Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods

Authors: Abhineet Agarwal, Yan Shuo Tan, Omer Ronen, Chandan Singh, Bin Yu

Abstract: Tree-based models such as decision trees and random forests (RF) are a cornerstone of modern machine-learning practice. To mitigate overfitting, trees are typically regularized by a variety of techniques that modify their structure (e.g. pruning). We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure, and instead regularizes the tree by shrinking the prediction over each node towards the sample means of its ancestors. The amount of shrinkage is controlled by a single regularization parameter and the number of data points in each ancestor. Since HS is a post-hoc method, it is extremely fast, compatible with any tree growing algorithm, and can be used synergistically with other regularization techniques. Extensive experiments over a wide variety of real-world datasets show that HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques. Moreover, we find that applying HS to each tree in an RF often improves accuracy, as well as its interpretability by simplifying and stabilizing its decision boundaries and SHAP values. We further explain the success of HS in improving prediction performance by showing its equivalence to ridge regression on a (supervised) basis constructed of decision stumps associated with the internal nodes of a tree. All code and models are released in a full-fledged package available on Github (github.com/csinva/imodels)
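The shrinkage described above can be written for a root-to-leaf path $t_0, \dots, t_L$ as $\hat f_\lambda(x) = \mathbb{E}_{t_0}[y] + \sum_{l=1}^{L} \frac{\mathbb{E}_{t_l}[y] - \mathbb{E}_{t_{l-1}}[y]}{1 + \lambda / N(t_{l-1})}$. Here $N(t)$ is the number of training samples in node $t$. We believe this matches the paper's formulation, but it should be checked against the paper. The sketch below evaluates it from per-node means and counts. The path representation and the name `hs_predict` are assumptions, not the imodels API.

```python
from typing import Sequence, Tuple

# Each node on the root-to-leaf path: (sample mean of y in the node, number of samples in the node).
Node = Tuple[float, int]


def hs_predict(path: Sequence[Node], lam: float) -> float:
    """Hierarchically shrunk prediction for the leaf at the end of `path`."""
    if not path:
        raise ValueError("path must contain at least the root")
    pred = path[0][0]  # start from the root mean
    for (parent_mean, parent_n), (child_mean, _) in zip(path, path[1:]):
        # each increment along the path is shrunk according to the parent's sample count
        pred += (child_mean - parent_mean) / (1.0 + lam / parent_n)
    return pred


path = [(10.0, 100), (14.0, 40), (20.0, 10)]  # hypothetical node statistics
unshrunk = hs_predict(path, 0.0)   # lam = 0 recovers the ordinary leaf mean
shrunk = hs_predict(path, 10.0)    # increments from small ancestors are shrunk more
```

Setting `lam = 0` leaves the tree unchanged, and very large `lam` pulls every prediction toward the root mean. Because only leaf values change, the tree structure is left untouched, as the abstract states.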

Submitted 1 February, 2022; originally announced February 2022.

arXiv:2201.11931 [pdf, other]

Fast Interpretable Greedy-Tree Sums

Authors: Yan Shuo Tan, Chandan Singh, Keyan Nasseri, Abhineet Agarwal, James Duncan, Omer Ronen, Matthew Epland, Aaron Kornblith, Bin Yu

Abstract: Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in high-stakes domains such as medicine. In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure. To overcome this bias, we propose Fast Interpretable Greedy-Tree Sums (FIGS), which generalizes the CART algorithm to simultaneously grow a flexible number of trees in summation. By combining logical rules with addition, FIGS is able to adapt to additive structure while remaining highly interpretable. Extensive experiments on real-world datasets show that FIGS achieves state-of-the-art prediction performance. To demonstrate the usefulness of FIGS in high-stakes domains, we adapt FIGS to learn clinical decision instruments (CDIs), which are tools for guiding clinical decision-making. Specifically, we introduce a variant of FIGS known as G-FIGS that accounts for the heterogeneity in medical data. G-FIGS derives CDIs that reflect domain knowledge and enjoy improved specificity (by up to 20% over CART) without sacrificing sensitivity or interpretability. To provide further insight into FIGS, we prove that FIGS learns components of additive models, a property we refer to as disentanglement. Further, we show (under oracle conditions) that unconstrained tree-sum models leverage disentanglement to generalize more efficiently than single decision tree models when fitted to additive regression functions.
Finally, to avoid overfitting with an unconstrained number of splits, we develop Bagging-FIGS, an ensemble version of FIGS that borrows the variance reduction techniques of random forests. Bagging-FIGS enjoys competitive performance with random forests and XGBoost on real-world datasets.
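The inductive-bias point can be illustrated with a toy additive target, y = 1{x1 > 0} + 2·1{x2 > 0}. A sum of two stumps represents it exactly with two splits. A single tree needs a root split and then one split in each branch, for three in total. The models below are hand-built for illustration and are not FIGS's greedy fitting procedure.

```python
from itertools import product


def stump(feature, threshold, left, right):
    """A depth-1 tree: returns `left` if x[feature] <= threshold else `right`."""
    return lambda x: left if x[feature] <= threshold else right


def tree_sum(trees):
    """Tree-sum model: the prediction is the sum of every tree's output."""
    return lambda x: sum(t(x) for t in trees)


def target(x):
    """Additive toy target: one term per feature."""
    return (1.0 if x[0] > 0 else 0.0) + (2.0 if x[1] > 0 else 0.0)


# Two stumps, two splits in total: one tree per additive component.
figs_like = tree_sum([stump(0, 0.0, 0.0, 1.0), stump(1, 0.0, 0.0, 2.0)])


def single_tree(x):
    """A single tree must repeat the x[1] split in both branches: three splits."""
    if x[0] <= 0.0:
        return 0.0 if x[1] <= 0.0 else 2.0
    return 1.0 if x[1] <= 0.0 else 3.0


grid = list(product([-1.0, 1.0], repeat=2))  # all four sign combinations
```

With more additive components the gap grows quickly. This illustrates why a tree sum can match additive structure with fewer rules than one tree.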

Submitted 8 July, 2023; v1 submitted 27 January, 2022; originally announced January 2022.

arXiv:2201.09050 [pdf, other]

Scheduling Policies for Stability and Optimal Server Running Cost in Cloud Computing Platforms

Authors: Haritha K, Chandramani Singh

Abstract: We propose throughput- and cost-optimal job scheduling algorithms for cloud computing platforms offering Infrastructure as a Service. We first consider online migration and propose job scheduling algorithms that minimize job migration and server running costs. We consider algorithms that assume knowledge of job sizes when jobs arrive. We characterize the optimal cost subject to system stability. We develop an algorithm based on the drift-plus-penalty framework that can achieve the optimal cost arbitrarily closely. Specifically, this algorithm yields a trade-off between delay and cost. We then relax the job-size knowledge assumption and give an algorithm that uses the service already offered to the jobs. We show that this algorithm achieves order-wise the same cost as the job-size-based algorithm. Later, we consider offline job migration, which incurs migration delays. We again present throughput-optimal algorithms that minimize server running cost. We illustrate the performance of the proposed algorithms and compare them with existing algorithms via simulation.
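The delay-cost trade-off of a drift-plus-penalty rule can be seen in a single-queue toy. In each slot, choose the number of active servers k that minimizes V·(running cost) − Q(t)·(service offered). This is the generic Lyapunov drift-plus-penalty rule with an assumed per-server cost and service rate. It is not the paper's algorithm, and all parameter values are hypothetical.

```python
from typing import List, Sequence, Tuple


def dpp_servers(queue: float, max_servers: int, cost: float, rate: float, V: float) -> int:
    """Number of servers k in [0, max_servers] minimizing V*cost*k - queue*rate*k (ties -> fewest)."""
    return min(range(max_servers + 1), key=lambda k: V * cost * k - queue * rate * k)


def simulate(arrivals: Sequence[float], max_servers: int = 4, cost: float = 1.0,
             rate: float = 2.0, V: float = 10.0) -> Tuple[List[float], float]:
    """Apply the rule slot by slot; return (queue length after each slot, total running cost)."""
    q, total_cost, trace = 0.0, 0.0, []
    for a in arrivals:
        k = dpp_servers(q, max_servers, cost, rate, V)
        total_cost += cost * k
        q = max(q - rate * k, 0.0) + a  # serve first, then admit this slot's arrivals
        trace.append(q)
    return trace, total_cost


low_v_trace, low_v_cost = simulate([3.0] * 50, V=1.0)
high_v_trace, high_v_cost = simulate([3.0] * 50, V=100.0)
```

With the larger V, servers are switched on only once the backlog exceeds V·cost/rate. Running cost falls and queues (hence delay) grow, which is the trade-off the abstract mentions.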

Submitted 5 June, 2022; v1 submitted 22 January, 2022; originally announced January 2022.

arXiv:2112.02721 [pdf, other]

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Authors: Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo , et al. (101 additional authors not shown)

Abstract: Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).
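The distinction between transformations (modify an example) and filters (decide which split an example belongs to) can be sketched as two small interfaces. The class and method names below are hypothetical illustrations, not NL-Augmenter's actual interfaces. The typo transformation and the length filter are toy examples.

```python
import random
from typing import List


class Transformation:
    """Hypothetical base class: maps one input sentence to a list of perturbed sentences."""

    def generate(self, sentence: str) -> List[str]:
        raise NotImplementedError


class Filter:
    """Hypothetical base class: decides whether an example belongs to a data split."""

    def keep(self, sentence: str) -> bool:
        raise NotImplementedError


class AdjacentSwapTypos(Transformation):
    """Swaps one randomly chosen pair of adjacent characters (a simple typo-style perturbation)."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)

    def generate(self, sentence: str) -> List[str]:
        if len(sentence) < 2:
            return [sentence]
        i = self.rng.randrange(len(sentence) - 1)
        chars = list(sentence)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        return ["".join(chars)]


class MaxLengthFilter(Filter):
    """Keeps sentences with at most `max_words` whitespace-separated tokens."""

    def __init__(self, max_words: int):
        self.max_words = max_words

    def keep(self, sentence: str) -> bool:
        return len(sentence.split()) <= self.max_words


perturbed = AdjacentSwapTypos(seed=1).generate("hello world")
short_split = [s for s in ["a short one", "a much longer sentence here"] if MaxLengthFilter(3).keep(s)]
```

Keeping the two roles separate lets the same robustness evaluation run a model on perturbed data (transformations) or on targeted subpopulations (filters).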

Submitted 11 October, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

Comments: 39 pages, repository at https://github.com/GEM-benchmark/NL-Augmenter

arXiv:2111.09275 [pdf, other]

doi 10.1109/EICT54103.2021.9733695

Sentiment Analysis of Microblogging dataset on Coronavirus Pandemic

Authors: Nosin Ibna Mahbub, Md Rakibul Islam, Md Al Amin, Md Khairul Islam, Bikash Chandra Singh, Md Imran Hossain Showrov, Anirudda Sarkar

Abstract: Sentiment analysis can play a large role in keeping people updated on the current situation. Coronavirus disease 2019 (COVID-19) is a contagious illness, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), that causes severe respiratory symptoms. The lives of millions have continued to be affected by this pandemic, and several countries have resorted to full lockdowns. During lockdown, people have taken to social networks to express their emotions and to find ways to calm themselves down. People share their sentiments on microblogging websites, since one of the most important preventive measures against this disease is raising awareness so that people stay home and keep their distance when they are outside. Twitter is a popular online social media platform for exchanging ideas. People can post a range of sentiments, which can be used to make others aware. However, some people spread fake news to frighten others. It is therefore necessary to identify positive, negative, and neutral opinions so that the positive ones can be delivered to the general public to spread awareness. Moreover, a huge volume of data circulates on Twitter, so it is also important to identify the context of the dataset. In this paper, we analyze a Twitter dataset to evaluate sentiment using several machine learning algorithms. We then perform context learning on the dataset based on the sentiments.

Submitted 17 November, 2021; originally announced November 2021.

Comments: 7 pages, 5 figures, 5th IEEE International Conference on Electrical Information and Communication Technology (EICT)

MSC Class: 68Uxx ACM Class: I.7

Journal ref: 2021 5th International Conference on Electrical Information and Communication Technology (EICT)

arXiv:2111.04783 [pdf, other]

Capacity and Performance Analysis of RIS-Assisted Communication Over Rician Fading Channels

Authors: Chandradeep Singh, Chia-Hsiang Lin, Kamal Singh

Abstract: This paper investigates two performance metrics, namely ergodic capacity and symbol error rate, of a mmWave communication system assisted by a reconfigurable intelligent surface (RIS). We assume independent and identically distributed (i.i.d.) Rician fading on the user-RIS-Access Point (AP) links, with the RIS surface consisting of passive reflecting elements. First, we derive a new unified closed-form formula for the average symbol error probability of generalised M-QAM/M-PSK signalling over this mmWave link. We then obtain new closed-form expressions for the ergodic capacity with and without channel state information (CSI) at the AP.
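Closed-form capacity expressions like these are commonly sanity-checked by simulation. The sketch below is a Monte Carlo estimate, not the paper's closed form. It assumes unit-power Rician coefficients on both hops and ideal RIS phase alignment, so that every cascaded term h_i·g_i adds coherently. The Rician factor K, SNR values, and trial counts are hypothetical.

```python
import cmath
import math
import random


def rician(K: float, rng: random.Random) -> complex:
    """Unit-power Rician coefficient: LoS part with random phase plus scattered CN(0, 1) part."""
    los = math.sqrt(K / (K + 1)) * cmath.exp(1j * rng.uniform(0, 2 * math.pi))
    nlos = math.sqrt(1 / (K + 1)) * complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
    return los + nlos


def ergodic_capacity(n_elements: int, K: float, snr: float, trials: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[log2(1 + snr * |sum_i h_i g_i e^{j phi_i}|^2)] with optimal phases."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Ideal phase alignment: the received amplitude is the sum of |h_i| * |g_i|.
        amp = sum(abs(rician(K, rng)) * abs(rician(K, rng)) for _ in range(n_elements))
        total += math.log2(1 + snr * amp ** 2)
    return total / trials


c_small = ergodic_capacity(4, K=3.0, snr=0.1)   # bits/s/Hz, 4 RIS elements
c_large = ergodic_capacity(16, K=3.0, snr=0.1)  # bits/s/Hz, 16 RIS elements
```

Comparing such estimates against a closed-form expression over a grid of K and element counts is a quick way to catch algebra errors.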

Submitted 8 November, 2021; originally announced November 2021.

arXiv:2109.13859 [pdf, other]

NudgeSeg: Zero-Shot Object Segmentation by Repeated Physical Interaction

Authors: Chahat Deep Singh, Nitin J. Sanket, Chethan M. Parameshwara, Cornelia Fermüller, Yiannis Aloimonos

Abstract: Recent advances in object segmentation have demonstrated that deep neural networks excel at segmenting objects of specific classes in color and depth images. However, their performance is dictated by the number of classes and objects used for training, which hinders generalization to never-seen objects, or zero-shot samples. To exacerbate the problem further, object segmentation from image frames relies on recognition and pattern-matching cues. Instead, we utilize the 'active' nature of a robot and its ability to 'interact' with the environment to induce additional geometric constraints for segmenting zero-shot samples. In this paper, we present the first framework to segment unknown objects in a cluttered scene by repeatedly 'nudging' the objects and moving them to obtain additional motion cues at every step, using only a monochrome monocular camera. We call our framework NudgeSeg. These motion cues are used to refine the segmentation masks. We successfully test our approach on segmenting novel objects in various cluttered scenes and provide an extensive comparison with image and motion segmentation methods. We show an impressive average detection rate of over 86% on zero-shot objects.
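The simplest possible motion cue between consecutive monochrome frames is thresholded frame differencing, accumulated across nudges. The sketch below uses that stand-in under stated assumptions. NudgeSeg's actual motion cues and mask refinement may be computed differently (for example from optical flow), and the threshold is hypothetical.

```python
from typing import List

Frame = List[List[float]]  # monochrome intensities, row-major
Mask = List[List[bool]]


def motion_mask(prev: Frame, curr: Frame, threshold: float) -> Mask:
    """Mark pixels whose intensity changed by more than `threshold` between two frames."""
    return [[abs(c - p) > threshold for p, c in zip(prow, crow)] for prow, crow in zip(prev, curr)]


def accumulate(masks: List[Mask]) -> Mask:
    """Union of per-nudge motion masks: a pixel counts if it moved in any step."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[any(m[r][c] for m in masks) for c in range(cols)] for r in range(rows)]


# Hypothetical 2x3 frames: one pixel changes after the first nudge, another after the second.
f0 = [[0, 0, 0], [0, 0, 0]]
f1 = [[0, 9, 0], [0, 0, 0]]
f2 = [[0, 9, 0], [0, 0, 9]]
combined = accumulate([motion_mask(f0, f1, 5), motion_mask(f1, f2, 5)])
```

Each nudge adds evidence for which pixels belong to a moving object, which is why repeated interaction can sharpen a mask that a single static frame cannot provide.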

Submitted 22 September, 2021; originally announced September 2021.

Comments: 8 Pages, 7 Figures, 3 Tables

Journal ref: IEEE International Conference on Intelligent Robots and Systems (IROS) 2021

arXiv:2109.01817 [pdf, other]

Low SNR Capacity of Keyhole MIMO Channel in Nakagami-m Fading With Full CSI

Authors: Kamal Singh, Chandradeep Singh, Kuang-Hao Liu

Abstract: In this paper, we obtain asymptotic expressions for the ergodic capacity of the keyhole multiple-input multiple-output (MIMO) channel at low signal-to-noise ratio (SNR) in independent and identically distributed Nakagami-$m$ fading conditions with perfect channel state information at the transmitter and receiver. We show that the low-SNR capacity of this keyhole MIMO channel scales proportionally as $\frac{\textrm{SNR}}{4} \log^2 \left(1/{\textrm{SNR}}\right)$. Our main contribution is to identify a surprising result that the low-SNR capacity of the MIMO fading channel increases in the presence of keyhole degenerate condition, which is in direct contrast to the well-known MIMO capacity degradation at high SNR under keyhole conditions. To explain why the rank-deficient keyhole fading channel outperforms the full-rank MIMO fading channel at sufficiently low SNR, we remark that the rank of the MIMO channel matrix has no impact in the low-SNR regime and that the double-faded (or double-scattering) nature of the keyhole MIMO channel creates more opportunistic communications at low SNR compared with the pure MIMO fading channel, which leads to increased capacity. Finally, we also show that a simple on-off power control based on one bit of channel information achieves this low-SNR capacity; surprisingly, this power adaptation is robust against both moderate and severe fading for a wide range of low SNR values. These results also hold for the keyhole MIMO Rayleigh channel as a special case.
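The benefit of one-bit on-off power control at low SNR can be checked numerically. The sketch rests on stated assumptions. Each branch power gain is Gamma(m, 1/m) (unit-mean Nakagami-m). With full CSI the rank-one keyhole channel's gain is ||h_t||²·||h_r||². Rates are in nats. The transmitter sends at power SNR/P(on) only when the gain exceeds a threshold, which keeps the average power at SNR. The threshold is a fixed quantile rather than the optimized one, so this does not reproduce the $\frac{\textrm{SNR}}{4} \log^2(1/\textrm{SNR})$ scaling law.

```python
import math
import random
from typing import Tuple


def keyhole_gain(m: float, nt: int, nr: int, rng: random.Random) -> float:
    """||h_t||^2 * ||h_r||^2 with i.i.d. unit-mean Nakagami-m branches (power gains ~ Gamma(m, 1/m))."""
    gt = sum(rng.gammavariate(m, 1.0 / m) for _ in range(nt))
    gr = sum(rng.gammavariate(m, 1.0 / m) for _ in range(nr))
    return gt * gr


def low_snr_rates(snr: float, m: float = 1.0, nt: int = 2, nr: int = 2,
                  quantile: float = 0.9, trials: int = 20000, seed: int = 0) -> Tuple[float, float]:
    """Return (constant-power rate, on-off rate) in nats, both with average power snr."""
    rng = random.Random(seed)
    gains = sorted(keyhole_gain(m, nt, nr, rng) for _ in range(trials))
    tau = gains[int(quantile * trials)]          # on-off threshold (one bit of feedback)
    p_on = sum(g > tau for g in gains) / trials  # fraction of time the transmitter is on
    const = sum(math.log(1 + snr * g) for g in gains) / trials
    onoff = sum(math.log(1 + snr * g / p_on) for g in gains if g > tau) / trials
    return const, onoff


const_rate, onoff_rate = low_snr_rates(1e-3)
```

At low SNR, log(1 + x) ≈ x, so the on-off rate is roughly SNR·E[G | G > τ], while constant power gives roughly SNR·E[G]. Concentrating power on strong fades therefore helps, and the heavy-tailed double-faded gain makes this effect larger.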

Submitted 5 June, 2022; v1 submitted 4 September, 2021; originally announced September 2021.

arXiv:2108.06847 [pdf, other]

Interpreting and improving deep-learning models with reality checks

Authors: Chandan Singh, Wooseok Ha, Bin Yu

Abstract: Recent deep-learning models have achieved impressive predictive performance by learning complex functions of many variables, often at the cost of interpretability. This chapter covers recent work aiming to interpret models by attributing importance to features and feature groups for a single prediction. Importantly, the proposed attributions assign importance to interactions between features, in addition to features in isolation. These attributions are shown to yield insights across real-world domains, including bio-imaging, cosmology image and natural-language processing. We then show how these attributions can be used to directly improve the generalization of a neural network or to distill it into a simple model. Throughout the chapter, we emphasize the use of reality checks to scrutinize the proposed interpretation techniques.
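The idea of giving importance to a feature group as well as to single features can be shown with occlusion: remove features by setting them to a baseline and measure the output change. Group importance minus the sum of the members' individual importances is then a simple interaction score. This is a deliberately simple stand-in, not the attribution methods the chapter covers. The baseline value of 0 is an assumption.

```python
from typing import Callable, Iterable, List, Sequence

Model = Callable[[Sequence[float]], float]


def occlude(x: Sequence[float], features: Iterable[int], baseline: float = 0.0) -> List[float]:
    """Copy of x with the given features replaced by a baseline value."""
    out = list(x)
    for i in features:
        out[i] = baseline
    return out


def importance(f: Model, x: Sequence[float], features: Iterable[int]) -> float:
    """Occlusion importance: change in output when the features are removed together."""
    return f(x) - f(occlude(x, list(features)))


def interaction(f: Model, x: Sequence[float], group: Sequence[int]) -> float:
    """Group importance minus the sum of the members' individual importances."""
    return importance(f, x, group) - sum(importance(f, x, [i]) for i in group)


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical model: x0 and x1 interact multiplicatively, x2 enters additively."""
    return x[0] * x[1] + x[2]


x = [1.0, 1.0, 1.0]
```

For `toy_model`, the pair (0, 1) gets a nonzero interaction score, while the pair (0, 2) scores zero because those features combine additively. A reality check of this kind, where the ground truth is known, is useful before trusting any attribution method on a real network.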

Submitted 18 August, 2021; v1 submitted 15 August, 2021; originally announced August 2021.

arXiv:2107.14138 [pdf, other]

Fast Beam Training for RIS-Assisted Uplink Communication

Authors: Chandradeep Singh, Kamal Singh, K. H. Liu

Abstract: In this work, we propose a beam training codebook for Reconfigurable Intelligent Surface (RIS)-assisted mmWave uplink communication. A beam training procedure is important for establishing a reliable link between the user node and the Access Point (AP). A codebook-based training procedure reduces the time the RIS controller needs to search for the best possible phase shift, which aligns the beam incident on the RIS in the direction of the receiving node. We consider a semi-passive RIS that assists the RIS controller with minimum-overhead feedback. It is shown that the procedure detects a mobile node with high probability in a short interval of time. Further, we use the same codebook at the user node to determine the desired direction of communication via the RIS.
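For reference, the baseline that a fast codebook is meant to beat is exhaustive search over steering codewords. The sketch models the surface as an ideal n-element uniform linear array with half-wavelength spacing. It picks the codeword angle with the largest array gain toward an unknown direction. This is an assumed textbook model. The paper's codebook structure and its fine-correction stage are not reproduced here.

```python
import cmath
import math
from typing import Sequence


def array_gain(n: int, theta_true: float, theta_beam: float, spacing: float = 0.5) -> float:
    """|array factor| of an n-element ULA steered to theta_beam (degrees) for a signal
    arriving from theta_true (degrees); element spacing is in wavelengths."""
    delta = math.sin(math.radians(theta_true)) - math.sin(math.radians(theta_beam))
    return abs(sum(cmath.exp(2j * math.pi * spacing * k * delta) for k in range(n)))


def exhaustive_search(n: int, theta_true: float, codebook: Sequence[float]) -> float:
    """Return the codeword (steering angle) with the largest measured gain."""
    return max(codebook, key=lambda th: array_gain(n, theta_true, th))


codebook = list(range(-60, 61, 5))  # hypothetical 25-codeword codebook, 5-degree steps
best = exhaustive_search(16, 20.0, codebook)
```

Exhaustive search costs one measurement per codeword. Reducing that count while still landing on the strongest beam is the search-time saving the abstract refers to.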

Submitted 23 July, 2021; originally announced July 2021.

Comments: This is a codebook for the single-user case in RIS-assisted uplink communication. We have introduced a fine correction in the last stage of beam training

arXiv:2107.10937 [pdf, other]

Reconfigurable Intelligent Surfaces Aided Communication: Capacity and Performance Analysis Over Rician Fading Channel

Authors: Chandradeep Singh, Chia Hsiang Lin

Abstract: In this work, we consider a single-input single-output (SISO) system for Reconfigurable Intelligent Surface (RIS)-assisted mmWave communication. We consider Rician channel models for the user-node-to-RIS and RIS-to-Access Point (AP) links. We obtain closed-form expressions for capacity with and without channel state information (CSI) at the transmitter. The newly derived capacity expressions are closed-form and very compact. We also simplify the closed-form expressions for average symbol error probability. We also characterize the impact of key parameters, the Rician factor K and the number of RIS elements, on the ergodic capacity with and without CSI at the transmitter.

Submitted 22 July, 2021; originally announced July 2021.

Comments: This work corrects the errors in equations (4) and (5) of reference [17]. Our ASEP and capacity expressions are more compact and simplified than those in reference [17]. To the best of our knowledge, the expressions in eqs. (10), (15), and (17) are not available in the literature. The literature does not consider capacity analysis with CSI at the transmitter for RIS-aided communication (equation (17))

arXiv:2107.09145 [pdf, other]

Adaptive wavelet distillation from neural networks through interpretations

Authors: Wooseok Ha, Chandan Singh, Francois Lanusse, Srigokul Upadhyayula, Bin Yu

Abstract: Recent deep-learning models have achieved impressive prediction performance, but often sacrifice interpretability and computational efficiency. Interpretability is crucial in many disciplines, such as science and medicine, where models must be carefully vetted or where interpretation is the goal itself. Moreover, interpretable models are concise and often yield computational efficiency. Here, we propose adaptive wavelet distillation (AWD), a method which aims to distill information from a trained neural network into a wavelet transform. Specifically, AWD penalizes feature attributions of a neural network in the wavelet domain to learn an effective multi-resolution wavelet transform. The resulting model is highly predictive, concise, computationally efficient, and has properties (such as a multi-scale structure) which make it easy to interpret. In close collaboration with domain experts, we showcase how AWD addresses challenges in two real-world settings: cosmological parameter inference and molecular-partner prediction. In both cases, AWD yields a scientifically interpretable and concise model which gives predictive performance better than state-of-the-art neural networks. Moreover, AWD identifies predictive features that are scientifically meaningful in the context of respective domains. All code and models are released in a full-fledged package available on Github (https://github.com/Yu-Group/adaptive-wavelets).
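To make "multi-resolution wavelet transform" concrete, the sketch below implements a fixed orthonormal Haar transform with exact inversion and a multi-level decomposition. AWD learns its wavelet filters from a trained network. The fixed Haar filters here only illustrate the kind of multi-scale structure that AWD adapts, and the helper names are hypothetical.

```python
import math
from typing import List, Tuple


def haar_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar transform: (approximation, detail) coefficients."""
    if len(x) % 2:
        raise ValueError("length must be even")
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx: List[float], detail: List[float]) -> List[float]:
    """Exactly invert haar_step."""
    s = 1 / math.sqrt(2)
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) * s, (a - d) * s))
    return out


def multilevel(x: List[float], levels: int) -> List[List[float]]:
    """Multi-resolution decomposition: [coarsest approx, coarsest detail, ..., finest detail]."""
    details = []
    for _ in range(levels):
        x, d = haar_step(x)
        details.append(d)
    return [x] + details[::-1]


coeffs = multilevel([4.0, 2.0, 5.0, 5.0, 1.0, 1.0, 0.0, 2.0], 3)
```

Each detail level isolates changes at one scale, so a model built on these coefficients can be read scale by scale. That property is what makes a distilled wavelet model easier to interpret than the original network.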

Submitted 26 August, 2021; v1 submitted 19 July, 2021; originally announced July 2021.

arXiv:2107.03749 [pdf, other]

doi 10.1371/journal.pone.0270131

Quantifying the rise and fall of scientific fields

Authors: Chakresh Singh, Emma Barme, Robert Ward, Liubov Tupikina, Marc Santolini

Abstract: Science advances by pushing the boundaries of the adjacent possible. While the global scientific enterprise grows at an exponential pace, at the mesoscopic level the exploration and exploitation of research ideas is reflected through the rise and fall of research fields. The empirical literature has largely studied such dynamics on a case-by-case basis, with a focus on explaining how and why communities of knowledge production evolve. Although fields rise and fall on different temporal and population scales, they are generally argued to pass through a common set of evolutionary stages. To understand the social processes that drive these stages beyond case studies, we need a way to quantify and compare different fields on the same terms. In this paper we develop techniques for identifying scale-invariant patterns in the evolution of scientific fields, and demonstrate their usefulness using 1.5 million preprints from the arXiv repository covering 175 research fields spanning Physics, Mathematics, Computer Science, Quantitative Biology and Quantitative Finance. We show that fields consistently follow a rise-and-fall pattern captured by a two-parameter right-tailed Gumbel temporal distribution. We introduce a field-specific rescaled time and explore the generic properties shared by articles and authors at the creation, adoption, peak, and decay evolutionary phases.
We find that the early phase of a field is characterized by the mixing of cognitively distant fields by small teams of interdisciplinary authors, while late phases exhibit the role of specialized, large teams building on the previous works in the field. This method provides foundations to quantitatively explore the generic patterns underlying the evolution of research fields in science, with general implications in innovation studies.
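The right-tailed (maximum) Gumbel density with location $\mu$ and scale $\beta$ is $f(t) = \frac{1}{\beta}\exp\!\left(-(z + e^{-z})\right)$ with $z = (t - \mu)/\beta$. Its mode is at $t = \mu$, and it has a longer right tail. The sketch fits it by the method of moments, using $\beta = s\sqrt{6}/\pi$ and $\mu = \bar t - \gamma\beta$ with $\gamma$ the Euler-Mascheroni constant. It then defines a rescaled time $z$. The paper may fit the distribution differently (for example to yearly article counts). Using $z$ as the "field-specific rescaled time" is an assumption made for illustration.

```python
import math
import random
from statistics import mean, stdev
from typing import List, Tuple

EULER_GAMMA = 0.5772156649015329


def gumbel_pdf(t: float, mu: float, beta: float) -> float:
    """Right-tailed (maximum) Gumbel density with location mu and scale beta."""
    z = (t - mu) / beta
    return math.exp(-(z + math.exp(-z))) / beta


def fit_gumbel_moments(times: List[float]) -> Tuple[float, float]:
    """Method-of-moments estimates (mu, beta), e.g. from submission times of a field's papers."""
    beta = stdev(times) * math.sqrt(6) / math.pi
    mu = mean(times) - EULER_GAMMA * beta
    return mu, beta


def rescaled_time(t: float, mu: float, beta: float) -> float:
    """Field-specific time: 0 at the peak (mode), measured in units of the field's scale."""
    return (t - mu) / beta


# Hypothetical field peaking around 2010 with scale 3 years, sampled by inverse-CDF.
rng = random.Random(0)
times = [2010 - 3 * math.log(-math.log(rng.random() * 0.999998 + 1e-6)) for _ in range(20000)]
mu_hat, beta_hat = fit_gumbel_moments(times)
```

After rescaling, fields with very different lifetimes share the same time axis. This is what allows articles and authors to be compared at corresponding phases of creation, adoption, peak, and decay.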

Submitted 9 July, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

Comments: 18 pages, 4 figures, 8 SI figures

arXiv:2106.15045 [pdf, other]

EVPropNet: Detecting Drones By Finding Propellers For Mid-Air Landing And Following

Authors: Nitin J. Sanket, Chahat Deep Singh, Chethan M. Parameshwara, Cornelia Fermüller, Guido C. H. E. de Croon, Yiannis Aloimonos

Abstract: The rapid rise in the accessibility of unmanned aerial vehicles, or drones, poses a threat to general security and confidentiality. Most commercially available or custom-built drones are multi-rotors comprising multiple propellers. Since these propellers rotate at high speed, they are generally the fastest-moving parts of an image and cannot be directly "seen" by a classical camera without severe motion blur. We utilize a class of sensors that is particularly suitable for such scenarios, called event cameras, which have high temporal resolution, low latency, and high dynamic range. In this paper, we model the geometry of a propeller and use it to generate simulated events, which are used to train a deep neural network called EVPropNet to detect propellers from event-camera data. EVPropNet transfers directly to the real world without any fine-tuning or retraining. We present two applications of our network: (a) tracking and following an unmarked drone and (b) landing on a near-hover drone. We successfully evaluate and demonstrate the proposed approach in many real-world experiments with different propeller shapes and sizes. Our network can detect propellers at a rate of 85.1% even when 60% of the propeller is occluded and can run at up to 35 Hz on a 2 W power budget. To our knowledge, this is the first deep-learning-based solution for detecting propellers (to detect drones). Finally, our applications also show impressive success rates of 92% and 90% for the tracking and landing tasks, respectively.
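Generating simulated events usually relies on the idealized event-camera model: a pixel fires whenever its log intensity has moved by at least a contrast threshold C since its last event. The sign of the change gives the polarity. The sketch below implements that generic, noise-free model on a frame sequence. EVPropNet's propeller-geometry simulator is not reproduced here. The threshold C = 0.2 and the small `eps` that avoids log(0) are assumptions.

```python
import math
from typing import List, Tuple

Event = Tuple[int, int, float, int]  # (x, y, timestamp, polarity +1/-1)


def generate_events(frames: List[List[List[float]]], times: List[float],
                    contrast: float = 0.2, eps: float = 1e-3) -> List[Event]:
    """Idealized event-camera model: one event per contrast-threshold crossing in log intensity."""
    ref = [[math.log(v + eps) for v in row] for row in frames[0]]  # log intensity at last event
    events: List[Event] = []
    for frame, t in zip(frames[1:], times[1:]):
        for y, row in enumerate(frame):
            for x, v in enumerate(row):
                diff = math.log(v + eps) - ref[y][x]
                while abs(diff) >= contrast:  # large changes emit several events
                    pol = 1 if diff > 0 else -1
                    events.append((x, y, t, pol))
                    ref[y][x] += pol * contrast
                    diff -= pol * contrast
    return events


# Hypothetical 1x2 sensor: the right pixel brightens, then partially dims.
frames = [[[1.0, 1.0]], [[1.0, 2.0]], [[1.0, 1.3]]]
events = generate_events(frames, [0.0, 1.0, 2.0])
```

Because only changes are reported, a fast-moving propeller produces dense, blur-free event streams while the static background stays silent. This is why the sensor suits the task.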

Submitted 28 June, 2021; originally announced June 2021.

Comments: 11 pages, 10 figures, 6 tables. Accepted in Robotics: Science and Systems (RSS) 2021

arXiv:2105.02002 [pdf, ps, other]

doi 10.1016/j.peva.2021.102282

Optimal Pricing in Multi Server Systems

Authors: Ashok Krishnan K. S, Chandramani Singh, Siva Theja Maguluri, Parimal Parag

Abstract: We study optimal service pricing in server farms where customers arrive according to a renewal process and have independent and identically distributed (i.i.d.) exponential service times and i.i.d. valuations of the service. The service provider charges a time-varying service fee with the aim of maximizing its revenue rate. Customers who find a free server and a service fee less than their valuation join for service; otherwise, they leave without waiting. We consider both finite-server and infinite-server farms. We solve the optimal pricing problems using the framework of Markov decision problems. We show that the optimal prices depend on the number of free servers. We propose algorithms to compute the optimal prices. We also establish several properties of the optimal prices and the corresponding revenue rates in the case of Poisson customer arrivals. We illustrate all our findings via numerical results.
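A small instance of the Markov-decision formulation can be solved by relative value iteration on the uniformized chain. The state is the number of busy servers, and the action is the posted price. The sketch relies on stated assumptions that are not the paper's general setting. Arrivals are Poisson, service is exponential, valuations are Uniform(0, 1) (so a customer joins with probability 1 − p), and prices come from a discrete grid. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def optimal_prices(lam: float, mu: float, c: int, prices: List[float],
                   iters: int = 2000) -> Tuple[float, Dict[int, float]]:
    """Relative value iteration; state k = busy servers.

    Returns (revenue rate, {number of free servers: optimal price})."""
    Lam = lam + c * mu  # uniformization constant
    h = [0.0] * (c + 1)
    gain, policy = 0.0, {}
    for _ in range(iters):
        Th = [0.0] * (c + 1)
        for k in range(c + 1):
            # departures (k*mu) and the fictitious self-loop ((c-k)*mu) do not depend on price
            base = (k * mu * h[k - 1] if k > 0 else 0.0) + (c - k) * mu * h[k]
            if k == c:  # all servers busy: arrivals are lost, no price to set
                Th[k] = (base + lam * h[k]) / Lam
                continue
            best_val, best_p = float("-inf"), prices[0]
            for p in prices:
                join = 1.0 - p  # P(valuation > p) for Uniform(0, 1) valuations
                val = (base + lam * (join * (p + h[k + 1]) + p * h[k])) / Lam
                if val > best_val:
                    best_val, best_p = val, p
            Th[k] = best_val
            policy[c - k] = best_p
        gain = Th[0] * Lam  # per-step average reward times transitions per unit time
        h = [v - Th[0] for v in Th]
    return gain, policy


grid = [i / 200 for i in range(201)]
rate, prices_by_free_servers = optimal_prices(lam=1.0, mu=1.0, c=1, prices=grid)
```

For one server with λ = μ = 1, the revenue rate of price p is (1 − p)p/(2 − p). This is maximized at p = 2 − √2 ≈ 0.586, giving a rate of 3 − 2√2 ≈ 0.172, which the iteration should reproduce up to grid resolution. With more servers, the returned dictionary shows how the price varies with the number of free servers.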

Submitted 5 May, 2021; originally announced May 2021.

Report number: Volume 154, April 2022, pp. 102282

Journal ref: Performance Evaluation 2022

arXiv:2103.13455 [pdf, other]

Matched sample selection with GANs for mitigating attribute confounding

Authors: Chandan Singh, Guha Balakrishnan, Pietro Perona

Abstract: Measuring biases of vision systems with respect to protected attributes like gender and age is critical as these systems gain widespread use in society. However, significant correlations between attributes in benchmark datasets make it difficult to separate algorithmic bias from dataset bias. To mitigate such attribute confounding during bias analysis, we propose a matching approach that selects a subset of images from the full dataset with balanced attribute distributions across protected attributes. Our matching approach first projects real images onto a generative adversarial network (GAN)'s latent space in a manner that preserves semantic attributes. It then finds image matches in this latent space across a chosen protected attribute, yielding a dataset where semantic and perceptual attributes are balanced across the protected attribute. We validate projection and matching strategies with qualitative, quantitative, and human annotation experiments. We demonstrate our work in the context of gender bias in multiple open-source facial-recognition classifiers and find that bias persists after removing key confounders via matching. Code and documentation to reproduce the results here and to apply the methods to new data are available at https://github.com/csinva/matching-with-gans.
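Once images have been projected into a latent space, matching across a protected attribute can be as simple as greedy nearest-neighbour pairing with an optional distance cap. The sketch assumes the latent codes are already computed. It uses Euclidean distance and one-to-one greedy matching, which is one simple strategy and not necessarily the one the paper validates. Unmatched images are dropped, which is what balances the retained subset.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def greedy_match(group_a: List[Vector], group_b: List[Vector],
                 max_dist: float = math.inf) -> List[Tuple[int, int]]:
    """Pair each latent code in group_a with its nearest unused code in group_b
    (Euclidean distance); codes with no partner within max_dist stay unmatched."""
    used = set()
    pairs = []
    for i, za in enumerate(group_a):
        best_j, best_d = None, math.inf
        for j, zb in enumerate(group_b):
            if j in used:
                continue
            d = math.dist(za, zb)
            if d <= max_dist and (best_j is None or d < best_d):
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs


# Hypothetical 2-D latent codes for two values of a protected attribute.
a = [[0.0, 0.0], [5.0, 5.0]]
b = [[5.1, 5.0], [0.2, 0.0], [9.0, 9.0]]
matched = greedy_match(a, b)
```

Tightening `max_dist` trades dataset size for closer matches. That is the main knob when deciding how strictly confounders should be balanced.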

Submitted 24 March, 2021; originally announced March 2021.
