-
Improved linearly ordered colorings of hypergraphs via SDP rounding
Authors:
Anand Louis,
Alantha Newman,
Arka Ray
Abstract:
We consider the problem of linearly ordered (LO) coloring of hypergraphs. A hypergraph has an LO coloring if there is a vertex coloring, using a set of ordered colors, so that (i) no edge is monochromatic, and (ii) each edge has a unique maximum color. It is an open question as to whether or not a 2-LO colorable 3-uniform hypergraph can be LO colored with 3 colors in polynomial time. Nakajima and…
▽ More
We consider the problem of linearly ordered (LO) coloring of hypergraphs. A hypergraph has an LO coloring if there is a vertex coloring, using a set of ordered colors, so that (i) no edge is monochromatic, and (ii) each edge has a unique maximum color. It is an open question as to whether or not a 2-LO colorable 3-uniform hypergraph can be LO colored with 3 colors in polynomial time. Nakajima and Zivný recently gave a polynomial-time algorithm to color such hypergraphs with $\widetilde{O}(n^{1/3})$ colors and asked if SDP methods can be used directly to obtain improved bounds. Our main result is to show how to use SDP-based rounding methods to produce an LO coloring with $\widetilde{O}(n^{1/5})$ colors for such hypergraphs. We first show that we can reduce the problem to cases with highly structured SDP solutions, which we call balanced hypergraphs. Then we show how to apply classic SDP-rounding tools in this case. We believe that the reduction to balanced hypergraphs is novel and could be of independent interest.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
CFAT: Unleashing TriangularWindows for Image Super-resolution
Authors:
Abhisek Ray,
Gaurav Kumar,
Maheshkumar H. Kolekar
Abstract:
Transformer-based models have revolutionized the field of image super-resolution (SR) by harnessing their inherent ability to capture complex contextual features. The overlapping rectangular shifted window technique used in transformer architecture nowadays is a common practice in super-resolution models to improve the quality and robustness of image upscaling. However, it suffers from distortion…
▽ More
Transformer-based models have revolutionized the field of image super-resolution (SR) by harnessing their inherent ability to capture complex contextual features. The overlapping rectangular shifted window technique used in transformer architecture nowadays is a common practice in super-resolution models to improve the quality and robustness of image upscaling. However, it suffers from distortion at the boundaries and has limited unique shifting modes. To overcome these weaknesses, we propose a non-overlapping triangular window technique that synchronously works with the rectangular one to mitigate boundary-level distortion and allows the model to access more unique sifting modes. In this paper, we propose a Composite Fusion Attention Transformer (CFAT) that incorporates triangular-rectangular window-based local attention with a channel-based global attention technique in image super-resolution. As a result, CFAT enables attention mechanisms to be activated on more image pixels and captures long-range, multi-scale features to improve SR performance. The extensive experimental results and ablation study demonstrate the effectiveness of CFAT in the SR domain. Our proposed model shows a significant 0.7 dB performance improvement over other state-of-the-art SR architectures.
△ Less
Submitted 24 March, 2024;
originally announced March 2024.
-
Task and Motion Planning in Hierarchical 3D Scene Graphs
Authors:
Aaron Ray,
Christopher Bradley,
Luca Carlone,
Nicholas Roy
Abstract:
Recent work in the construction of 3D scene graphs has enabled mobile robots to build large-scale hybrid metric-semantic hierarchical representations of the world. These detailed models contain information that is useful for planning, however how to derive a planning domain from a 3D scene graph that enables efficient computation of executable plans is an open question. In this work, we present a…
▽ More
Recent work in the construction of 3D scene graphs has enabled mobile robots to build large-scale hybrid metric-semantic hierarchical representations of the world. These detailed models contain information that is useful for planning, however how to derive a planning domain from a 3D scene graph that enables efficient computation of executable plans is an open question. In this work, we present a novel approach for defining and solving Task and Motion Planning problems in large-scale environments using hierarchical 3D scene graphs. We identify a method for building sparse problem domains which enable scaling to large scenes, and propose a technique for incrementally adding objects to that domain during planning time to avoid wasting computation on irrelevant elements of the scene graph. We test our approach in two hand crafted domains as well as two scene graphs built from perception, including one constructed from the KITTI dataset. A video supplement is available at https://youtu.be/63xuCCaN0I4.
△ Less
Submitted 12 March, 2024;
originally announced March 2024.
-
An LLM Maturity Model for Reliable and Transparent Text-to-Query
Authors:
Lei Yu,
Abir Ray
Abstract:
Recognizing the imperative to address the reliability and transparency issues of Large Language Models (LLM), this work proposes an LLM maturity model tailored for text-to-query applications. This maturity model seeks to fill the existing void in evaluating LLMs in such applications by incorporating dimensions beyond mere correctness or accuracy. Moreover, this work introduces a real-world use cas…
▽ More
Recognizing the imperative to address the reliability and transparency issues of Large Language Models (LLM), this work proposes an LLM maturity model tailored for text-to-query applications. This maturity model seeks to fill the existing void in evaluating LLMs in such applications by incorporating dimensions beyond mere correctness or accuracy. Moreover, this work introduces a real-world use case from the law enforcement domain and showcases QueryIQ, an LLM-powered, domain-specific text-to-query assistant to expedite user workflows and reveal hidden relationship in data.
△ Less
Submitted 20 February, 2024;
originally announced February 2024.
-
Continuous-Variable QKD with key rates far above Devetak-Winter
Authors:
Arpan Akash Ray,
Boris Skoric
Abstract:
Continuous-Variable Quantum Key Distribution (CVQKD) at large distances has such high noise levels that the employed error-correcting codes must have very low rate. In this regime it becomes feasible to implement random-codebook error correction, which is known to perform close to capacity. We propose a random-codebook reverse reconciliation scheme for CVQKD that is inspired by spread-spectrum wat…
▽ More
Continuous-Variable Quantum Key Distribution (CVQKD) at large distances has such high noise levels that the employed error-correcting codes must have very low rate. In this regime it becomes feasible to implement random-codebook error correction, which is known to perform close to capacity. We propose a random-codebook reverse reconciliation scheme for CVQKD that is inspired by spread-spectrum watermarking. Our scheme has a novel way of achieving statistical decoupling between the publicly sent reconciliation data and the secret key. We provide a theoretical analysis of the secret key rate and we present numerical results. The best performance is obtained when the message size exceeds the mutual information I(X;Y) between Alice and Bob's measurements. This somewhat counter-intuitive result is understood from a tradeoff between code rate and frame rejection rate, combined with the fact that error correction for QKD needs to reconcile only random data. We obtain secret key lengths that lie far above the Devetak-Winter value I(X;Y)-I(E;Y).
△ Less
Submitted 7 February, 2024;
originally announced February 2024.
-
Compositional Generalization in Spoken Language Understanding
Authors:
Avik Ray,
Yilin Shen,
Hongxia Jin
Abstract:
State-of-the-art spoken language understanding (SLU) models have shown tremendous success in benchmark SLU datasets, yet they still fail in many practical scenario due to the lack of model compositionality when trained on limited training data. In this paper, we study two types of compositionality: (a) novel slot combination, and (b) length generalization. We first conduct in-depth analysis, and f…
▽ More
State-of-the-art spoken language understanding (SLU) models have shown tremendous success in benchmark SLU datasets, yet they still fail in many practical scenario due to the lack of model compositionality when trained on limited training data. In this paper, we study two types of compositionality: (a) novel slot combination, and (b) length generalization. We first conduct in-depth analysis, and find that state-of-the-art SLU models often learn spurious slot correlations during training, which leads to poor performance in both compositional cases. To mitigate these limitations, we create the first compositional splits of benchmark SLU datasets and we propose the first compositional SLU model, including compositional loss and paired training that tackle each compositional case respectively. On both benchmark and compositional splits in ATIS and SNIPS, we show that our compositional SLU model significantly outperforms (up to $5\%$ F1 score) state-of-the-art BERT SLU model.
△ Less
Submitted 25 December, 2023;
originally announced December 2023.
-
Towards End-to-End Structure Solutions from Information-Compromised Diffraction Data via Generative Deep Learning
Authors:
Gabe Guo,
Judah Goldfeder,
Ling Lan,
Aniv Ray,
Albert Hanming Yang,
Boyuan Chen,
Simon JL Billinge,
Hod Lipson
Abstract:
The revolution in materials in the past century was built on a knowledge of the atomic arrangements and the structure-property relationship. The sine qua non for obtaining quantitative structural information is single crystal crystallography. However, increasingly we need to solve structures in cases where the information content in our input signal is significantly degraded, for example, due to o…
▽ More
The revolution in materials in the past century was built on a knowledge of the atomic arrangements and the structure-property relationship. The sine qua non for obtaining quantitative structural information is single crystal crystallography. However, increasingly we need to solve structures in cases where the information content in our input signal is significantly degraded, for example, due to orientational averaging of grains, finite size effects due to nanostructure, and mixed signals due to sample heterogeneity. Understanding the structure property relationships in such situations is, if anything, more important and insightful, yet we do not have robust approaches for accomplishing it. In principle, machine learning (ML) and deep learning (DL) are promising approaches since they augment information in the degraded input signal with prior knowledge learned from large databases of already known structures. Here we present a novel ML approach, a variational query-based multi-branch deep neural network that has the promise to be a robust but general tool to address this problem end-to-end. We demonstrate the approach on computed powder x-ray diffraction (PXRD), along with partial chemical composition information, as input. We choose as a structural representation a modified electron density we call the Cartesian mapped electron density (CMED), that straightforwardly allows our ML model to learn material structures across different chemistries, symmetries and crystal systems. When evaluated on theoretically simulated data for the cubic and trigonal crystal systems, the system achieves up to $93.4\%$ average similarity with the ground truth on unseen materials, both with known and partially-known chemical composition information, showing great promise for successful structure solution even from degraded and incomplete input data.
△ Less
Submitted 22 December, 2023;
originally announced December 2023.
-
Hardcaml: An OCaml Hardware Domain-Specific Language for Efficient and Robust Design
Authors:
Andy Ray,
Benjamin Devlin,
Fu Yong Quah,
Rahul Yesantharao
Abstract:
This paper introduces Hardcaml, an embedded hardware design domain specific language (DSL) implemented in the OCaml programming language. Unlike high level synthesis (HLS), Hardcaml allows for low level control of the underlying hardware for maximum productivity, while abstracting away many of the tedious aspects of traditional hardware definition languages (HDLs) such as Verilog or VHDL. The rich…
▽ More
This paper introduces Hardcaml, an embedded hardware design domain specific language (DSL) implemented in the OCaml programming language. Unlike high level synthesis (HLS), Hardcaml allows for low level control of the underlying hardware for maximum productivity, while abstracting away many of the tedious aspects of traditional hardware definition languages (HDLs) such as Verilog or VHDL. The richness of OCaml's type system combined with Hardcaml's fast circuit elaboration checks reduces the chance of user-introduced bugs and erroneous connections with features like custom type defining, type-safe parameterized modules and elaboration-time bit-width inference and validation. Hardcaml tooling emphasizes fast feedback through simulation, testing, and verification. It includes both a native OCaml cycle-accurate and an event-driven simulator. Unit tests can live in the source code and include digital ASCII waveforms representing the simulator's output. Hardcaml also provides tools for SAT proving and formal verification. Hardcaml is industrially proven, and has been used at Jane Street internally for many large FPGA designs. As a case study we highlight several aspects of our recent Hardcaml submission to the 2022 ZPrize cryptography competition which won 1st place in the FPGA track.
△ Less
Submitted 22 December, 2023;
originally announced December 2023.
-
BloomVQA: Assessing Hierarchical Multi-modal Comprehension
Authors:
Yunye Gong,
Robik Shrestha,
Jared Claypoole,
Michael Cogswell,
Arijit Ray,
Christopher Kanan,
Ajay Divakaran
Abstract:
We propose a novel VQA dataset, BloomVQA, to facilitate comprehensive evaluation of large vision-language models on comprehension tasks. Unlike current benchmarks that often focus on fact-based memorization and simple reasoning tasks without theoretical grounding, we collect multiple-choice samples based on picture stories that reflect different levels of comprehension, as laid out in Bloom's Taxo…
▽ More
We propose a novel VQA dataset, BloomVQA, to facilitate comprehensive evaluation of large vision-language models on comprehension tasks. Unlike current benchmarks that often focus on fact-based memorization and simple reasoning tasks without theoretical grounding, we collect multiple-choice samples based on picture stories that reflect different levels of comprehension, as laid out in Bloom's Taxonomy, a classic framework for learning assessment widely adopted in education research. Our data maps to a novel hierarchical graph representation which enables automatic data augmentation and novel measures characterizing model consistency. We perform graded evaluation and reliability analysis on recent multi-modal models. In comparison to low-level tasks, we observe decreased performance on tasks requiring advanced comprehension and cognitive skills with up to 38.0\% drop in VQA accuracy. In comparison to earlier models, GPT-4V demonstrates improved accuracy over all comprehension levels and shows a tendency of bypassing visual inputs especially for higher-level tasks. Current models also show consistency patterns misaligned with human comprehension in various scenarios, demonstrating the need for improvement based on theoretically-grounded criteria.
△ Less
Submitted 10 June, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
Lasagna: Layered Score Distillation for Disentangled Object Relighting
Authors:
Dina Bashkirova,
Arijit Ray,
Rupayan Mallick,
Sarah Adel Bargal,
Jianming Zhang,
Ranjay Krishna,
Kate Saenko
Abstract:
Professional artists, photographers, and other visual content creators use object relighting to establish their photo's desired effect. Unfortunately, manual tools that allow relighting have a steep learning curve and are difficult to master. Although generative editing methods now enable some forms of image editing, relighting is still beyond today's capabilities; existing methods struggle to kee…
▽ More
Professional artists, photographers, and other visual content creators use object relighting to establish their photo's desired effect. Unfortunately, manual tools that allow relighting have a steep learning curve and are difficult to master. Although generative editing methods now enable some forms of image editing, relighting is still beyond today's capabilities; existing methods struggle to keep other aspects of the image -- colors, shapes, and textures -- consistent after the edit. We propose Lasagna, a method that enables intuitive text-guided relighting control. Lasagna learns a lighting prior by using score distillation sampling to distill the prior of a diffusion model, which has been finetuned on synthetic relighting data. To train Lasagna, we curate a new synthetic dataset ReLiT, which contains 3D object assets re-lit from multiple light source locations. Despite training on synthetic images, quantitative results show that Lasagna relights real-world images while preserving other aspects of the input image, outperforming state-of-the-art text-guided image editing methods. Lasagna enables realistic and controlled results on natural images and digital art pieces and is preferred by humans over other methods in over 91% of cases. Finally, we demonstrate the versatility of our learning objective by extending it to allow colorization, another form of image editing.
△ Less
Submitted 30 November, 2023;
originally announced December 2023.
-
IR-UWB Radar-based Situational Awareness System for Smartphone-Distracted Pedestrians
Authors:
Jamsheed Manja Ppallan,
Ruchi Pandey,
Yellappa Damam,
Vijay Narayan Tiwari,
Karthikeyan Arunachalam,
Antariksha Ray
Abstract:
With the widespread adoption of smartphones, ensuring pedestrian safety on roads has become a critical concern due to smartphone distraction. This paper proposes a novel and real-time assistance system called UWB-assisted Safe Walk (UASW) for obstacle detection and warns users about real-time situations. The proposed method leverages Impulse Radio Ultra-Wideband (IR-UWB) radar embedded in the smar…
▽ More
With the widespread adoption of smartphones, ensuring pedestrian safety on roads has become a critical concern due to smartphone distraction. This paper proposes a novel and real-time assistance system called UWB-assisted Safe Walk (UASW) for obstacle detection and warns users about real-time situations. The proposed method leverages Impulse Radio Ultra-Wideband (IR-UWB) radar embedded in the smartphone, which provides excellent range resolution and high noise resilience using short pulses. We implemented UASW specifically for Android smartphones with IR-UWB connectivity. The framework uses complex Channel Impulse Response (CIR) data to integrate rule-based obstacle detection with artificial neural network (ANN) based obstacle classification. The performance of the proposed UASW system is analyzed using real-time collected data. The results show that the proposed system achieves an obstacle detection accuracy of up to 97% and obstacle classification accuracy of up to 95% with an inference delay of 26.8 ms. The results highlight the effectiveness of UASW in assisting smartphone-distracted pedestrians and improving their situational awareness.
△ Less
Submitted 2 November, 2023;
originally announced November 2023.
-
Contextual Data Augmentation for Task-Oriented Dialog Systems
Authors:
Dustin Axman,
Avik Ray,
Shubham Garg,
Jing Huang
Abstract:
Collection of annotated dialogs for training task-oriented dialog systems have been one of the key bottlenecks in improving current models. While dialog response generation has been widely studied on the agent side, it is not evident if similar generative models can be used to generate a large variety of, and often unexpected, user inputs that real dialog systems encounter in practice. Existing da…
▽ More
Collection of annotated dialogs for training task-oriented dialog systems have been one of the key bottlenecks in improving current models. While dialog response generation has been widely studied on the agent side, it is not evident if similar generative models can be used to generate a large variety of, and often unexpected, user inputs that real dialog systems encounter in practice. Existing data augmentation techniques such as paraphrase generation do not take the dialog context into consideration. In this paper, we develop a novel dialog augmentation model that generates a user turn, conditioning on full dialog context. Additionally, with a new prompt design for language model, and output re-ranking, the dialogs generated from our model can be directly used to train downstream dialog systems. On common benchmark datasets MultiWoZ and SGD, we show that our dialog augmentation model generates high quality dialogs and improves dialog success rate by as much as $8\%$ over baseline.
△ Less
Submitted 16 October, 2023;
originally announced October 2023.
-
Socratis: Are large multimodal models emotionally aware?
Authors:
Katherine Deng,
Arijit Ray,
Reuben Tan,
Saadia Gabriel,
Bryan A. Plummer,
Kate Saenko
Abstract:
Existing emotion prediction benchmarks contain coarse emotion labels which do not consider the diversity of emotions that an image and text can elicit in humans due to various reasons. Learning diverse reactions to multimodal content is important as intelligent machines take a central role in generating and delivering content to society. To address this gap, we propose Socratis, a societal reactio…
▽ More
Existing emotion prediction benchmarks contain coarse emotion labels which do not consider the diversity of emotions that an image and text can elicit in humans due to various reasons. Learning diverse reactions to multimodal content is important as intelligent machines take a central role in generating and delivering content to society. To address this gap, we propose Socratis, a societal reactions benchmark, where each image-caption (IC) pair is annotated with multiple emotions and the reasons for feeling them. Socratis contains 18K free-form reactions for 980 emotions on 2075 image-caption pairs from 5 widely-read news and image-caption (IC) datasets. We benchmark the capability of state-of-the-art multimodal large language models to generate the reasons for feeling an emotion given an IC pair. Based on a preliminary human study, we observe that humans prefer human-written reasons over 2 times more often than machine-generated ones. This shows our task is harder than standard generation tasks because it starkly contrasts recent findings where humans cannot tell apart machine vs human-written news articles, for instance. We further see that current captioning metrics based on large vision-language models also fail to correlate with human preferences. We hope that these findings and our benchmark will inspire further research on training emotionally aware models.
△ Less
Submitted 2 November, 2023; v1 submitted 31 August, 2023;
originally announced August 2023.
-
Unlocking Hardware Security Assurance: The Potential of LLMs
Authors:
Xingyu Meng,
Amisha Srivastava,
Ayush Arunachalam,
Avik Ray,
Pedro Henrique Silva,
Rafail Psiakis,
Yiorgos Makris,
Kanad Basu
Abstract:
System-on-Chips (SoCs) form the crux of modern computing systems. SoCs enable high-level integration through the utilization of multiple Intellectual Property (IP) cores. However, the integration of multiple IP cores also presents unique challenges owing to their inherent vulnerabilities, thereby compromising the security of the entire system. Hence, it is imperative to perform hardware security v…
▽ More
System-on-Chips (SoCs) form the crux of modern computing systems. SoCs enable high-level integration through the utilization of multiple Intellectual Property (IP) cores. However, the integration of multiple IP cores also presents unique challenges owing to their inherent vulnerabilities, thereby compromising the security of the entire system. Hence, it is imperative to perform hardware security validation to address these concerns. The efficiency of this validation procedure is contingent on the quality of the SoC security properties provided. However, generating security properties with traditional approaches often requires expert intervention and is limited to a few IPs, thereby resulting in a time-consuming and non-robust process. To address this issue, we, for the first time, propose a novel and automated Natural Language Processing (NLP)-based Security Property Generator (NSPG). Specifically, our approach utilizes hardware documentation in order to propose the first hardware security-specific language model, HS-BERT, for extracting security properties dedicated to hardware design. To evaluate our proposed technique, we trained the HS-BERT model using sentences from RISC-V, OpenRISC, MIPS, OpenSPARC, and OpenTitan SoC documentation. When assessedb on five untrained OpenTitan hardware IP documents, NSPG was able to extract 326 security properties from 1723 sentences. This, in turn, aided in identifying eight security bugs in the OpenTitan SoC design presented in the hardware hacking competition, Hack@DAC 2022.
△ Less
Submitted 21 August, 2023;
originally announced August 2023.
-
Aggressive Aerial Grasping using a Soft Drone with Onboard Perception
Authors:
Samuel Ubellacker,
Aaron Ray,
James Bern,
Jared Strader,
Luca Carlone
Abstract:
Contrary to the stunning feats observed in birds of prey, aerial manipulation and grasping with flying robots still lack versatility and agility. Conventional approaches using rigid manipulators require precise positioning and are subject to large reaction forces at grasp, which limit performance at high speeds. The few reported examples of aggressive aerial grasping rely on motion capture systems…
▽ More
Contrary to the stunning feats observed in birds of prey, aerial manipulation and grasping with flying robots still lack versatility and agility. Conventional approaches using rigid manipulators require precise positioning and are subject to large reaction forces at grasp, which limit performance at high speeds. The few reported examples of aggressive aerial grasping rely on motion capture systems, or fail to generalize across environments and grasp targets. We describe the first example of a soft aerial manipulator equipped with a fully onboard perception pipeline, capable of robustly localizing and grasping visually and morphologically varied objects. The proposed system features a novel passively closing tendon-actuated soft gripper that enables fast closure at grasp, while compensating for position errors, complying to the target-object morphology, and dampening reaction forces. The system includes an onboard perception pipeline that combines a neural-network-based semantic keypoint detector with a state-of-the-art robust 3D object pose estimator, whose estimate is further refined using a fixed-lag smoother. The resulting pose estimate is passed to a minimum-snap trajectory planner, tracked by an adaptive controller that fully compensates for the added mass of the grasped object. Finally, a finite-element-based controller determines optimal gripper configurations for grasping. Rigorous experiments confirm that our approach enables dynamic, aggressive, and versatile grasping. We demonstrate fully onboard vision-based grasps of a variety of objects, in both indoor and outdoor environments, and up to speeds of 2.0 m/s -- the fastest vision-based grasp reported in the literature. Finally, we take a major step in expanding the utility of our platform beyond stationary targets, by demonstrating motion-capture-based grasps of targets moving up to 0.3 m/s, with relative speeds up to 1.5 m/s.
△ Less
Submitted 11 August, 2023;
originally announced August 2023.
-
Code-Switched Text Synthesis in Unseen Language Pairs
Authors:
I-Hung Hsu,
Avik Ray,
Shubham Garg,
Nanyun Peng,
Jing Huang
Abstract:
Existing efforts on text synthesis for code-switching mostly require training on code-switched texts in the target language pairs, limiting the deployment of the models to cases lacking code-switched data. In this work, we study the problem of synthesizing code-switched texts for language pairs absent from the training data. We introduce GLOSS, a model built on top of a pre-trained multilingual ma…
▽ More
Existing efforts on text synthesis for code-switching mostly require training on code-switched texts in the target language pairs, limiting the deployment of the models to cases lacking code-switched data. In this work, we study the problem of synthesizing code-switched texts for language pairs absent from the training data. We introduce GLOSS, a model built on top of a pre-trained multilingual machine translation model (PMMTM) with an additional code-switching module. This module, either an adapter or extra prefixes, learns code-switching patterns from code-switched data during training, while the primary component of GLOSS, i.e., the PMMTM, is frozen. The design of only adjusting the code-switching module prevents our model from overfitting to the constrained training data for code-switching. Hence, GLOSS exhibits the ability to generalize and synthesize code-switched texts across a broader spectrum of language pairs. Additionally, we develop a self-training algorithm on target language pairs further to enhance the reliability of GLOSS. Automatic evaluations on four language pairs show that GLOSS achieves at least 55% relative BLEU and METEOR scores improvements compared to strong baselines. Human evaluations on two language pairs further validate the success of GLOSS.
△ Less
Submitted 7 July, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
DeepCollide: Scalable Data-Driven High DoF Configuration Space Modeling using Implicit Neural Representations
Authors:
Gabriel Guo,
Judah Goldfeder,
Aniv Ray,
Tony Dear,
Hod Lipson
Abstract:
Collision detection is essential to virtually all robotics applications. However, traditional geometric collision detection methods generally require pre-existing workspace geometry representations; thus, they are unable to infer the collision detection function from sampled data when geometric information is unavailable. Learning-based approaches can overcome this limitation. Following this line…
▽ More
Collision detection is essential to virtually all robotics applications. However, traditional geometric collision detection methods generally require pre-existing workspace geometry representations; thus, they are unable to infer the collision detection function from sampled data when geometric information is unavailable. Learning-based approaches can overcome this limitation. Following this line of research, we present DeepCollide, an implicit neural representation method for approximating the collision detection function from sampled collision data. As shown by our theoretical analysis and empirical evidence, DeepCollide presents clear benefits over the state-of-the-art, as it relates to time cost scalability with respect to training data and DoF, as well as the ability to accurately express complex workspace geometries. We publicly release our code.
△ Less
Submitted 13 September, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Universal Matrix Sparsifiers and Fast Deterministic Algorithms for Linear Algebra
Authors:
Rajarshi Bhattacharjee,
Gregory Dexter,
Cameron Musco,
Archan Ray,
Sushant Sachdeva,
David P Woodruff
Abstract:
Let $\mathbf S \in \mathbb R^{n \times n}$ satisfy $\|\mathbf 1-\mathbf S\|_2\leεn$, where $\mathbf 1$ is the all ones matrix and $\|\cdot\|_2$ is the spectral norm. It is well-known that there exists such an $\mathbf S$ with just $O(n/ε^2)$ non-zero entries: we can let $\mathbf S$ be the scaled adjacency matrix of a Ramanujan expander graph. We show that such an $\mathbf S$ yields a $universal$…
▽ More
Let $\mathbf S \in \mathbb R^{n \times n}$ satisfy $\|\mathbf 1-\mathbf S\|_2\leεn$, where $\mathbf 1$ is the all ones matrix and $\|\cdot\|_2$ is the spectral norm. It is well-known that there exists such an $\mathbf S$ with just $O(n/ε^2)$ non-zero entries: we can let $\mathbf S$ be the scaled adjacency matrix of a Ramanujan expander graph. We show that such an $\mathbf S$ yields a $universal$ $sparsifier$ for any positive semidefinite (PSD) matrix. In particular, for any PSD $\mathbf A \in \mathbb{R}^{n\times n}$ with entries bounded in magnitude by $1$, $\|\mathbf A - \mathbf A\circ\mathbf S\|_2 \le εn$, where $\circ$ denotes the entrywise (Hadamard) product. Our techniques also give universal sparsifiers for non-PSD matrices. In this case, letting $\mathbf S$ be the scaled adjacency matrix of a Ramanujan graph with $\tilde O(n/ε^4)$ edges, we have $\|\mathbf A - \mathbf A \circ \mathbf S \|_2 \le ε\cdot \max(n,\|\mathbf A\|_1)$, where $\|\mathbf A\|_1$ is the nuclear norm. We show that the above bounds for both PSD and non-PSD matrices are tight up to log factors.
Since $\mathbf A \circ \mathbf S$ can be constructed deterministically, our result for PSD matrices derandomizes and improves upon known results for randomized matrix sparsification, which require randomly sampling ${O}(\frac{n \log n}{ε^2})$ entries. We also leverage our results to give the first deterministic algorithms for several problems related to singular value approximation that run in faster than matrix multiplication time.
Finally, if $\mathbf A \in \{-1,0,1\}^{n \times n}$ is PSD, we show that $\mathbf{\tilde A}$ with $\|\mathbf A - \mathbf{\tilde A}\|_2 \le εn$ can be obtained by deterministically reading $\tilde O(n/ε)$ entries of $\mathbf A$. This improves the $1/ε$ dependence on our result for general PSD matrices and is near-optimal.
△ Less
Submitted 12 January, 2024; v1 submitted 9 May, 2023;
originally announced May 2023.
-
COLA: A Benchmark for Compositional Text-to-image Retrieval
Authors:
Arijit Ray,
Filip Radenovic,
Abhimanyu Dubey,
Bryan A. Plummer,
Ranjay Krishna,
Kate Saenko
Abstract:
Compositional reasoning is a hallmark of human visual intelligence. Yet, despite the size of large vision-language models, they struggle to represent simple compositions by combining objects with their attributes. To measure this lack of compositional capability, we design Cola, a text-to-image retrieval benchmark to Compose Objects Localized with Attributes. To solve Cola, a model must retrieve i…
▽ More
Compositional reasoning is a hallmark of human visual intelligence. Yet, despite the size of large vision-language models, they struggle to represent simple compositions by combining objects with their attributes. To measure this lack of compositional capability, we design Cola, a text-to-image retrieval benchmark to Compose Objects Localized with Attributes. To solve Cola, a model must retrieve images with the correct configuration of attributes and objects and avoid choosing a distractor image with the same objects and attributes but in the wrong configuration. Cola contains about 1.2k composed queries of 168 objects and 197 attributes on around 30K images. Our human evaluation finds that Cola is 83.33% accurate, similar to contemporary compositionality benchmarks. Using Cola as a testbed, we explore empirical modeling designs to adapt pre-trained vision-language models to reason compositionally. We explore 6 adaptation strategies on 2 seminal vision-language models, using compositionality-centric test benchmarks - Cola and CREPE. We find the optimal adaptation strategy is to train a multi-modal attention layer that jointly attends over the frozen pre-trained image and language features. Surprisingly, training multimodal layers on CLIP performs better than tuning a larger FLAVA model with already pre-trained multimodal layers. Furthermore, our adaptation strategy improves CLIP and FLAVA to comparable levels, suggesting that training multimodal layers using contrastive attribute-object data is key, as opposed to using them pre-trained. Lastly, we show that Cola is harder than a closely related contemporary benchmark, CREPE, since simpler fine-tuning strategies without multimodal layers suffice on CREPE but not on Cola. However, we still see a significant gap between our best adaptation and human accuracy, suggesting considerable room for further research.
△ Less
Submitted 2 November, 2023; v1 submitted 5 May, 2023;
originally announced May 2023.
-
Hydra-Multi: Collaborative Online Construction of 3D Scene Graphs with Multi-Robot Teams
Authors:
Yun Chang,
Nathan Hughes,
Aaron Ray,
Luca Carlone
Abstract:
3D scene graphs have recently emerged as an expressive high-level map representation that describes a 3D environment as a layered graph where nodes represent spatial concepts at multiple levels of abstraction (e.g., objects, rooms, buildings) and edges represent relations between concepts (e.g., inclusion, adjacency). This paper describes Hydra-Multi, the first multi-robot spatial perception syste…
▽ More
3D scene graphs have recently emerged as an expressive high-level map representation that describes a 3D environment as a layered graph where nodes represent spatial concepts at multiple levels of abstraction (e.g., objects, rooms, buildings) and edges represent relations between concepts (e.g., inclusion, adjacency). This paper describes Hydra-Multi, the first multi-robot spatial perception system capable of constructing a multi-robot 3D scene graph online from sensor data collected by robots in a team. In particular, we develop a centralized system capable of constructing a joint 3D scene graph by taking incremental inputs from multiple robots, effectively finding the relative transforms between the robots' frames, and incorporating loop closure detections to correctly reconcile the scene graph nodes from different robots. We evaluate Hydra-Multi on simulated and real scenarios and show it is able to reconstruct accurate 3D scene graphs online. We also demonstrate Hydra-Multi's capability of supporting heterogeneous teams by fusing different map representations built by robots with different sensor suites.
△ Less
Submitted 26 April, 2023;
originally announced April 2023.
-
Systemic Fairness
Authors:
Arindam Ray,
Balaji Padmanabhan,
Lina Bouayad
Abstract:
Machine learning algorithms are increasingly used to make or support decisions in a wide range of settings. With such expansive use there is also growing concern about the fairness of such methods. Prior literature on algorithmic fairness has extensively addressed risks and in many cases presented approaches to manage some of them. However, most studies have focused on fairness issues that arise f…
▽ More
Machine learning algorithms are increasingly used to make or support decisions in a wide range of settings. With such expansive use there is also growing concern about the fairness of such methods. Prior literature on algorithmic fairness has extensively addressed risks and in many cases presented approaches to manage some of them. However, most studies have focused on fairness issues that arise from actions taken by a (single) focal decision-maker or agent. In contrast, most real-world systems have many agents that work collectively as part of a larger ecosystem. For example, in a lending scenario, there are multiple lenders who evaluate loans for applicants, along with policymakers and other institutions whose decisions also affect outcomes. Thus, the broader impact of any lending decision of a single decision maker will likely depend on the actions of multiple different agents in the ecosystem. This paper develops formalisms for firm versus systemic fairness, and calls for a greater focus in the algorithmic fairness literature on ecosystem-wide fairness - or more simply systemic fairness - in real-world contexts.
△ Less
Submitted 13 April, 2023;
originally announced April 2023.
-
Language-Guided Audio-Visual Source Separation via Trimodal Consistency
Authors:
Reuben Tan,
Arijit Ray,
Andrea Burns,
Bryan A. Plummer,
Justin Salamon,
Oriol Nieto,
Bryan Russell,
Kate Saenko
Abstract:
We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data. A key challenge in this task is learning to associate the linguistic description of a sound-emitting object to its visual features and the corresponding components of the audio waveform, all without access to…
▽ More
We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data. A key challenge in this task is learning to associate the linguistic description of a sound-emitting object to its visual features and the corresponding components of the audio waveform, all without access to annotations during training. To overcome this challenge, we adapt off-the-shelf vision-language foundation models to provide pseudo-target supervision via two novel loss functions and encourage a stronger alignment between the audio, visual and natural language modalities. During inference, our approach can separate sounds given text, video and audio input, or given text and audio input alone. We demonstrate the effectiveness of our self-supervised approach on three audio-visual separation datasets, including MUSIC, SOLOS and AudioSet, where we outperform state-of-the-art strongly supervised approaches despite not using object detectors or text labels during training.
△ Less
Submitted 23 September, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Improved Hardness of Approximation for Geometric Bin Packing
Authors:
Arka Ray,
Sai Sandeep
Abstract:
The Geometric Bin Packing (GBP) problem is a generalization of Bin Packing where the input is a set of $d$-dimensional rectangles, and the goal is to pack them into unit $d$-dimensional cubes efficiently. It is NP-Hard to obtain a PTAS for the problem, even when $d=2$. For general $d$, the best-known approximation algorithm has an approximation guarantee exponential in $d$, while the best hardness…
▽ More
The Geometric Bin Packing (GBP) problem is a generalization of Bin Packing where the input is a set of $d$-dimensional rectangles, and the goal is to pack them into unit $d$-dimensional cubes efficiently. It is NP-Hard to obtain a PTAS for the problem, even when $d=2$. For general $d$, the best-known approximation algorithm has an approximation guarantee exponential in $d$, while the best hardness of approximation is still a small constant inapproximability from the case when $d=2$. In this paper, we show that the problem cannot be approximated within $d^{1-ε}$ factor unless NP=ZPP.
Recently, $d$-dimensional Vector Bin Packing, a closely related problem to the GBP, was shown to be hard to approximate within $Ω(\log d)$ when $d$ is a fixed constant, using a notion of Packing Dimension of set families. In this paper, we introduce a geometric analog of it, the Geometric Packing Dimension of set families. While we fall short of obtaining similar inapproximability results for the Geometric Bin Packing problem when $d$ is fixed, we prove a couple of key properties of the Geometric Packing Dimension which highlight fundamental differences between Geometric Bin Packing and Vector Bin Packing.
△ Less
Submitted 2 October, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
Sparse Cuts in Hypergraphs from Random Walks on Simplicial Complexes
Authors:
Anand Louis,
Rameesh Paul,
Arka Ray
Abstract:
There are a lot of recent works on generalizing the spectral theory of graphs and graph partitioning to hypergraphs. There have been two broad directions toward this goal. One generalizes the notion of graph conductance to hypergraph conductance [LM16, CLTZ18]. In the second approach one can view a hypergraph as a simplicial complex and study its various topological properties [LM06, MW09, DKW16,…
▽ More
There are a lot of recent works on generalizing the spectral theory of graphs and graph partitioning to hypergraphs. There have been two broad directions toward this goal. One generalizes the notion of graph conductance to hypergraph conductance [LM16, CLTZ18]. In the second approach one can view a hypergraph as a simplicial complex and study its various topological properties [LM06, MW09, DKW16, PR17] and spectral properties [KM17, DK17, KO18a, KO18b, Opp20].
In this work, we attempt to bridge these two directions of study by relating the spectrum of {\em up-down walks} and {\em swap-walks} on the simplicial complex to hypergraph expansion. In surprising contrast to random-walks on graphs, we show that the spectral gap of swap-walks and up-down walks between level $m$ and $l$ with $1 < m \leq l$ can not be used to infer any bounds on hypergraph conductance. Moreover, we show that the spectral gap of swap-walks between $X(1)$ and $X(k-1)$ can not be used to infer any bounds on hypergraph conductance, whereas we give a Cheeger-like inequality relating the spectral of walks between level $1$ and $l$ for any $l \leq k$ to hypergraph expansion. This is a surprising difference between swaps-walks and up-down walks!
Finally, we also give a construction to show that the well-studied notion of {\em link expansion} in simplicial complexes can not be used to bound hypergraph expansion in a Cheeger-like manner.
△ Less
Submitted 3 October, 2023; v1 submitted 27 December, 2022;
originally announced December 2022.
-
Classical ensemble of Quantum-classical ML algorithms for Phishing detection in Ethereum transaction networks
Authors:
Anupama Ray,
Sai Sakunthala Guddanti,
Vishnu Ajith,
Dhinakaran Vinayagamurthy
Abstract:
Ethereum is one of the most valuable blockchain networks in terms of the total monetary value locked in it, and arguably been the most active network where new blockchain innovations in research and applications are demonstrated. But, this also leads to Ethereum network being susceptible to a wide variety of threats and attacks in an attempt to gain unreasonable advantage or to undermine the value…
▽ More
Ethereum is one of the most valuable blockchain networks in terms of the total monetary value locked in it, and arguably been the most active network where new blockchain innovations in research and applications are demonstrated. But, this also leads to Ethereum network being susceptible to a wide variety of threats and attacks in an attempt to gain unreasonable advantage or to undermine the value of the users. Even with the state-of-art classical ML algorithms, detecting such attacks is still hard. This motivated us to build a hybrid system of quantum-classical algorithms that improves phishing detection in financial transaction networks. This paper presents a classical ensemble pipeline of classical and quantum algorithms and a detailed study benchmarking existing Quantum Machine Learning algorithms such as Quantum Support Vector Machine and Variational Quantum Classifier. With the current generation of quantum hardware available, smaller datasets are more suited to the QML models and most research restricts to hundreds of samples. However, we experimented on different data sizes and report results with a test data of 12K transaction nodes, which is to the best of the authors knowledge the largest QML experiment run so far on any real quantum hardware. The classical ensembles of quantum-classical models improved the macro F-score and phishing F-score. One key observation is QSVM constantly gives lower false positives, thereby higher precision compared with any other classical or quantum network, which is always preferred for any anomaly detection problem. This is true for QSVMs when used individually or via bagging of same models or in combination with other classical/quantum models making it the most advantageous quantum algorithm so far. The proposed ensemble framework is generic and can be applied for any classification task
△ Less
Submitted 30 October, 2022;
originally announced November 2022.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…
▽ More
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
△ Less
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
A Multimodal Corpus for Emotion Recognition in Sarcasm
Authors:
Anupama Ray,
Shubham Mishra,
Apoorva Nunna,
Pushpak Bhattacharyya
Abstract:
While sentiment and emotion analysis have been studied extensively, the relationship between sarcasm and emotion has largely remained unexplored. A sarcastic expression may have a variety of underlying emotions. For example, "I love being ignored" belies sadness, while "my mobile is fabulous with a battery backup of only 15 minutes!" expresses frustration. Detecting the emotion behind a sarcastic…
▽ More
While sentiment and emotion analysis have been studied extensively, the relationship between sarcasm and emotion has largely remained unexplored. A sarcastic expression may have a variety of underlying emotions. For example, "I love being ignored" belies sadness, while "my mobile is fabulous with a battery backup of only 15 minutes!" expresses frustration. Detecting the emotion behind a sarcastic expression is non-trivial yet an important task. We undertake the task of detecting the emotion in a sarcastic statement, which to the best of our knowledge, is hitherto unexplored. We start with the recently released multimodal sarcasm detection dataset (MUStARD) pre-annotated with 9 emotions. We identify and correct 343 incorrect emotion labels (out of 690). We double the size of the dataset, label it with emotions along with valence and arousal which are important indicators of emotional intensity. Finally, we label each sarcastic utterance with one of the four sarcasm types-Propositional, Embedded, Likeprefixed and Illocutionary, with the goal of advancing sarcasm detection research. Exhaustive experimentation with multimodal (text, audio, and video) fusion models establishes a benchmark for exact emotion recognition in sarcasm and outperforms the state-of-art sarcasm detection. We release the dataset enriched with various annotations and the code for research purposes: https://github.com/apoorva-nunna/MUStARD_Plus_Plus
△ Less
Submitted 5 June, 2022;
originally announced June 2022.
-
Multi-robot Task Assignment for Aerial Tracking with Viewpoint Constraints
Authors:
Aaron Ray,
Alyssa Pierson,
Hai Zhu,
Javier Alonso-Mora,
Daniela Rus
Abstract:
We address the problem of assigning a team of drones to autonomously capture a set desired shots of a dynamic target in the presence of obstacles. We present a two-stage planning pipeline that generates offline an assignment of drone to shots and locally optimizes online the viewpoint. Given desired shot parameters, the high-level planner uses a visibility heuristic to predict good times for captu…
▽ More
We address the problem of assigning a team of drones to autonomously capture a set desired shots of a dynamic target in the presence of obstacles. We present a two-stage planning pipeline that generates offline an assignment of drone to shots and locally optimizes online the viewpoint. Given desired shot parameters, the high-level planner uses a visibility heuristic to predict good times for capturing each shot and uses an Integer Linear Program to compute drone assignments. An online Model Predictive Control algorithm uses the assignments as reference to capture the shots. The algorithm is validated in hardware with a pair of drones and a remote controlled car.
△ Less
Submitted 30 May, 2022;
originally announced May 2022.
-
Free-Space Ellipsoid Graphs for Multi-Agent Target Monitoring
Authors:
Aaron Ray,
Alyssa Pierson,
Daniela Rus
Abstract:
We apply a novel framework for decomposing and reasoning about free space in an environment to a multi-agent persistent monitoring problem. Our decomposition method represents free space as a collection of ellipsoids associated with a weighted connectivity graph. The same ellipsoids used for reasoning about connectivity and distance during high level planning can be used as state constraints in a…
▽ More
We apply a novel framework for decomposing and reasoning about free space in an environment to a multi-agent persistent monitoring problem. Our decomposition method represents free space as a collection of ellipsoids associated with a weighted connectivity graph. The same ellipsoids used for reasoning about connectivity and distance during high level planning can be used as state constraints in a Model Predictive Control algorithm to enforce collision-free motion. This structure allows for streamlined implementation in distributed multi-agent tasks in 2D and 3D environments. We illustrate its effectiveness for a team of tracking agents tasked with monitoring a group of target agents. Our algorithm uses the ellipsoid decomposition as a primitive for the coordination, path planning, and control of the tracking agents. Simulations with four tracking agents monitoring fifteen dynamic targets in obstacle-rich environments demonstrate the performance of our algorithm.
△ Less
Submitted 30 May, 2022;
originally announced May 2022.
-
Training language models to follow instructions with human feedback
Authors:
Long Ouyang,
Jeff Wu,
Xu Jiang,
Diogo Almeida,
Carroll L. Wainwright,
Pamela Mishkin,
Chong Zhang,
Sandhini Agarwal,
Katarina Slama,
Alex Ray,
John Schulman,
Jacob Hilton,
Fraser Kelton,
Luke Miller,
Maddie Simens,
Amanda Askell,
Peter Welinder,
Paul Christiano,
Jan Leike,
Ryan Lowe
Abstract:
Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning wi…
▽ More
Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
△ Less
Submitted 4 March, 2022;
originally announced March 2022.
-
Sublinear Time Approximation of Text Similarity Matrices
Authors:
Archan Ray,
Nicholas Monath,
Andrew McCallum,
Cameron Musco
Abstract:
We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for $n$ data points requires $Ω(n^2)$ similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., via transformer models. Approximation methods reduce this qua…
▽ More
We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for $n$ data points requires $Ω(n^2)$ similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., via transformer models. Approximation methods reduce this quadratic complexity, often by using a small subset of exactly computed similarities to approximate the remainder of the complete pairwise similarity matrix.
Significant work focuses on the efficient approximation of positive semidefinite (PSD) similarity matrices, which arise e.g., in kernel methods. However, much less is understood about indefinite (non-PSD) similarity matrices, which often arise in NLP. Motivated by the observation that many of these matrices are still somewhat close to PSD, we introduce a generalization of the popular Nyström method to the indefinite setting. Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix, producing a rank-$s$ approximation with just $O(ns)$ similarity computations.
We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices arising in NLP tasks. We demonstrate high accuracy of the approximated similarity matrices in the downstream tasks of document classification, sentence similarity, and cross-document coreference.
△ Less
Submitted 27 April, 2022; v1 submitted 17 December, 2021;
originally announced December 2021.
-
Improving Users' Mental Model with Attention-directed Counterfactual Edits
Authors:
Kamran Alipour,
Arijit Ray,
Xiao Lin,
Michael Cogswell,
Jurgen P. Schulze,
Yi Yao,
Giedrius T. Burachas
Abstract:
In the domain of Visual Question Answering (VQA), studies have shown improvement in users' mental model of the VQA system when they are exposed to examples of how these systems answer certain Image-Question (IQ) pairs. In this work, we show that showing controlled counterfactual image-question examples are more effective at improving the mental model of users as compared to simply showing random e…
▽ More
In the domain of Visual Question Answering (VQA), studies have shown improvement in users' mental model of the VQA system when they are exposed to examples of how these systems answer certain Image-Question (IQ) pairs. In this work, we show that showing controlled counterfactual image-question examples are more effective at improving the mental model of users as compared to simply showing random examples. We compare a generative approach and a retrieval-based approach to show counterfactual examples. We use recent advances in generative adversarial networks (GANs) to generate counterfactual images by deleting and inpainting certain regions of interest in the image. We then expose users to changes in the VQA system's answer on those altered images. To select the region of interest for inpainting, we experiment with using both human-annotated attention maps and a fully automatic method that uses the VQA system's attention values. Finally, we test the user's mental model by asking them to predict the model's performance on a test counterfactual image. We note an overall improvement in users' accuracy to predict answer change when shown counterfactual explanations. While realistic retrieved counterfactuals obviously are the most effective at improving the mental model, we show that a generative approach can also be equally effective.
△ Less
Submitted 15 October, 2021; v1 submitted 13 October, 2021;
originally announced October 2021.
-
Unsupervised Neural Machine Translation with Generative Language Models Only
Authors:
Jesse Michael Han,
Igor Babuschkin,
Harrison Edwards,
Arvind Neelakantan,
Tao Xu,
Stanislas Polu,
Alex Ray,
Pranav Shyam,
Aditya Ramesh,
Alec Radford,
Ilya Sutskever
Abstract:
We show how to derive state-of-the-art unsupervised neural machine translation systems from generatively pre-trained language models. Our method consists of three steps: few-shot amplification, distillation, and backtranslation. We first use the zero-shot translation ability of large pre-trained language models to generate translations for a small set of unlabeled sentences. We then amplify these…
▽ More
We show how to derive state-of-the-art unsupervised neural machine translation systems from generatively pre-trained language models. Our method consists of three steps: few-shot amplification, distillation, and backtranslation. We first use the zero-shot translation ability of large pre-trained language models to generate translations for a small set of unlabeled sentences. We then amplify these zero-shot translations by using them as few-shot demonstrations for sampling a larger synthetic dataset. This dataset is distilled by discarding the few-shot demonstrations and then fine-tuning. During backtranslation, we repeatedly generate translations for a set of inputs and then fine-tune a single language model on both directions of the translation task at once, ensuring cycle-consistency by swapping the roles of gold monotext and generated translations when fine-tuning. By using our method to leverage GPT-3's zero-shot translation capability, we achieve a new state-of-the-art in unsupervised translation on the WMT14 English-French benchmark, attaining a BLEU score of 42.1.
△ Less
Submitted 11 October, 2021;
originally announced October 2021.
-
Sublinear Time Eigenvalue Approximation via Random Sampling
Authors:
Rajarshi Bhattacharjee,
Gregory Dexter,
Petros Drineas,
Cameron Musco,
Archan Ray
Abstract:
We study the problem of approximating the eigenspectrum of a symmetric matrix $\mathbf A \in \mathbb{R}^{n \times n}$ with bounded entries (i.e., $\|\mathbf A\|_{\infty} \leq 1$). We present a simple sublinear time algorithm that approximates all eigenvalues of $\mathbf{A}$ up to additive error $\pm εn$ using those of a randomly sampled…
▽ More
We study the problem of approximating the eigenspectrum of a symmetric matrix $\mathbf A \in \mathbb{R}^{n \times n}$ with bounded entries (i.e., $\|\mathbf A\|_{\infty} \leq 1$). We present a simple sublinear time algorithm that approximates all eigenvalues of $\mathbf{A}$ up to additive error $\pm εn$ using those of a randomly sampled $\tilde {O}\left (\frac{\log^3 n}{ε^3}\right ) \times \tilde O\left (\frac{\log^3 n}{ε^3}\right )$ principal submatrix. Our result can be viewed as a concentration bound on the complete eigenspectrum of a random submatrix, significantly extending known bounds on just the singular values (the magnitudes of the eigenvalues). We give improved error bounds of $\pm ε\sqrt{\text{nnz}(\mathbf{A})}$ and $\pm ε\|\mathbf A\|_F$ when the rows of $\mathbf A$ can be sampled with probabilities proportional to their sparsities or their squared $\ell_2$ norms respectively. Here $\text{nnz}(\mathbf{A})$ is the number of non-zero entries in $\mathbf{A}$ and $\|\mathbf A\|_F$ is its Frobenius norm. Even for the strictly easier problems of approximating the singular values or testing the existence of large negative eigenvalues (Bakshi, Chepurko, and Jayaram, FOCS '20), our results are the first that take advantage of non-uniform sampling to give improved error bounds. From a technical perspective, our results require several new eigenvalue concentration and perturbation bounds for matrices with bounded entries. Our non-uniform sampling bounds require a new algorithmic approach, which judiciously zeroes out entries of a randomly sampled submatrix to reduce variance, before computing the eigenvalues of that submatrix as estimates for those of $\mathbf A$. We complement our theoretical results with numerical simulations, which demonstrate the effectiveness of our algorithms in practice.
△ Less
Submitted 21 July, 2022; v1 submitted 15 September, 2021;
originally announced September 2021.
-
Evaluating Large Language Models Trained on Code
Authors:
Mark Chen,
Jerry Tworek,
Heewoo Jun,
Qiming Yuan,
Henrique Ponde de Oliveira Pinto,
Jared Kaplan,
Harri Edwards,
Yuri Burda,
Nicholas Joseph,
Greg Brockman,
Alex Ray,
Raul Puri,
Gretchen Krueger,
Michael Petrov,
Heidy Khlaaf,
Girish Sastry,
Pamela Mishkin,
Brooke Chan,
Scott Gray,
Nick Ryder,
Mikhail Pavlov,
Alethea Power,
Lukasz Kaiser,
Mohammad Bavarian,
Clemens Winter
, et al. (33 additional authors not shown)
Abstract:
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J sol…
▽ More
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
△ Less
Submitted 14 July, 2021; v1 submitted 7 July, 2021;
originally announced July 2021.
-
Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU
Authors:
Yilin Shen,
Yen-Chang Hsu,
Avik Ray,
Hongxia Jin
Abstract:
Intent classification is a major task in spoken language understanding (SLU). Since most models are built with pre-collected in-domain (IND) training utterances, their ability to detect unsupported out-of-domain (OOD) utterances has a critical effect in practical use. Recent works have shown that using extra data and labels can improve the OOD detection performance, yet it could be costly to colle…
▽ More
Intent classification is a major task in spoken language understanding (SLU). Since most models are built with pre-collected in-domain (IND) training utterances, their ability to detect unsupported out-of-domain (OOD) utterances has a critical effect in practical use. Recent works have shown that using extra data and labels can improve the OOD detection performance, yet it could be costly to collect such data. This paper proposes to train a model with only IND data while supporting both IND intent classification and OOD detection. Our method designs a novel domain-regularized module (DRM) to reduce the overconfident phenomenon of a vanilla classifier, achieving a better generalization in both cases. Besides, DRM can be used as a drop-in replacement for the last layer in any neural network-based intent classifier, providing a low-cost strategy for a significant improvement. The evaluation on four datasets shows that our method built on BERT and RoBERTa models achieves state-of-the-art performance against existing approaches and the strong baselines we created for the comparisons.
△ Less
Submitted 28 June, 2021;
originally announced June 2021.
-
Closed-form Continuous-time Neural Models
Authors:
Ramin Hasani,
Mathias Lechner,
Alexander Amini,
Lucas Liebenwein,
Aaron Ray,
Max Tschaikowski,
Gerald Teschl,
Daniela Rus
Abstract:
Continuous-time neural processes are performant sequential decision-makers that are built by differential equations (DE). However, their expressive power when they are deployed on computers is bottlenecked by numerical DE solvers. This limitation has significantly slowed down the scaling and understanding of numerous natural physical phenomena such as the dynamics of nervous systems. Ideally, we w…
▽ More
Continuous-time neural processes are performant sequential decision-makers that are built by differential equations (DE). However, their expressive power when they are deployed on computers is bottlenecked by numerical DE solvers. This limitation has significantly slowed down the scaling and understanding of numerous natural physical phenomena such as the dynamics of nervous systems. Ideally, we would circumvent this bottleneck by solving the given dynamical system in closed form. This is known to be intractable in general. Here, we show it is possible to closely approximate the interaction between neurons and synapses -- the building blocks of natural and artificial neural networks -- constructed by liquid time-constant networks (LTCs) efficiently in closed-form. To this end, we compute a tightly-bounded approximation of the solution of an integral appearing in LTCs' dynamics, that has had no known closed-form solution so far. This closed-form solution substantially impacts the design of continuous-time and continuous-depth neural models; for instance, since time appears explicitly in closed-form, the formulation relaxes the need for complex numerical solvers. Consequently, we obtain models that are between one and five orders of magnitude faster in training and inference compared to differential equation-based counterparts. More importantly, in contrast to ODE-based continuous networks, closed-form networks can scale remarkably well compared to other deep learning instances. Lastly, as these models are derived from liquid networks, they show remarkable performance in time series modeling, compared to advanced recurrent models.
△ Less
Submitted 2 March, 2022; v1 submitted 25 June, 2021;
originally announced June 2021.
-
Optimized ensemble deep learning framework for scalable forecasting of dynamics containing extreme events
Authors:
Arnob Ray,
Tanujit Chakraborty,
Dibakar Ghosh
Abstract:
The remarkable flexibility and adaptability of both deep learning models and ensemble methods have led to the proliferation for their application in understanding many physical phenomena. Traditionally, these two techniques have largely been treated as independent methodologies in practical applications. This study develops an optimized ensemble deep learning (OEDL) framework wherein these two mac…
▽ More
The remarkable flexibility and adaptability of both deep learning models and ensemble methods have led to the proliferation for their application in understanding many physical phenomena. Traditionally, these two techniques have largely been treated as independent methodologies in practical applications. This study develops an optimized ensemble deep learning (OEDL) framework wherein these two machine learning techniques are jointly used to achieve synergistic improvements in model accuracy, stability, scalability, and reproducibility prompting a new wave of applications in the forecasting of dynamics. Unpredictability is considered as one of the key features of chaotic dynamics, so forecasting such dynamics of nonlinear systems is a relevant issue in the scientific community. It becomes more challenging when the prediction of extreme events is the focus issue for us. In this circumstance, the proposed OEDL model based on a best convex combination of feed-forward neural networks, reservoir computing, and long short-term memory can play a key role in advancing predictions of dynamics consisting of extreme events. The combined framework can generate the best out-of-sample performance than the individual deep learners and standard ensemble framework for both numerically simulated and real world data sets. We exhibit the outstanding performance of the OEDL framework for forecasting extreme events generated from Lienard-type system, prediction of COVID-19 cases in Brazil, dengue cases in San Juan, and sea surface temperature in Nino 3.4 region.
△ Less
Submitted 9 June, 2021;
originally announced June 2021.
-
Bayesian Model Averaging for Data Driven Decision Making when Causality is Partially Known
Authors:
Marios Papamichalis,
Abhishek Ray,
Ilias Bilionis,
Karthik Kannan,
Rajiv Krishnamurthy
Abstract:
Probabilistic machine learning models are often insufficient to help with decisions on interventions because those models find correlations - not causal relationships. If observational data is only available and experimentation are infeasible, the correct approach to study the impact of an intervention is to invoke Pearl's causality framework. Even that framework assumes that the underlying causal…
▽ More
Probabilistic machine learning models are often insufficient to help with decisions on interventions because those models find correlations - not causal relationships. If observational data is only available and experimentation are infeasible, the correct approach to study the impact of an intervention is to invoke Pearl's causality framework. Even that framework assumes that the underlying causal graph is known, which is seldom the case in practice. When the causal structure is not known, one may use out-of-the-box algorithms to find causal dependencies from observational data. However, there exists no method that also accounts for the decision-maker's prior knowledge when developing the causal structure either. The objective of this paper is to develop rational approaches for making decisions from observational data in the presence of causal graph uncertainty and prior knowledge from the decision-maker. We use ensemble methods like Bayesian Model Averaging (BMA) to infer set of causal graphs that can represent the data generation process. We provide decisions by computing the expected value and risk of potential interventions explicitly. We demonstrate our approach by applying them in different example contexts.
△ Less
Submitted 11 May, 2021;
originally announced May 2021.
-
There is no APTAS for 2-dimensional vector bin packing: Revisited
Authors:
Arka Ray
Abstract:
We study the Vector Bin Packing and the Vector Bin Covering problems, multidimensional generalizations of the Bin Packing and the Bin Covering problems, respectively. In the Vector Bin Packing, we are given a set of $d$-dimensional vectors from $[0,1]^d$ and the aim is to partition the set into the minimum number of bins such that for each bin $B$, each component of the sum of the vectors in $B$ i…
▽ More
We study the Vector Bin Packing and the Vector Bin Covering problems, multidimensional generalizations of the Bin Packing and the Bin Covering problems, respectively. In the Vector Bin Packing, we are given a set of $d$-dimensional vectors from $[0,1]^d$ and the aim is to partition the set into the minimum number of bins such that for each bin $B$, each component of the sum of the vectors in $B$ is at most 1. Woeginger [Woe97] claimed that the problem has no APTAS for dimensions greater than or equal to 2. We note that there was a slight oversight in the original proof. In this work, we give a revised proof using some additional ideas from [BCKS06,CC09]. In fact, we show that it is NP-hard to get an asymptotic approximation ratio better than $\frac{600}{599}$.
An instance of Vector Bin Packing is called $δ$-skewed if every item has at most one dimension greater than $δ$. As a natural extension of our general $d$-Dimensional Vector Bin Packing result we show that for $\varepsilon\in (0,\frac{1}{2500})$ it is NP-hard to obtain a $(1+\varepsilon)$-approximation for $δ$-Skewed Vector Bin Packing if $δ>20\sqrt \varepsilon$.
In the Vector Bin Covering problem given a set of $d$-dimensional vectors from $[0,1]^d$, the aim is to obtain a family of disjoint subsets (called bins) with the maximum cardinality such that for each bin $B$, each component of the sum of the vectors in $B$ is at least 1. Using ideas similar to our Vector Bin Packing result, we show that for Vector Bin Covering there is no APTAS for dimensions greater than or equal to 2. In fact, we show that it is NP-hard to get an asymptotic approximation ratio better than $\frac{998}{997}$.
△ Less
Submitted 1 August, 2023; v1 submitted 27 April, 2021;
originally announced April 2021.
-
Generating and Evaluating Explanations of Attended and Error-Inducing Input Regions for VQA Models
Authors:
Arijit Ray,
Michael Cogswell,
Xiao Lin,
Kamran Alipour,
Ajay Divakaran,
Yi Yao,
Giedrius Burachas
Abstract:
Attention maps, a popular heatmap-based explanation method for Visual Question Answering (VQA), are supposed to help users understand the model by highlighting portions of the image/question used by the model to infer answers. However, we see that users are often misled by current attention map visualizations that point to relevant regions despite the model producing an incorrect answer. Hence, we…
▽ More
Attention maps, a popular heatmap-based explanation method for Visual Question Answering (VQA), are supposed to help users understand the model by highlighting portions of the image/question used by the model to infer answers. However, we see that users are often misled by current attention map visualizations that point to relevant regions despite the model producing an incorrect answer. Hence, we propose Error Maps that clarify the error by highlighting image regions where the model is prone to err. Error maps can indicate when a correctly attended region may be processed incorrectly leading to an incorrect answer, and hence, improve users' understanding of those cases. To evaluate our new explanations, we further introduce a metric that simulates users' interpretation of explanations to evaluate their potential helpfulness to understand model correctness. We finally conduct user studies to see that our new explanations help users understand model correctness better than baselines by an expected 30\% and that our proxy helpfulness metrics correlate strongly ($ρ>0.97$) with how well users can predict model correctness.
△ Less
Submitted 25 October, 2021; v1 submitted 26 March, 2021;
originally announced March 2021.
-
Oscillatory Residual Stresses in Steady Angular Channel Extrusion
Authors:
Arunava Ray,
Pritam Chakraborty,
Anindya Chatterjee
Abstract:
Angular channel extrusion has evolved as processes that can induce significant strengthening of the formed product through grain refinement. However, significant residual stresses are developed in the extruded product whose quantification is necessary for accurate process design and subsequent heat treatment. Experimental evaluation of residual stress provides the through thickness (normal) variat…
▽ More
Angular channel extrusion has evolved as processes that can induce significant strengthening of the formed product through grain refinement. However, significant residual stresses are developed in the extruded product whose quantification is necessary for accurate process design and subsequent heat treatment. Experimental evaluation of residual stress provides the through thickness (normal) variation at chosen sampling points on the formed product and may provide inaccurate estimates if variations along the extrusion (longitudinal) directions are present. Process models can complement the experimental measurements and improve the estimates of residual stress distribution. While models of this process have been developed, very few of them have been applied to understand the variation of residual stress in the formed products. The present work aims to address this limitation by providing a complete map of residual stress distribution in angular extrusion process through numerical simulations. Interestingly, our simulations show that the angular channel extruded product can have significant longitudinal variation of residual stress depending on the extrusion ratio and strain hardening rate. Detailed analyses of the process reveals that these spatial oscillations occur due to cyclic movement of the contact location between the die and the top-billet surface in the exit channel. The outcome of this study suggests that accurate measurement technique of residual stress field in angular channel extruded products should consider the possibility of longitudinal variations. The findings can be extended to other continuous forming processes as well.
△ Less
Submitted 7 January, 2021;
originally announced January 2021.
-
Generating Dialogue Responses from a Semantic Latent Space
Authors:
Wei-Jen Ko,
Avik Ray,
Yilin Shen,
Hongxia Jin
Abstract:
Existing open-domain dialogue generation models are usually trained to mimic the gold response in the training set using cross-entropy loss on the vocabulary. However, a good response does not need to resemble the gold response, since there are multiple possible responses to a given prompt. In this work, we hypothesize that the current models are unable to integrate information from multiple seman…
▽ More
Existing open-domain dialogue generation models are usually trained to mimic the gold response in the training set using cross-entropy loss on the vocabulary. However, a good response does not need to resemble the gold response, since there are multiple possible responses to a given prompt. In this work, we hypothesize that the current models are unable to integrate information from multiple semantically similar valid responses of a prompt, resulting in the generation of generic and uninformative responses. To address this issue, we propose an alternative to the end-to-end classification on vocabulary. We learn the pair relationship between the prompts and responses as a regression task on a latent space instead. In our novel dialog generation model, the representations of semantically related sentences are close to each other on the latent space. Human evaluation showed that learning the task on a continuous space can generate responses that are both relevant and informative.
△ Less
Submitted 4 October, 2020;
originally announced October 2020.
-
Development and Testing of a Novel Automated Insect Capture Module for Sample Collection and Transfer
Authors:
Keran Ye,
Gustavo J. Correa,
Tom Guda,
Hanzhe Teng,
Anandasankar Ray,
Konstantinos Karydis
Abstract:
There exists an urgent need for efficient tools in disease surveillance to help model and predict the spread of disease. The transmission of insect-borne diseases poses a serious concern to public health officials and the medical and research community at large. In the modeling of this spread, we face bottlenecks in (1) the frequency at which we are able to sample insect vectors in environments th…
▽ More
There exists an urgent need for efficient tools in disease surveillance to help model and predict the spread of disease. The transmission of insect-borne diseases poses a serious concern to public health officials and the medical and research community at large. In the modeling of this spread, we face bottlenecks in (1) the frequency at which we are able to sample insect vectors in environments that are prone to propagating disease, (2) manual labor needed to set up and retrieve surveillance devices like traps, and (3) the return time in analyzing insect samples and determining if an infectious disease is spreading in a region. To help address these bottlenecks, we present in this paper the design, fabrication, and testing of a novel automated insect capture module (ICM) or trap that aims to improve the rate of transferring samples collected from the environment via aerial robots. The ICM features an ultraviolet light attractant, passive capture mechanism, panels which can open and close for access to insects, and a small onboard computer for automated operation and data logging. At the same time, the ICM is designed to be accessible; it is small-scale, lightweight and low-cost, and can be integrated with commercially available aerial robots. Indoor and outdoor experimentation validates ICM's feasibility in insect capturing and safe transportation. The device can help bring us one step closer toward achieving fully autonomous and scalable epidemiology by leveraging autonomous robots technology to aid the medical and research community.
△ Less
Submitted 29 August, 2020;
originally announced August 2020.
-
The Impact of Explanations on AI Competency Prediction in VQA
Authors:
Kamran Alipour,
Arijit Ray,
Xiao Lin,
Jurgen P. Schulze,
Yi Yao,
Giedrius T. Burachas
Abstract:
Explainability is one of the key elements for building trust in AI systems. Among numerous attempts to make AI explainable, quantifying the effect of explanations remains a challenge in conducting human-AI collaborative tasks. Aside from the ability to predict the overall behavior of AI, in many applications, users need to understand an AI agent's competency in different aspects of the task domain…
▽ More
Explainability is one of the key elements for building trust in AI systems. Among numerous attempts to make AI explainable, quantifying the effect of explanations remains a challenge in conducting human-AI collaborative tasks. Aside from the ability to predict the overall behavior of AI, in many applications, users need to understand an AI agent's competency in different aspects of the task domain. In this paper, we evaluate the impact of explanations on the user's mental model of AI agent competency within the task of visual question answering (VQA). We quantify users' understanding of competency, based on the correlation between the actual system performance and user rankings. We introduce an explainable VQA system that uses spatial and object features and is powered by the BERT language model. Each group of users sees only one kind of explanation to rank the competencies of the VQA model. The proposed model is evaluated through between-subject experiments to probe explanations' impact on the user's perception of competency. The comparison between two VQA models shows BERT based explanations and the use of object features improve the user's prediction of the model's competencies.
△ Less
Submitted 2 July, 2020;
originally announced July 2020.
-
3D CA model of tumor-induced angiogenesis
Authors:
Monjoy Saha,
Amit Kumar Ray,
Swapan Kumar Basu
Abstract:
Tumor-induced angiogenesis is the formation of new sprouts from preexisting nearby parent blood vessels. Computationally, tumor-induced angiogenesis can be modeled using cellular automata (CA), partial differential equations, etc. In this present study, a realistic physiological approach has been made to model the process of angiogenesis by using 3D CA model. CA technique uses various neighborhood…
▽ More
Tumor-induced angiogenesis is the formation of new sprouts from preexisting nearby parent blood vessels. Computationally, tumor-induced angiogenesis can be modeled using cellular automata (CA), partial differential equations, etc. In this present study, a realistic physiological approach has been made to model the process of angiogenesis by using 3D CA model. CA technique uses various neighborhoods like Von-Neumann neighborhood, Moore neighborhood, and Margolus neighborhood. In our model Von-Neumann neighborhood has used for distribution of some significant chemical and non-chemical tumor angiogenic factors like vascular endothelial growth factor, endothelial cells, O2, extracellular matrix, fibronectin, etc., and Moore neighborhood is used for distribution of matrix metalloproteinase. In vivo tumor environment all the factors are not distributed equally in the extracellular matrix. Distributions of those chemical and nonchemical factors depend on their source, nature and function. To keep similarity with the biological tumor environment, we have formulated initial distributions of the chemical and non-chemical factors accordingly. We have started the simulation in MATLAB with this initial distribution. Number of sprouts randomly varies from one run to another. We observed that sprouts are not originating from the same locations in each simulation. A sprout has high sensitivity of VEGF and fibronectin concentrations. sVEGFR-1 always tries to regress the sprout. When two or more sprouts come closer, they merge with each other leading to anastomosis. Sufficient number of tip cells may cause sprout towards tumor.
△ Less
Submitted 24 May, 2020;
originally announced May 2020.
-
SIBRE: Self Improvement Based REwards for Adaptive Feedback in Reinforcement Learning
Authors:
Somjit Nath,
Richa Verma,
Abhik Ray,
Harshad Khadilkar
Abstract:
We propose a generic reward shaping approach for improving the rate of convergence in reinforcement learning (RL), called Self Improvement Based REwards, or SIBRE. The approach is designed for use in conjunction with any existing RL algorithm, and consists of rewarding improvement over the agent's own past performance. We prove that SIBRE converges in expectation under the same conditions as the o…
▽ More
We propose a generic reward shaping approach for improving the rate of convergence in reinforcement learning (RL), called Self Improvement Based REwards, or SIBRE. The approach is designed for use in conjunction with any existing RL algorithm, and consists of rewarding improvement over the agent's own past performance. We prove that SIBRE converges in expectation under the same conditions as the original RL algorithm. The reshaped rewards help discriminate between policies when the original rewards are weakly discriminated or sparse. Experiments on several well-known benchmark environments with different RL algorithms show that SIBRE converges to the optimal policy faster and more stably. We also perform sensitivity analysis with respect to hyper-parameters, in comparison with baseline RL algorithms.
△ Less
Submitted 21 December, 2020; v1 submitted 21 April, 2020;
originally announced April 2020.
-
A Dynamically Controlled Recurrent Neural Network for Modeling Dynamical Systems
Authors:
Yiwei Fu,
Samer Saab Jr,
Asok Ray,
Michael Hauser
Abstract:
This work proposes a novel neural network architecture, called the Dynamically Controlled Recurrent Neural Network (DCRNN), specifically designed to model dynamical systems that are governed by ordinary differential equations (ODEs). The current state vectors of these types of dynamical systems only depend on their state-space models, along with the respective inputs and initial conditions. Long S…
▽ More
This work proposes a novel neural network architecture, called the Dynamically Controlled Recurrent Neural Network (DCRNN), specifically designed to model dynamical systems that are governed by ordinary differential equations (ODEs). The current state vectors of these types of dynamical systems only depend on their state-space models, along with the respective inputs and initial conditions. Long Short-Term Memory (LSTM) networks, which have proven to be very effective for memory-based tasks, may fail to model physical processes as they tend to memorize, rather than learn how to capture the information on the underlying dynamics. The proposed DCRNN includes learnable skip-connections across previously hidden states, and introduces a regularization term in the loss function by relying on Lyapunov stability theory. The regularizer enables the placement of eigenvalues of the transfer function induced by the DCRNN to desired values, thereby acting as an internal controller for the hidden state trajectory. The results show that, for forecasting a chaotic dynamical system, the DCRNN outperforms the LSTM in $100$ out of $100$ randomized experiments by reducing the mean squared error of the LSTM's forecasting by $80.0\% \pm 3.0\%$.
△ Less
Submitted 31 October, 2019;
originally announced November 2019.
-
Iterative Delexicalization for Improved Spoken Language Understanding
Authors:
Avik Ray,
Yilin Shen,
Hongxia Jin
Abstract:
Recurrent neural network (RNN) based joint intent classification and slot tagging models have achieved tremendous success in recent years for building spoken language understanding and dialog systems. However, these models suffer from poor performance for slots which often encounter large semantic variability in slot values after deployment (e.g. message texts, partial movie/artist names). While g…
▽ More
Recurrent neural network (RNN) based joint intent classification and slot tagging models have achieved tremendous success in recent years for building spoken language understanding and dialog systems. However, these models suffer from poor performance for slots which often encounter large semantic variability in slot values after deployment (e.g. message texts, partial movie/artist names). While greedy delexicalization of slots in the input utterance via substring matching can partly improve performance, it often produces incorrect input. Moreover, such techniques cannot delexicalize slots with out-of-vocabulary slot values not seen at training. In this paper, we propose a novel iterative delexicalization algorithm, which can accurately delexicalize the input, even with out-of-vocabulary slot values. Based on model confidence of the current delexicalized input, our algorithm improves delexicalization in every iteration to converge to the best input having the highest confidence. We show on benchmark and in-house datasets that our algorithm can greatly improve parsing performance for RNN based models, especially for out-of-distribution slot values.
△ Less
Submitted 15 October, 2019;
originally announced October 2019.
-
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
Authors:
Arijit Ray,
Karan Sikka,
Ajay Divakaran,
Stefan Lee,
Giedrius Burachas
Abstract:
While models for Visual Question Answering (VQA) have steadily improved over the years, interacting with one quickly reveals that these models lack consistency. For instance, if a model answers "red" to "What color is the balloon?", it might answer "no" if asked, "Is the balloon red?". These responses violate simple notions of entailment and raise questions about how effectively VQA models ground…
▽ More
While models for Visual Question Answering (VQA) have steadily improved over the years, interacting with one quickly reveals that these models lack consistency. For instance, if a model answers "red" to "What color is the balloon?", it might answer "no" if asked, "Is the balloon red?". These responses violate simple notions of entailment and raise questions about how effectively VQA models ground language. In this work, we introduce a dataset, ConVQA, and metrics that enable quantitative evaluation of consistency in VQA. For a given observable fact in an image (e.g. the balloon's color), we generate a set of logically consistent question-answer (QA) pairs (e.g. Is the balloon red?) and also collect a human-annotated set of common-sense based consistent QA pairs (e.g. Is the balloon the same color as tomato sauce?). Further, we propose a consistency-improving data augmentation module, a Consistency Teacher Module (CTM). CTM automatically generates entailed (or similar-intent) questions for a source QA pair and fine-tunes the VQA model if the VQA's answer to the entailed question is consistent with the source QA pair. We demonstrate that our CTM-based training improves the consistency of VQA models on the ConVQA datasets and is a strong baseline for further research.
△ Less
Submitted 10 September, 2019;
originally announced September 2019.