arXiv:2406.09810

Think Deep and Fast: Learning Neural Nonlinear Opinion Dynamics from Inverse Dynamic Games for Split-Second Interactions

Authors: Haimin Hu, Jonathan DeCastro, Deepak Gopinath, Guy Rosman, Naomi Ehrich Leonard, Jaime Fernández Fisac

Abstract: Non-cooperative interactions commonly occur in multi-agent scenarios such as car racing, where an ego vehicle can choose to overtake the rival, or stay behind it until a safe overtaking "corridor" opens. While an expert human can do well at making such time-sensitive decisions, the development of safe and efficient game-theoretic trajectory planners capable of rapidly reasoning discrete options is yet to be fully addressed. The recently developed nonlinear opinion dynamics (NOD) show promise in enabling fast opinion formation and avoiding safety-critical deadlocks. However, it remains an open challenge to determine the model parameters of NOD automatically and adaptively, accounting for the ever-changing environment of interaction. In this work, we propose for the first time a learning-based, game-theoretic approach to synthesize a Neural NOD model from expert demonstrations, given as a dataset containing (possibly incomplete) state and action trajectories of interacting agents. The learned NOD can be used by existing dynamic game solvers to plan decisively while accounting for the predicted change of other agents' intents, thus enabling situational awareness in planning. We demonstrate Neural NOD's ability to make fast and robust decisions in a simulated autonomous racing example, leading to tangible improvements in safety and overtaking performance over state-of-the-art data-driven game-theoretic planning methods.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2405.09794

Human-AI Safety: A Descendant of Generative AI and Control Systems Safety

Authors: Andrea Bajcsy, Jaime F. Fisac

Abstract: Artificial intelligence (AI) is interacting with people at an unprecedented scale, offering new avenues for immense positive impact, but also raising widespread concerns around the potential for individual and societal harm. Today, the predominant paradigm for human-AI safety focuses on fine-tuning the generative model's outputs to better agree with human-provided examples or feedback. In reality, however, the consequences of an AI model's outputs cannot be determined in isolation: they are tightly entangled with the responses and behavior of human users over time. In this paper, we distill key complementary lessons from AI safety and control systems safety, highlighting open challenges as well as key synergies between both fields. We then argue that meaningful safety assurances for advanced AI technologies require reasoning about how the feedback loop formed by AI outputs and human behavior may drive the interaction towards different outcomes. To this end, we introduce a unifying formalism to capture dynamic, safety-critical human-AI interactions and propose a concrete technical roadmap towards next-generation human-centered AI safety.

Submitted 22 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

Comments: Revised version with refined exposition and technical details. 12 pages + references, 5 figures

ACM Class: I.2

arXiv:2405.00846

Gameplay Filters: Safe Robot Walking through Adversarial Imagination

Authors: Duy P. Nguyen, Kai-Chieh Hsu, Wenhao Yu, Jie Tan, Jaime F. Fisac

Abstract: Ensuring the safe operation of legged robots in uncertain, novel environments is crucial to their widespread adoption. Despite recent advances in safety filters that can keep arbitrary task-driven policies from incurring safety failures, existing solutions for legged robot locomotion still rely on simplified dynamics and may fail when the robot is perturbed away from predefined stable gaits. This paper presents a general approach that leverages offline game-theoretic reinforcement learning to synthesize a highly robust safety filter for high-order nonlinear dynamics. This gameplay filter then maintains runtime safety by continually simulating adversarial futures and precluding task-driven actions that would cause it to lose future games (and thereby violate safety). Validated on a 36-dimensional quadruped robot locomotion task, the gameplay safety filter exhibits inherent robustness to the sim-to-real gap without manual tuning or heuristic designs. Physical experiments demonstrate the effectiveness of the gameplay safety filter under perturbations, such as tugging and unmodeled irregular terrains, while simulation studies shed light on how to trade off computation and conservativeness without compromising safety.

Submitted 31 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

arXiv:2404.02524

Versatile Scene-Consistent Traffic Scenario Generation as Optimization with Diffusion

Authors: Zhiyu Huang, Zixu Zhang, Ameya Vaidya, Yuxiao Chen, Chen Lv, Jaime Fernández Fisac

Abstract: Generating realistic and controllable agent behaviors in traffic simulation is crucial for the development of autonomous vehicles. This problem is often formulated as imitation learning (IL) from real-world driving data by either directly predicting future trajectories or inferring cost functions with inverse optimal control. In this paper, we draw a conceptual connection between IL and diffusion-based generative modeling and introduce a novel framework Versatile Behavior Diffusion (VBD) to simulate interactive scenarios with multiple traffic participants. Our model not only generates scene-consistent multi-agent interactions but also enables scenario editing through multi-step guidance and refinement. Experimental evaluations show that VBD achieves state-of-the-art performance on the Waymo Sim Agents benchmark. In addition, we illustrate the versatility of our model by adapting it to various applications. VBD is capable of producing scenarios conditioning on priors, integrating with model-based optimization, sampling multi-modal scene-consistent scenarios by fusing marginal predictions, and generating safety-critical scenarios when combined with a game-theoretic solver.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2402.14174

Blending Data-Driven Priors in Dynamic Games

Authors: Justin Lidard, Haimin Hu, Asher Hancock, Zixu Zhang, Albert Gimó Contreras, Vikash Modi, Jonathan DeCastro, Deepak Gopinath, Guy Rosman, Naomi Ehrich Leonard, María Santos, Jaime Fernández Fisac

Abstract: As intelligent robots like autonomous vehicles become increasingly deployed in the presence of people, the extent to which these systems should leverage model-based game-theoretic planners versus data-driven policies for safe, interaction-aware motion planning remains an open question. Existing dynamic game formulations assume all agents are task-driven and behave optimally. However, in reality, humans tend to deviate from the decisions prescribed by these models, and their behavior is better approximated under a noisy-rational paradigm. In this work, we investigate a principled methodology to blend a data-driven reference policy with an optimization-based game-theoretic policy. We formulate KLGame, an algorithm for solving non-cooperative dynamic games with Kullback-Leibler (KL) regularization with respect to a general, stochastic, and possibly multi-modal reference policy. Our method incorporates, for each decision maker, a tunable parameter that permits modulation between task-driven and data-driven behaviors. We propose an efficient algorithm for computing multi-modal approximate feedback Nash equilibrium strategies of KLGame in real time. Through a series of simulated and real-world autonomous driving scenarios, we demonstrate that KLGame policies can more effectively incorporate guidance from the reference policy and account for noisily-rational human behaviors versus non-regularized baselines. Website with additional information, videos, and code: https://kl-games.github.io/.

Submitted 6 July, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: 20 pages, 12 figures
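
Illustrative note (not from the paper): the trade-off between task-driven and data-driven behavior described in the abstract above can be seen in a much simpler setting. The sketch below assumes a single decision maker, a one-shot choice among a few discrete actions, and a hypothetical weight `lam`. In that setting, minimizing expected cost plus `lam` times the KL divergence to a reference policy has the well-known closed form pi(a) proportional to ref(a) * exp(-cost(a) / lam). The function name `kl_blend` and all numbers are made up for illustration; this is not the KLGame algorithm, which addresses multi-player dynamic games.

```python
import math


def kl_blend(costs, ref_probs, lam):
    """Return the policy minimizing E_pi[cost] + lam * KL(pi || ref).

    One-shot, single-agent, discrete-action illustration only.
    Closed form: pi(a) is proportional to ref(a) * exp(-cost(a) / lam).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if not any(p > 0 for p in ref_probs):
        raise ValueError("reference policy must put mass on some action")
    # Work in log space; actions with zero reference mass stay at zero.
    logits = [
        math.log(p) - c / lam if p > 0 else -math.inf
        for c, p in zip(costs, ref_probs)
    ]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: the reference policy prefers action 0,
# but action 1 has the lower task cost.
costs = [1.0, 0.0]
ref = [0.9, 0.1]
data_driven = kl_blend(costs, ref, lam=1000.0)  # stays close to ref
task_driven = kl_blend(costs, ref, lam=0.01)    # concentrates on action 1
```

A large `lam` keeps the blended policy near the reference, while a small `lam` lets the task cost dominate, which is the kind of modulation the abstract attributes to its tunable parameter.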

arXiv:2402.09246

Who Plays First? Optimizing the Order of Play in Stackelberg Games with Many Robots

Authors: Haimin Hu, Gabriele Dragotto, Zixu Zhang, Kaiqu Liang, Bartolomeo Stellato, Jaime F. Fisac

Abstract: We consider the multi-agent spatial navigation problem of computing the socially optimal order of play, i.e., the sequence in which the agents commit to their decisions, and its associated equilibrium in an N-player Stackelberg trajectory game. We model this problem as a mixed-integer optimization problem over the space of all possible Stackelberg games associated with the order of play's permutations. To solve the problem, we introduce Branch and Play (B&P), an efficient and exact algorithm that provably converges to a socially optimal order of play and its Stackelberg equilibrium. As a subroutine for B&P, we employ and extend sequential trajectory planning, i.e., a popular multi-agent control approach, to scalably compute valid local Stackelberg equilibria for any given order of play. We demonstrate the practical utility of B&P to coordinate air traffic control, swarm formation, and delivery vehicle fleets. We find that B&P consistently outperforms various baselines, and computes the socially optimal equilibrium.

Submitted 24 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: Robotics: Science and Systems (RSS) 2024

arXiv:2402.06529

Introspective Planning: Aligning Robots' Uncertainty with Inherent Task Ambiguity

Authors: Kaiqu Liang, Zixu Zhang, Jaime Fernández Fisac

Abstract: Large language models (LLMs) exhibit advanced reasoning skills, enabling robots to comprehend natural language instructions and strategically plan high-level actions through proper grounding. However, LLM hallucination may result in robots confidently executing plans that are misaligned with user goals or, in extreme cases, unsafe. Additionally, inherent ambiguity in natural language instructions can induce task uncertainty, particularly in situations where multiple valid options exist. To address this issue, LLMs must identify such uncertainty and proactively seek clarification. This paper explores the concept of introspective planning as a systematic method for guiding LLMs in forming uncertainty-aware plans for robotic task execution without the need for fine-tuning. We investigate uncertainty quantification in task-level robot planning and demonstrate that introspection significantly improves both success rates and safety compared to state-of-the-art LLM-based planning approaches. Furthermore, we assess the effectiveness of introspective planning in conjunction with conformal prediction, revealing that this combination yields tighter confidence bounds, thereby maintaining statistical success guarantees with fewer superfluous user clarification queries. Code is available at https://github.com/kevinliang888/IntroPlan.

Submitted 3 June, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

Comments: 27 pages, 11 figures. Code is available at https://github.com/kevinliang888/IntroPlan

arXiv:2309.05837

The Safety Filter: A Unified View of Safety-Critical Control in Autonomous Systems

Authors: Kai-Chieh Hsu, Haimin Hu, Jaime Fernández Fisac

Abstract: Recent years have seen significant progress in the realm of robot autonomy, accompanied by the expanding reach of robotic technologies. However, the emergence of new deployment domains brings unprecedented challenges in ensuring safe operation of these systems, which remains as crucial as ever. While traditional model-based safe control methods struggle with generalizability and scalability, emerging data-driven approaches tend to lack well-understood guarantees, which can result in unpredictable catastrophic failures. Successful deployment of the next generation of autonomous robots will require integrating the strengths of both paradigms. This article provides a review of safety filter approaches, highlighting important connections between existing techniques and proposing a unified technical framework to understand, compare, and combine them. The new unified view exposes a shared modular structure across a range of seemingly disparate safety filter classes and naturally suggests directions for future progress towards more scalable synthesis, robust monitoring, and efficient intervention.

Submitted 11 September, 2023; originally announced September 2023.

Comments: Accepted for publication in Annual Review of Control, Robotics, and Autonomous Systems

arXiv:2309.01267

Deception Game: Closing the Safety-Learning Loop in Interactive Robot Autonomy

Authors: Haimin Hu, Zixu Zhang, Kensuke Nakamura, Andrea Bajcsy, Jaime F. Fisac

Abstract: An outstanding challenge for the widespread deployment of robotic systems like autonomous vehicles is ensuring safe interaction with humans without sacrificing performance. Existing safety methods often neglect the robot's ability to learn and adapt at runtime, leading to overly conservative behavior. This paper proposes a new closed-loop paradigm for synthesizing safe control policies that explicitly account for the robot's evolving uncertainty and its ability to quickly respond to future scenarios as they arise, by jointly considering the physical dynamics and the robot's learning algorithm. We leverage adversarial reinforcement learning for tractable safety analysis under high-dimensional learning dynamics and demonstrate our framework's ability to work with both Bayesian belief propagation and implicit learning through large pre-trained neural trajectory predictors.

Submitted 1 November, 2023; v1 submitted 3 September, 2023; originally announced September 2023.

Comments: Conference on Robot Learning 2023

arXiv:2307.00193

Fast, Smooth, and Safe: Implicit Control Barrier Functions through Reach-Avoid Differential Dynamic Programming

Authors: Athindran Ramesh Kumar, Kai-Chieh Hsu, Peter J. Ramadge, Jaime F. Fisac

Abstract: Safety is a central requirement for autonomous system operation across domains. Hamilton-Jacobi (HJ) reachability analysis can be used to construct "least-restrictive" safety filters that result in infrequent, but often extreme, control overrides. In contrast, control barrier function (CBF) methods apply smooth control corrections to guard the system against an often conservative safety boundary. This paper provides an online scheme to construct an implicit CBF through HJ reach-avoid differential dynamic programming in a receding-horizon framework, enabling smooth safety filtering with infinite-time safety guarantees. Simulations with the Dubins car and 5D bicycle dynamics demonstrate the scheme's ability to preserve safety smoothly without the conservativeness of handcrafted CBFs.

Submitted 30 June, 2023; originally announced July 2023.

Comments: Accepted in IEEE Control Systems Letters (L-CSS)

arXiv:2304.02687

Emergent Coordination through Game-Induced Nonlinear Opinion Dynamics

Authors: Haimin Hu, Kensuke Nakamura, Kai-Chieh Hsu, Naomi Ehrich Leonard, Jaime Fernández Fisac

Abstract: We present a multi-agent decision-making framework for the emergent coordination of autonomous agents whose intents are initially undecided. Dynamic non-cooperative games have been used to encode multi-agent interaction, but ambiguity arising from factors such as goal preference or the presence of multiple equilibria may lead to coordination issues, ranging from the "freezing robot" problem to unsafe behavior in safety-critical events. The recently developed nonlinear opinion dynamics (NOD) provide guarantees for breaking deadlocks. However, choosing the appropriate model parameters automatically in general multi-agent settings remains a challenge. In this paper, we first propose a novel and principled procedure for synthesizing NOD based on the value functions of dynamic games conditioned on agents' intents. In particular, we provide for the two-player two-option case precise stability conditions for equilibria of the game-induced NOD based on the mismatch between agents' opinions and their game values. We then propose an optimization-based trajectory optimization algorithm that computes agents' policies guided by the evolution of opinions. The efficacy of our method is illustrated with a simulated toll station coordination example.

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2302.00171

Active Uncertainty Reduction for Safe and Efficient Interaction Planning: A Shielding-Aware Dual Control Approach

Authors: Haimin Hu, David Isele, Sangjae Bae, Jaime F. Fisac

Abstract: The ability to accurately predict others' behavior is central to the safety and efficiency of interactive robotics. Unfortunately, robots often lack access to key information on which these predictions may hinge, such as other agents' goals, attention, and willingness to cooperate. Dual control theory addresses this challenge by treating unknown parameters of a predictive model as stochastic hidden states and inferring their values at runtime using information gathered during system operation. While able to optimally and automatically trade off exploration and exploitation, dual control is computationally intractable for general interactive motion planning. In this paper, we present a novel algorithmic approach to enable active uncertainty reduction for interactive motion planning based on the implicit dual control paradigm. Our approach relies on sampling-based approximation of stochastic dynamic programming, leading to a model predictive control problem that can be readily solved by real-time gradient-based optimization methods. The resulting policy is shown to preserve the dual control effect for a broad class of predictive models with both continuous and categorical uncertainty. To ensure the safe operation of the interacting agents, we use a runtime safety filter (also referred to as a "shielding" scheme), which overrides the robot's dual control policy with a safety fallback strategy when a safety-critical event is imminent. We then augment the dual control framework with an improved variant of the recently proposed shielding-aware robust planning scheme, which proactively balances the nominal planning performance with the risk of high-cost emergency maneuvers triggered by low-probability agent behaviors. We demonstrate the efficacy of our approach with both simulated driving studies and hardware experiments using 1/10 scale autonomous vehicles.

Submitted 1 November, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

Comments: The International Journal of Robotics Research. arXiv admin note: text overlap with arXiv:2202.07720

arXiv:2212.03228

ISAACS: Iterative Soft Adversarial Actor-Critic for Safety

Authors: Kai-Chieh Hsu, Duy Phuong Nguyen, Jaime Fernández Fisac

Abstract: The deployment of robots in uncontrolled environments requires them to operate robustly under previously unseen scenarios, like irregular terrain and wind conditions. Unfortunately, while rigorous safety frameworks from robust optimal control theory scale poorly to high-dimensional nonlinear dynamics, control policies computed by more tractable "deep" methods lack guarantees and tend to exhibit little robustness to uncertain operating conditions. This work introduces a novel approach enabling scalable synthesis of robust safety-preserving controllers for robotic systems with general nonlinear dynamics subject to bounded modeling error by combining game-theoretic safety analysis with adversarial reinforcement learning in simulation. Following a soft actor-critic scheme, a safety-seeking fallback policy is co-trained with an adversarial "disturbance" agent that aims to invoke the worst-case realization of model error and training-to-deployment discrepancy allowed by the designer's uncertainty. While the learned control policy does not intrinsically guarantee safety, it is used to construct a real-time safety filter (or shield) with robust safety guarantees based on forward reachability rollouts. This shield can be used in conjunction with a safety-agnostic control policy, precluding any task-driven actions that could result in loss of safety. We evaluate our learning-based safety approach in a 5D race car simulator, compare the learned safety policy to the numerically obtained optimal solution, and empirically validate the robust safety guarantee of our proposed safety shield against worst-case model discrepancy.

Submitted 7 June, 2024; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: Accepted in 5th Annual Learning for Dynamics & Control Conference (L4DC), University of Pennsylvania
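
Illustrative note (not from the paper): the idea of a rollout-based safety filter, as summarized in the abstract above, can be sketched on a toy problem. Everything below is an assumption made for illustration: a 1D vehicle approaching a wall, a hand-written full-braking fallback in place of a learned policy, deterministic dynamics with no adversarial disturbance, and the names `step`, `fallback`, `rollout_safe`, and `shield`. The paper's shield instead relies on robust rollouts under bounded model error, which this sketch does not attempt.

```python
X_WALL = 10.0  # hypothetical obstacle position (m)
A_MAX = 2.0    # hypothetical acceleration/braking limit (m/s^2)
DT = 0.1       # integration step (s)


def step(state, accel):
    """Advance the toy (position, velocity) state by one time step."""
    x, v = state
    return (x + v * DT, max(0.0, v + accel * DT))


def is_safe(state):
    """Safe means the vehicle has not reached the wall."""
    return state[0] < X_WALL


def fallback(state):
    """Hand-written safety fallback: brake as hard as possible."""
    return -A_MAX


def rollout_safe(state, first_action, horizon=50):
    """Apply first_action once, then the fallback, and check every state."""
    s = step(state, first_action)
    for _ in range(horizon):
        if not is_safe(s):
            return False
        s = step(s, fallback(s))
    return is_safe(s)


def shield(state, task_action):
    """Pass the task action through only if the imagined rollout stays safe."""
    if rollout_safe(state, task_action):
        return task_action
    return fallback(state)


far_from_wall = shield((0.0, 1.0), A_MAX)  # task action is allowed
near_wall = shield((9.5, 3.0), A_MAX)      # overridden by braking
```

The filter is least-restrictive in the sense that it leaves the task action untouched whenever the fallback can still recover afterwards, and intervenes only when the simulated future reaches the wall.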

arXiv:2206.04615

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2202.07720

Active Uncertainty Reduction for Human-Robot Interaction: An Implicit Dual Control Approach

Authors: Haimin Hu, Jaime F. Fisac

Abstract: The ability to accurately predict human behavior is central to the safety and efficiency of robot autonomy in interactive settings. Unfortunately, robots often lack access to key information on which these predictions may hinge, such as people's goals, attention, and willingness to cooperate. Dual control theory addresses this challenge by treating unknown parameters of a predictive model as stochastic hidden states and inferring their values at runtime using information gathered during system operation. While able to optimally and automatically trade off exploration and exploitation, dual control is computationally intractable for general interactive motion planning, mainly due to the fundamental coupling between robot trajectory optimization and human intent inference. In this paper, we present a novel algorithmic approach to enable active uncertainty reduction for interactive motion planning based on the implicit dual control paradigm. Our approach relies on sampling-based approximation of stochastic dynamic programming, leading to a model predictive control problem that can be readily solved by real-time gradient-based optimization methods. The resulting policy is shown to preserve the dual control effect for a broad class of predictive human models with both continuous and categorical uncertainty. The efficacy of our approach is demonstrated with simulated driving examples.

Submitted 5 June, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

Comments: Workshop on the Algorithmic Foundations of Robotics (WAFR) 2022

Journal ref: 15th International Workshop on the Algorithmic Foundations of Robotics (WAFR) 2022

arXiv:2201.08355

doi 10.1016/j.artint.2022.103811

Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees

Authors: Kai-Chieh Hsu, Allen Z. Ren, Duy Phuong Nguyen, Anirudha Majumdar, Jaime F. Fisac

Abstract: Safety is a critical component of autonomous systems and remains a challenge for learning-based policies to be utilized in the real world. In particular, policies learned using reinforcement learning often fail to generalize to novel environments due to unsafe behavior. In this paper, we propose Sim-to-Lab-to-Real to bridge the reality gap with a probabilistically guaranteed safety-aware policy distribution. To improve safety, we apply a dual policy setup where a performance policy is trained using the cumulative task reward and a backup (safety) policy is trained by solving the Safety Bellman Equation based on Hamilton-Jacobi (HJ) reachability analysis. In Sim-to-Lab transfer, we apply a supervisory control scheme to shield unsafe actions during exploration; in Lab-to-Real transfer, we leverage the Probably Approximately Correct (PAC)-Bayes framework to provide lower bounds on the expected performance and safety of policies in unseen environments. Additionally, inheriting from the HJ reachability analysis, the bound accounts for the expectation over the worst-case safety in each environment. We empirically study the proposed framework for ego-vision navigation in two types of indoor environments with varying degrees of photorealism. We also demonstrate strong generalization performance through hardware experiments in real indoor spaces with a quadrupedal robot. See https://sites.google.com/princeton.edu/sim-to-lab-to-real for supplementary material.

Submitted 1 April, 2023; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: Accepted to Special Issue on Risk-aware Autonomous Systems: Theory and Practice, Artificial Intelligence

arXiv:2112.12288 [pdf, other]

doi 10.15607/RSS.2021.XVII.077

Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning

Authors: Kai-Chieh Hsu, Vicenç Rubies-Royo, Claire J. Tomlin, Jaime F. Fisac

Abstract: Reach-avoid optimal control problems, in which the system must reach certain goal conditions while staying clear of unacceptable failure modes, are central to safety and liveness assurance for autonomous robotic systems, but their exact solutions are intractable for complex dynamics and environments. Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive; however, the Lagrange-type objective used in reinforcement learning is not suitable to encode temporal logic requirements. Recent work has shown promise in extending the reinforcement learning machinery to safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time. In this work, we generalize the reinforcement learning formulation to handle all optimal control problems in the reach-avoid category. We derive a time-discounted reach-avoid Bellman backup with contraction mapping properties and prove that the resulting reach-avoid Q-learning algorithm converges under analogous conditions to the traditional Lagrange-type problem, yielding an arbitrarily tight conservative approximation to the reach-avoid set. We further demonstrate the use of this formulation with deep reinforcement learning methods, retaining zero-violation guarantees by treating the approximate solutions as untrusted oracles in a model-predictive supervisory control framework.
We evaluate our proposed framework on a range of nonlinear systems, validating the results against analytic and numerical solutions, and through Monte Carlo simulation in previously intractable problems. Our results open the door to a range of learning-based methods for safe-and-live autonomous behavior, with applications across robotics and automation. See https://github.com/SafeRoboticsLab/safety_rl for code and supplementary material.

Submitted 22 December, 2021; originally announced December 2021.

Comments: Accepted in Robotics: Science and Systems (RSS), 2021

arXiv:2112.12210 [pdf, other]

ProBF: Learning Probabilistic Safety Certificates with Barrier Functions

Authors: Athindran Ramesh Kumar, Sulin Liu, Jaime F. Fisac, Ryan P. Adams, Peter J. Ramadge

Abstract: Safety-critical applications require controllers/policies that can guarantee safety with high confidence. The control barrier function is a useful tool to guarantee safety if we have access to the ground-truth system dynamics. In practice, we have inaccurate knowledge of the system dynamics, which can lead to unsafe behaviors due to unmodeled residual dynamics. Learning the residual dynamics with deterministic machine learning models can prevent the unsafe behavior but can fail when the predictions are imperfect. In this situation, a probabilistic learning method that reasons about the uncertainty of its predictions can help provide robust safety margins. In this work, we use a Gaussian process to model the projection of the residual dynamics onto a control barrier function. We propose a novel optimization procedure to generate safe controls that can guarantee safety with high probability. The safety filter is provided with the ability to reason about the uncertainty of the predictions from the GP. We show the efficacy of this method through experiments on Segway and Quadrotor simulations. Our proposed probabilistic approach is able to reduce the number of safety violations significantly as compared to the deterministic approach with a neural network.

Submitted 23 December, 2021; v1 submitted 22 December, 2021; originally announced December 2021.

Comments: Presented at NeurIPS 2021 workshop - Safe and Robust Control of Uncertain Systems

arXiv:2110.00843 [pdf, other]

SHARP: Shielding-Aware Robust Planning for Safe and Efficient Human-Robot Interaction

Authors: Haimin Hu, Kensuke Nakamura, Jaime F. Fisac

Abstract: Jointly achieving safety and efficiency in human-robot interaction (HRI) settings is a challenging problem, as the robot's planning objectives may be at odds with the human's own intent and expectations. Recent approaches ensure safe robot operation in uncertain environments through a supervisory control scheme, sometimes called "shielding", which overrides the robot's nominal plan with a safety fallback strategy when a safety-critical event is imminent. These reactive "last-resort" strategies (typically in the form of aggressive emergency maneuvers) focus on preserving safety without efficiency considerations; when the nominal planner is unaware of possible safety overrides, shielding can be activated more frequently than necessary, leading to degraded performance. In this work, we propose a new shielding-based planning approach that allows the robot to plan efficiently by explicitly accounting for possible future shielding events. Leveraging recent work on Bayesian human motion prediction, the resulting robot policy proactively balances nominal performance with the risk of high-cost emergency maneuvers triggered by low-probability human behaviors. We formalize Shielding-Aware Robust Planning (SHARP) as a stochastic optimal control problem and propose a computationally efficient framework for finding tractable approximate solutions at runtime.
Our method outperforms the shielding-agnostic motion planning baseline (equipped with the same human intent inference scheme) on simulated driving examples with human trajectories taken from the recently released Waymo Open Motion Dataset.

Submitted 10 March, 2022; v1 submitted 2 October, 2021; originally announced October 2021.

arXiv:2109.07673 [pdf, other]

Back to the Future: Efficient, Time-Consistent Solutions in Reach-Avoid Games

Authors: Dennis R. Anthony, Duy P. Nguyen, David Fridovich-Keil, Jaime F. Fisac

Abstract: We study the class of reach-avoid dynamic games in which multiple agents interact noncooperatively, and each wishes to satisfy a distinct target criterion while avoiding a failure criterion. Reach-avoid games are commonly used to express safety-critical optimal control problems found in mobile robot motion planning. Here, we focus on finding time-consistent solutions, in which future motion plans remain optimal even when a robot diverges from the plan early on due to, e.g., intrinsic dynamic uncertainty or extrinsic environment disturbances. Our main contribution is a computationally-efficient algorithm for multi-agent reach-avoid games which renders time-consistent solutions for all players. We demonstrate our approach in two- and three-player simulated driving scenarios, in which our method provides safe control strategies for all agents.

Submitted 2 March, 2022; v1 submitted 15 September, 2021; originally announced September 2021.

Comments: accepted to ICRA 2022

arXiv:2105.08169 [pdf, other]

doi 10.15607/RSS.2021.XVII.066

Safe Occlusion-aware Autonomous Driving via Game-Theoretic Active Perception

Authors: Zixu Zhang, Jaime F. Fisac

Abstract: Autonomous vehicles interacting with other traffic participants heavily rely on the perception and prediction of other agents' behaviors to plan safe trajectories. However, as occlusions limit the vehicle's perception ability, reasoning about potential hazards beyond the field of view is one of the most challenging issues in developing autonomous driving systems. This paper introduces a novel analytical approach that poses safe trajectory planning under occlusions as a hybrid zero-sum dynamic game between the autonomous vehicle (evader) and an initially hidden traffic participant (pursuer). Due to occlusions, the pursuer's state is initially unknown to the evader and may later be discovered by the vehicle's sensors. The analysis yields optimal strategies for both players as well as the set of initial conditions from which the autonomous vehicle is guaranteed to avoid collisions. We leverage this theoretical result to develop a novel trajectory planning framework for autonomous driving that provides worst-case safety guarantees while minimizing conservativeness by accounting for the vehicle's ability to actively avoid other road users as soon as they are detected in future observations. Our framework is agnostic to the driving environment and suitable for various motion planners. We demonstrate our algorithm on challenging urban and highway driving scenarios using the open-source CARLA simulator.

Submitted 27 June, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: To appear in Robotics: Science and Systems (RSS), 2021

arXiv:2102.07039 [pdf, other]

doi 10.1109/TAC.2021.3059838

FaSTrack: a Modular Framework for Real-Time Motion Planning and Guaranteed Safe Tracking

Authors: Mo Chen, Sylvia L. Herbert, Haimin Hu, Ye Pu, Jaime F. Fisac, Somil Bansal, SooJean Han, Claire J. Tomlin

Abstract: Real-time, guaranteed safe trajectory planning is vital for navigation in unknown environments. However, real-time navigation algorithms typically sacrifice robustness for computation speed. Alternatively, provably safe trajectory planning tends to be too computationally intensive for real-time replanning. We propose FaSTrack, Fast and Safe Tracking, a framework that achieves both real-time replanning and guaranteed safety. In this framework, real-time computation is achieved by allowing any trajectory planner to use a simplified planning model of the system. The plan is tracked by the system, represented by a more realistic, higher-dimensional tracking model. We precompute the tracking error bound (TEB) due to mismatch between the two models and due to external disturbances. We also obtain the corresponding tracking controller used to stay within the TEB. The precomputation does not require prior knowledge of the environment. We demonstrate FaSTrack using Hamilton-Jacobi reachability for precomputation and three different real-time trajectory planners with three different tracking-planning model pairs.

Submitted 13 March, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

Comments: Published in the IEEE Transactions on Automatic Control

arXiv:2002.00941 [pdf, other]

doi 10.1109/TRO.2020.2971415

Quantifying Hypothesis Space Misspecification in Learning from Human-Robot Demonstrations and Physical Corrections

Authors: Andreea Bobu, Andrea Bajcsy, Jaime F. Fisac, Sampada Deglurkar, Anca D. Dragan

Abstract: Human input has enabled autonomous systems to improve their capabilities and achieve complex behaviors that are otherwise challenging to generate automatically. Recent work focuses on how robots can use such input - like demonstrations or corrections - to learn intended objectives. These techniques assume that the human's desired objective already exists within the robot's hypothesis space. In reality, this assumption is often inaccurate: there will always be situations where the person might care about aspects of the task that the robot does not know about. Without this knowledge, the robot cannot infer the correct objective. Hence, when the robot's hypothesis space is misspecified, even methods that keep track of uncertainty over the objective fail because they reason about which hypothesis might be correct, and not whether any of the hypotheses are correct. In this paper, we posit that the robot should reason explicitly about how well it can explain human inputs given its hypothesis space and use that situational confidence to inform how it should incorporate human input. We demonstrate our method on a 7 degree-of-freedom robot manipulator in learning from two important types of human input: demonstrations of manipulation tasks, and physical corrections during the robot's task execution.

Submitted 28 February, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

Comments: 20 pages, 12 figures, 1 table. IEEE Transactions on Robotics, 2020

arXiv:2001.04465 [pdf, other]

doi 10.1145/3319502.3374811

LESS is More: Rethinking Probabilistic Models of Human Behavior

Authors: Andreea Bobu, Dexter R. R. Scobee, Jaime F. Fisac, S. Shankar Sastry, Anca D. Dragan

Abstract: Robots need models of human behavior for both inferring human goals and preferences, and predicting what people will do. A common model is the Boltzmann noisily-rational decision model, which assumes people approximately optimize a reward function and choose trajectories in proportion to their exponentiated reward. While this model has been successful in a variety of robotics domains, its roots lie in econometrics, and in modeling decisions among different discrete options, each with its own utility or reward. In contrast, human trajectories lie in a continuous space, with continuous-valued features that influence the reward function. We propose that it is time to rethink the Boltzmann model, and design it from the ground up to operate over such trajectory spaces. We introduce a model that explicitly accounts for distances between trajectories, rather than only their rewards. Rather than each trajectory affecting the decision independently, similar trajectories now affect the decision together. We start by showing that our model better explains human behavior in a user study. We then analyze the implications this has for robot inference, first in toy environments where we have ground truth and find more accurate inference, and finally for a 7DOF robot arm learning from user demonstrations.

Submitted 13 January, 2020; originally announced January 2020.

Comments: 9 pages, 7 figures

arXiv:1811.07834 [pdf, other]

Safely Probabilistically Complete Real-Time Planning and Exploration in Unknown Environments

Authors: David Fridovich-Keil, Jaime F. Fisac, Claire J. Tomlin

Abstract: We present a new framework for motion planning that wraps around existing kinodynamic planners and guarantees recursive feasibility when operating in a priori unknown, static environments. Our approach makes strong guarantees about overall safety and collision avoidance by utilizing a robust controller derived from reachability analysis. We ensure that motion plans never exit the safe backward reachable set of the initial state, while safely exploring the space. This preserves the safety of the initial state, and guarantees that we will eventually find the goal if it is possible to do so while exploring safely. We implement our framework in the Robot Operating System (ROS) software environment and demonstrate it in a real-time simulation.

Submitted 6 March, 2019; v1 submitted 19 November, 2018; originally announced November 2018.

Comments: 7 pages, accepted to ICRA 2019

arXiv:1811.05929 [pdf, other]

A Scalable Framework For Real-Time Multi-Robot, Multi-Human Collision Avoidance

Authors: Andrea Bajcsy, Sylvia L. Herbert, David Fridovich-Keil, Jaime F. Fisac, Sampada Deglurkar, Anca D. Dragan, Claire J. Tomlin

Abstract: Robust motion planning is a well-studied problem in the robotics literature, yet current algorithms struggle to operate scalably and safely in the presence of other moving agents, such as humans. This paper introduces a novel framework for robot navigation that accounts for high-order system dynamics and maintains safety in the presence of external disturbances, other robots, and non-deterministic intentional agents. Our approach precomputes a tracking error margin for each robot, generates confidence-aware human motion predictions, and coordinates multiple robots with a sequential priority ordering, effectively enabling scalable safe trajectory planning and execution. We demonstrate our approach in hardware with two robots and two humans. We also showcase our work's scalability in a larger simulation.

Submitted 14 November, 2018; originally announced November 2018.

arXiv:1810.05766 [pdf, other]

Hierarchical Game-Theoretic Planning for Autonomous Vehicles

Authors: Jaime F. Fisac, Eli Bronstein, Elis Stefansson, Dorsa Sadigh, S. Shankar Sastry, Anca D. Dragan

Abstract: The actions of an autonomous vehicle on the road affect and are affected by those of other drivers, whether overtaking, negotiating a merge, or avoiding an accident. This mutual dependence, best captured by dynamic game theory, creates a strong coupling between the vehicle's planning and its predictions of other drivers' behavior, and constitutes an open problem with direct implications on the safety and viability of autonomous driving technology. Unfortunately, dynamic games are too computationally demanding to meet the real-time constraints of autonomous driving in its continuous state and action space. In this paper, we introduce a novel game-theoretic trajectory planning algorithm for autonomous driving, that enables real-time performance by hierarchically decomposing the underlying dynamic game into a long-horizon "strategic" game with simplified dynamics and full information structure, and a short-horizon "tactical" game with full dynamics and a simplified information structure. The value of the strategic game is used to guide the tactical planning, implicitly extending the planning horizon, pushing the local trajectory optimization closer to global solutions, and, most importantly, quantitatively accounting for the autonomous vehicle and the human driver's ability and incentives to influence each other. In addition, our approach admits non-deterministic models of human decision-making, rather than relying on perfectly rational predictions.
Our results showcase richer, safer, and more effective autonomous behavior in comparison to existing techniques.

Submitted 12 October, 2018; originally announced October 2018.

Comments: Submitted to ICRA 2019

MSC Class: 68T40; 93C85; 91A25

ACM Class: I.2.9

arXiv:1810.05157 [pdf, other]

Learning under Misspecified Objective Spaces

Authors: Andreea Bobu, Andrea Bajcsy, Jaime F. Fisac, Anca D. Dragan

Abstract: Learning robot objective functions from human input has become increasingly important, but state-of-the-art techniques assume that the human's desired objective lies within the robot's hypothesis space. When this is not true, even methods that keep track of uncertainty over the objective fail because they reason about which hypothesis might be correct, and not whether any of the hypotheses are correct. We focus specifically on learning from physical human corrections during the robot's task execution, where not having a rich enough hypothesis space leads to the robot updating its objective in ways that the person did not actually intend. We observe that such corrections appear irrelevant to the robot, because they are not the best way of achieving any of the candidate objectives. Instead of naively trusting and learning from every human interaction, we propose robots learn conservatively by reasoning in real time about how relevant the human's correction is for the robot's hypothesis space. We test our inference method in an experiment with human interaction data, and demonstrate that this alleviates unintended learning in an in-person user study with a 7DoF robot manipulator.

Submitted 26 October, 2018; v1 submitted 11 October, 2018; originally announced October 2018.

Comments: Conference on Robot Learning (CoRL) 2018

arXiv:1809.00706 [pdf, other]

A Minimum Discounted Reward Hamilton-Jacobi Formulation for Computing Reachable Sets

Authors: Anayo K. Akametalu, Shromona Ghosh, Jaime F. Fisac, Claire J. Tomlin

Abstract: We propose a novel formulation for approximating reachable sets through a minimum discounted reward optimal control problem. The formulation yields a continuous solution that can be obtained by solving a Hamilton-Jacobi equation. Furthermore, the numerical approximation to this solution can be obtained as the unique fixed-point to a contraction mapping. This allows for more efficient solution methods that could not be applied under traditional formulations for solving reachable sets. In addition, this formulation provides a link between reinforcement learning and learning reachable sets for systems with unknown dynamics, allowing algorithms from the former to be applied to the latter. We use two benchmark examples, double integrator, and pursuit-evasion games, to show the correctness of the formulation as well as its strengths in comparison to previous work.

Submitted 3 September, 2018; originally announced September 2018.

arXiv:1806.03820 [pdf, other]

An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning

Authors: Dhruv Malik, Malayandi Palaniappan, Jaime F. Fisac, Dylan Hadfield-Menell, Stuart Russell, Anca D. Dragan

Abstract: Our goal is for AI systems to correctly identify and act according to their human user's objectives. Cooperative Inverse Reinforcement Learning (CIRL) formalizes this value alignment problem as a two-player game between a human and robot, in which only the human knows the parameters of the reward function: the robot needs to learn them as the interaction unfolds. Previous work showed that CIRL can be solved as a POMDP, but with an action space size exponential in the size of the reward parameter space. In this work, we exploit a specific property of CIRL (the human is a full information agent) to derive an optimality-preserving modification to the standard Bellman update; this reduces the complexity of the problem by an exponential factor and allows us to relax CIRL's assumption of human rationality. We apply this update to a variety of POMDP solvers and find that it enables us to scale CIRL to non-trivial problems, with larger reward parameter spaces, and larger action spaces for both robot and human. In solutions to these larger problems, the human exhibits pedagogic (teaching) behavior, while the robot interprets it as such and attains higher value for the human.

Submitted 11 June, 2018; originally announced June 2018.

arXiv:1806.00109 [pdf, other]

Probabilistically Safe Robot Planning with Confidence-Based Human Predictions

Authors: Jaime F. Fisac, Andrea Bajcsy, Sylvia L. Herbert, David Fridovich-Keil, Steven Wang, Claire J. Tomlin, Anca D. Dragan

Abstract: In order to safely operate around humans, robots can employ predictive models of human motion. Unfortunately, these models cannot capture the full complexity of human behavior and necessarily introduce simplifying assumptions. As a result, predictions may degrade whenever the observed human behavior departs from the assumed structure, which can have negative implications for safety. In this paper, we observe that how "rational" human actions appear under a particular model can be viewed as an indicator of that model's ability to describe the human's current motion. By reasoning about this model confidence in a real-time Bayesian framework, we show that the robot can very quickly modulate its predictions to become more uncertain when the model performs poorly. Building on recent work in provably-safe trajectory planning, we leverage these confidence-aware human motion predictions to generate assured autonomous robot motion. Our new analysis combines worst-case tracking error guarantees for the physical robot with probabilistic time-varying human predictions, yielding a quantitative, probabilistic safety certificate. We demonstrate our approach with a quadcopter navigating around a human.

Submitted 31 May, 2018; originally announced June 2018.

Comments: Robotics Science and Systems (RSS) 2018

arXiv:1802.05250 [pdf, other]

Generating Plans that Predict Themselves

Authors: Jaime F. Fisac, Chang Liu, Jessica B. Hamrick, S. Shankar Sastry, J. Karl Hedrick, Thomas L. Griffiths, Anca D. Dragan

Abstract: Collaboration requires coordination, and we coordinate by anticipating our teammates' future actions and adapting to their plan. In some cases, our teammates' actions early on can give us a clear idea of what the remainder of their plan is, i.e. what action sequence we should expect. In others, they might leave us less confident, or even lead us to the wrong conclusion. Our goal is for robot actions to fall in the first category: we want to enable robots to select their actions in such a way that human collaborators can easily use them to correctly anticipate what will follow. While previous work has focused on finding initial plans that convey a set goal, here we focus on finding two portions of a plan such that the initial portion conveys the final one. We introduce $t$-predictability: a measure that quantifies the accuracy and confidence with which human observers can predict the remaining robot plan from the overall task goal and the observed initial $t$ actions in the plan. We contribute a method for generating $t$-predictable plans: we search for a full plan that accomplishes the task, but in which the first $t$ actions make it as easy as possible to infer the remaining ones. The result is often different from the most efficient plan, in which the initial actions might leave a lot of ambiguity as to how the task will be completed. Through an online experiment and an in-person user study with physical robots, we find that our approach outperforms a traditional efficiency-based planner in objective and subjective collaboration metrics.

Submitted 14 February, 2018; originally announced February 2018.

Comments: Published at the Workshop on Algorithmic Foundations of Robotics (WAFR 2016)

MSC Class: 68T05

ACM Class: I.2.8; I.2.9

Journal ref: Jaime F. Fisac, Chang Liu, Jessica B. Hamrick, S. Shankar Sastry, J. Karl Hedrick, Thomas L. Griffiths, and Anca D. Dragan. "Generating Plans that Predict Themselves". Workshop on Algorithmic Foundations of Robotics (WAFR), 2016

arXiv:1802.01780 [pdf, other]

Goal Inference Improves Objective and Perceived Performance in Human-Robot Collaboration

Authors: Chang Liu, Jessica B. Hamrick, Jaime F. Fisac, Anca D. Dragan, J. Karl Hedrick, S. Shankar Sastry, Thomas L. Griffiths

Abstract: The study of human-robot interaction is fundamental to the design and use of robotics in real-world applications. Robots will need to predict and adapt to the actions of human collaborators in order to achieve good performance and improve safety and end-user adoption. This paper evaluates a human-robot collaboration scheme that combines the task allocation and motion levels of reasoning: the robotic agent uses Bayesian inference to predict the next goal of its human partner from his or her ongoing motion, and re-plans its own actions in real time. This anticipative adaptation is desirable in many practical scenarios, where humans are unable or unwilling to take on the cognitive overhead required to explicitly communicate their intent to the robot. A behavioral experiment indicates that the combination of goal inference and dynamic task planning significantly improves both objective and perceived performance of the human-robot team. Participants were highly sensitive to the differences between robot behaviors, preferring to work with a robot that adapted to their actions over one that did not.

Submitted 5 February, 2018; originally announced February 2018.

Comments: Published at the International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016)

MSC Class: 68T05

ACM Class: I.2.0; I.2.6; I.2.8; I.2.9

Journal ref: C. Liu, J. Hamrick, J. Fisac, A. Dragan, J. K. Hedrick, S. Sastry, T. Griffiths. "Goal Inference Improves Objective and Perceived Performance in Human-Robot Collaboration". Autonomous Agents and Multiagent Systems (AAMAS), 2016

arXiv:1710.04731 [pdf, other]

Planning, Fast and Slow: A Framework for Adaptive Real-Time Safe Trajectory Planning

Authors: David Fridovich-Keil, Sylvia L. Herbert, Jaime F. Fisac, Sampada Deglurkar, Claire J. Tomlin

Abstract: Motion planning is an extremely well-studied problem in the robotics community, yet existing work largely falls into one of two categories: computationally efficient but with few if any safety guarantees, or able to give stronger guarantees but at high computational cost. This work builds on a recent development called FaSTrack in which a slow offline computation provides a modular safety guarantee for a faster online planner. We introduce the notion of "meta-planning" in which a refined offline computation enables safe switching between different online planners. This provides autonomous systems with the ability to adapt motion plans to a priori unknown environments in real-time as sensor measurements detect new obstacles, and the flexibility to maneuver differently in the presence of obstacles than they would in free space, all while maintaining a strict safety guarantee. We demonstrate the meta-planning algorithm both in simulation and in hardware using a small Crazyflie 2.0 quadrotor.

Submitted 6 March, 2018; v1 submitted 12 October, 2017; originally announced October 2017.

Comments: ICRA, International Conference on Robotics and Automation, ICRA 2018, 8 pages, 9 figures

arXiv:1707.06354 [pdf, other]

Pragmatic-Pedagogic Value Alignment

Authors: Jaime F. Fisac, Monica A. Gates, Jessica B. Hamrick, Chang Liu, Dylan Hadfield-Menell, Malayandi Palaniappan, Dhruv Malik, S. Shankar Sastry, Thomas L. Griffiths, Anca D. Dragan

Abstract: As intelligent systems gain autonomy and capability, it becomes vital to ensure that their objectives match those of their human users; this is known as the value-alignment problem. In robotics, value alignment is key to the design of collaborative robots that can integrate into human workflows, successfully inferring and adapting to their users' objectives as they go. We argue that a meaningful solution to value alignment must combine multi-agent decision theory with rich mathematical models of human cognition, enabling robots to tap into people's natural collaborative capabilities. We present a solution to the cooperative inverse reinforcement learning (CIRL) dynamic game based on well-established cognitive models of decision making and theory of mind. The solution captures a key reciprocity relation: the human will not plan her actions in isolation, but rather reason pedagogically about how the robot might learn from them; the robot, in turn, can anticipate this and interpret the human's actions pragmatically. To our knowledge, this work constitutes the first formal analysis of value alignment grounded in empirically validated cognitive models.

Submitted 5 February, 2018; v1 submitted 19 July, 2017; originally announced July 2017.

Comments: Published at the International Symposium on Robotics Research (ISRR 2017)

MSC Class: 68T05

ACM Class: I.2.0; I.2.6; I.2.8; I.2.9

Journal ref: International Symposium on Robotics Research, 2017

arXiv:1705.01292 [pdf, other]

A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems

Authors: Jaime F. Fisac, Anayo K. Akametalu, Melanie N. Zeilinger, Shahab Kaynama, Jeremy Gillula, Claire J. Tomlin

Abstract: The proven efficacy of learning-based control schemes strongly motivates their application to robotic systems operating in the physical world. However, guaranteeing correct operation during the learning process is currently an unresolved issue, which is of vital importance in safety-critical systems. We propose a general safety framework based on Hamilton-Jacobi reachability methods that can work in conjunction with an arbitrary learning algorithm. The method exploits approximate knowledge of the system dynamics to guarantee constraint satisfaction while minimally interfering with the learning process. We further introduce a Bayesian mechanism that refines the safety analysis as the system acquires new evidence, reducing initial conservativeness when appropriate while strengthening guarantees through real-time validation. The result is a least-restrictive, safety-preserving control law that intervenes only when (a) the computed safety guarantees require it, or (b) confidence in the computed guarantees decays in light of new observations. We prove theoretical safety guarantees combining probabilistic and worst-case analysis and demonstrate the proposed framework experimentally on a quadrotor vehicle. Even though safety analysis is based on a simple point-mass model, the quadrotor successfully arrives at a suitable controller by policy-gradient reinforcement learning without ever crashing, and safely retracts away from a strong external disturbance introduced during flight.

Submitted 14 February, 2018; v1 submitted 3 May, 2017; originally announced May 2017.

Comments: Accepted for publication in IEEE Transactions on Automatic Control. Video with experiments: https://youtu.be/WAAxyeSk2bw

ACM Class: I.2.9; I.2.8; I.2.6

arXiv:1703.07373 [pdf, other]

doi 10.1109/CDC.2017.8263867

FaSTrack: a Modular Framework for Fast and Guaranteed Safe Motion Planning

Authors: Sylvia L. Herbert, Mo Chen, SooJean Han, Somil Bansal, Jaime F. Fisac, Claire J. Tomlin

Abstract: Fast and safe navigation of dynamical systems through a priori unknown cluttered environments is vital to many applications of autonomous systems. However, trajectory planning for autonomous systems is computationally intensive, often requiring simplified dynamics that sacrifice safety and dynamic feasibility in order to plan efficiently. Conversely, safe trajectories can be computed using more sophisticated dynamic models, but this is typically too slow to be used for real-time planning. We propose a new algorithm FaSTrack: Fast and Safe Tracking for High Dimensional systems. A path or trajectory planner using simplified dynamics to plan quickly can be incorporated into the FaSTrack framework, which provides a safety controller for the vehicle along with a guaranteed tracking error bound. This bound captures all possible deviations due to high dimensional dynamics and external disturbances. Note that FaSTrack is modular and can be used with most current path or trajectory planners. We demonstrate this framework using a 10D nonlinear quadrotor model tracking a 3D path obtained from an RRT planner.

Submitted 13 February, 2021; v1 submitted 21 March, 2017; originally announced March 2017.

Comments: Published in the Proceedings of the IEEE Conference on Decision and Control, 2017

arXiv:1611.08364 [pdf, other]

Robust Sequential Path Planning Under Disturbances and Adversarial Intruder

Authors: Mo Chen, Somil Bansal, Jaime F. Fisac, Claire J. Tomlin

Abstract: Provably safe and scalable multi-vehicle path planning is an important and urgent problem due to the expected increase of automation in civilian airspace in the near future. Although this problem has been studied in the past, there has not been a method that guarantees both goal satisfaction and safety for vehicles with general nonlinear dynamics while taking into account disturbances and potential adversarial agents, to the best of our knowledge. Hamilton-Jacobi (HJ) reachability is the ideal tool for guaranteeing goal satisfaction and safety under such scenarios, and has been successfully applied to many small-scale problems. However, a direct application of HJ reachability in most cases becomes intractable when there are more than two vehicles due to the exponentially scaling computational complexity with respect to system dimension. In this paper, we take advantage of the guarantees HJ reachability provides, and eliminate the computation burden by assigning a strict priority ordering to the vehicles under consideration. Under this sequential path planning (SPP) scheme, vehicles reserve "space-time" portions in the airspace, and the space-time portions guarantee dynamic feasibility, collision avoidance, and optimality of the paths given the priority ordering. With a computation complexity that scales quadratically when accounting for both disturbances and an intruder, and linearly when accounting for only disturbances, SPP can tractably solve the multi-vehicle path planning problem for vehicles with general nonlinear dynamics in a practical setting. We demonstrate our theory in representative simulations.

Submitted 25 November, 2016; originally announced November 2016.

Comments: Submitted to IEEE Transactions on Control Systems Technology

arXiv:1603.05208 [pdf, other]

Safe Sequential Path Planning Under Disturbances and Imperfect Information

Authors: Somil Bansal, Mo Chen, Jaime F. Fisac, Claire J. Tomlin

Abstract: Multi-UAV systems are safety-critical, and guarantees must be made to ensure no unsafe configurations occur. Hamilton-Jacobi (HJ) reachability is ideal for analyzing such safety-critical systems; however, its direct application is limited to small-scale systems of no more than two vehicles due to an exponentially-scaling computational complexity. Previously, the sequential path planning (SPP) method, which assigns strict priorities to vehicles, was proposed; SPP allows multi-vehicle path planning to be done with a linearly-scaling computational complexity. However, the previous formulation assumed that there are no disturbances, and that every vehicle has perfect knowledge of higher-priority vehicles' positions. In this paper, we make SPP more practical by providing three different methods to account for disturbances in dynamics and imperfect knowledge of higher-priority vehicles' states. Each method has different assumptions about information sharing. We demonstrate our proposed methods in simulations.

Submitted 8 June, 2017; v1 submitted 16 March, 2016; originally announced March 2016.

Comments: American Control Conference, 2017

arXiv:1412.7223 [pdf, other]

Safe Sequential Path Planning of Multi-Vehicle Systems via Double-Obstacle Hamilton-Jacobi-Isaacs Variational Inequality

Authors: Mo Chen, Jaime F. Fisac, Shankar Sastry, Claire J. Tomlin

Abstract: We consider the problem of planning trajectories for a group of $N$ vehicles, each aiming to reach its own target set while avoiding danger zones of other vehicles. The analysis of problems like this is extremely important practically, especially given the growing interest in utilizing unmanned aircraft systems for civil purposes. The direct solution of this problem by solving a single-obstacle Hamilton-Jacobi-Isaacs (HJI) variational inequality (VI) is numerically intractable due to the exponential scaling of computation complexity with problem dimensionality. Furthermore, the single-obstacle HJI VI cannot directly handle situations in which vehicles do not have a common scheduled arrival time. Instead, we perform sequential path planning by considering vehicles in order of priority, modeling higher-priority vehicles as time-varying obstacles for lower-priority vehicles. To do this, we solve a double-obstacle HJI VI which allows us to obtain the reach-avoid set, defined as the set of states from which a vehicle can reach its target while staying within a time-varying state constraint set. From the solution of the double-obstacle HJI VI, we can also extract the latest start time and the optimal control for each vehicle. This is a first application of the double-obstacle HJI VI which can handle systems with time-varying dynamics, target sets, and state constraint sets, and results in computation complexity that scales linearly, as opposed to exponentially, with the number of vehicles in consideration.

Submitted 20 March, 2016; v1 submitted 22 December, 2014; originally announced December 2014.

Comments: European Control Conference 2015

Showing 1–40 of 40 results for author: Fisac, J F