
A Survey of Scam Exposure, Victimization, Types, Vectors, and Reporting in 12 Countries

Authors: Mo Houtti, Abhishek Roy, Venkata Narsi Reddy Gangula, Ashley Marie Walker

Abstract: Scams are a widespread issue with severe consequences for both victims and perpetrators, but existing data collection is fragmented, precluding global and comparative local understanding. The present study addresses this gap through a nationally representative survey (n = 8,369) on scam exposure, victimization, types, vectors, and reporting in 12 countries: Belgium, Egypt, France, Hungary, Indonesia, Mexico, Romania, Slovakia, South Africa, South Korea, Sweden, and the United Kingdom. We analyze 6 survey questions to build a detailed quantitative picture of the scams landscape in each country, and compare across countries to identify global patterns. We find, first, that residents of less affluent countries suffer financial loss from scams more often. Second, we find that the internet plays a key role in scams across the globe, and that GNI per-capita is strongly associated with specific scam types and contact vectors. Third, we find widespread under-reporting, with residents of less affluent countries being less likely to know how to report a scam. Our findings contribute valuable insights for researchers, practitioners, and policymakers in the online fraud and scam prevention space.

Submitted 17 July, 2024; originally announced July 2024.

Comments: To appear in the Journal of Online Trust and Safety

arXiv:2406.06474 [pdf, other]

Towards a Personal Health Large Language Model

Authors: Justin Cosentino, Anastasiya Belyaeva, Xin Liu, Nicholas A. Furlotte, Zhun Yang, Chace Lee, Erik Schenck, Yojan Patel, Jian Cui, Logan Douglas Schneider, Robby Bryant, Ryan G. Gomes, Allen Jiang, Roy Lee, Yun Liu, Javier Perez, Jameson K. Rogers, Cathy Speed, Shyam Tailor, Megan Walker, Jeffrey Yu, Tim Althoff, Conor Heneghan, John Hernandez, Mark Malhotra, et al. (9 additional authors not shown)

Abstract: In health, most large language model (LLM) research has focused on clinical tasks. However, mobile and wearable devices, which are rarely integrated into such tasks, provide rich, longitudinal data for personal health monitoring. Here we present Personal Health Large Language Model (PH-LLM), fine-tuned from Gemini for understanding and reasoning over numerical time-series personal health data. We created and curated three datasets that test 1) production of personalized insights and recommendations from sleep patterns, physical activity, and physiological responses, 2) expert domain knowledge, and 3) prediction of self-reported sleep outcomes. For the first task we designed 857 case studies in collaboration with domain experts to assess real-world scenarios in sleep and fitness. Through comprehensive evaluation of domain-specific rubrics, we observed that Gemini Ultra 1.0 and PH-LLM are not statistically different from expert performance in fitness and, while experts remain superior for sleep, fine-tuning PH-LLM provided significant improvements in using relevant domain knowledge and personalizing information for sleep insights. We evaluated PH-LLM domain knowledge using multiple choice sleep medicine and fitness examinations. PH-LLM achieved 79% on sleep and 88% on fitness, exceeding average scores from a sample of human experts. Finally, we trained PH-LLM to predict self-reported sleep quality outcomes from textual and multimodal encoding representations of wearable data, and demonstrate that multimodal encoding is required to match performance of specialized discriminative models. Although further development and evaluation are necessary in the safety-critical personal health domain, these results demonstrate both the broad knowledge and capabilities of Gemini models and the benefit of contextualizing physiological data for personal health applications as done with PH-LLM.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 72 pages

arXiv:2312.11772 [pdf, other]

CAManim: Animating end-to-end network activation maps

Authors: Emily Kaczmarek, Olivier X. Miguel, Alexa C. Bowie, Robin Ducharme, Alysha L. J. Dingwall-Harvey, Steven Hawken, Christine M. Armour, Mark C. Walker, Kevin Dick

Abstract: Deep neural networks have been widely adopted in numerous domains due to their high performance and accessibility to developers and application-specific end-users. Fundamental to image-based applications is the development of Convolutional Neural Networks (CNNs), which possess the ability to automatically extract features from data. However, comprehending these complex models and their learned representations, which typically comprise millions of parameters and numerous layers, remains a challenge for both developers and end-users. This challenge arises due to the absence of interpretable and transparent tools to make sense of black-box models. There exists a growing body of Explainable Artificial Intelligence (XAI) literature, including a collection of methods denoted Class Activation Maps (CAMs), that seek to demystify what representations the model learns from the data, how it informs a given prediction, and why it, at times, performs poorly in certain tasks. We propose a novel XAI visualization method denoted CAManim that seeks to simultaneously broaden and focus end-user understanding of CNN predictions by animating the CAM-based network activation maps through all layers, effectively depicting from end-to-end how a model progressively arrives at the final layer activation. Herein, we demonstrate that CAManim works with any CAM-based method and various CNN architectures. Beyond qualitative model assessments, we additionally propose a novel quantitative assessment that expands upon the Remove and Debias (ROAD) metric, pairing the qualitative end-to-end network visual explanations assessment with our novel quantitative "yellow brick ROAD" assessment (ybROAD). This builds upon prior research to address the increasing demand for interpretable, robust, and transparent model assessment methodology, ultimately improving an end-user's trust in a given model's predictions.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.07286 [pdf, other]

Explaining black boxes with a SMILE: Statistical Model-agnostic Interpretability with Local Explanations

Authors: Koorosh Aslansefat, Mojgan Hashemian, Martin Walker, Mohammed Naveed Akram, Ioannis Sorokos, Yiannis Papadopoulos

Abstract: Machine learning is currently undergoing an explosion in capability, popularity, and sophistication. However, one of the major barriers to widespread acceptance of machine learning (ML) is trustworthiness: most ML models operate as black boxes, their inner workings opaque and mysterious, and it can be difficult to trust their conclusions without understanding how those conclusions are reached. Explainability is therefore a key aspect of improving trustworthiness: the ability to better understand, interpret, and anticipate the behaviour of ML models. To this end, we propose SMILE, a new method that builds on previous approaches by making use of statistical distance measures to improve explainability while remaining applicable to a wide range of input data domains.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2308.01887 [pdf, other]

Athena 2.0: Discourse and User Modeling in Open Domain Dialogue

Authors: Omkar Patil, Lena Reed, Kevin K. Bowden, Juraj Juraska, Wen Cui, Vrindavan Harrison, Rishi Rajasekaran, Angela Ramirez, Cecilia Li, Eduardo Zamora, Phillip Lee, Jeshwanth Bheemanpally, Rohan Pandey, Adwait Ratnaparkhi, Marilyn Walker

Abstract: Conversational agents are consistently growing in popularity and many people interact with them every day. While many conversational agents act as personal assistants, they can have many different goals. Some are task-oriented, such as providing customer support for a bank or making a reservation. Others are designed to be empathetic and to form emotional connections with the user. The Alexa Prize Challenge aims to create a socialbot, which allows the user to engage in coherent conversations, on a range of popular topics that will interest the user. Here we describe Athena 2.0, UCSC's conversational agent for Amazon's Socialbot Grand Challenge 4. Athena 2.0 utilizes a novel knowledge-grounded discourse model that tracks the entity links that Athena introduces into the dialogue, and uses them to constrain named-entity recognition and linking, and coreference resolution. Athena 2.0 also relies on a user model to personalize topic selection and other aspects of the conversation to individual users.

Submitted 3 August, 2023; originally announced August 2023.

Comments: Alexa Prize Proceedings, 2021. Socialbot Grand Challenge 4

arXiv:2307.16863 [pdf, other]

MetaCAM: Ensemble-Based Class Activation Map

Authors: Emily Kaczmarek, Olivier X. Miguel, Alexa C. Bowie, Robin Ducharme, Alysha L. J. Dingwall-Harvey, Steven Hawken, Christine M. Armour, Mark C. Walker, Kevin Dick

Abstract: The need for clear, trustworthy explanations of deep learning model predictions is essential for high-criticality fields, such as medicine and biometric identification. Class Activation Maps (CAMs) are an increasingly popular category of visual explanation methods for Convolutional Neural Networks (CNNs). However, the performance of individual CAMs depends largely on experimental parameters such as the selected image, target class, and model. Here, we propose MetaCAM, an ensemble-based method for combining multiple existing CAM methods based on the consensus of the top-k% most highly activated pixels across component CAMs. We perform experiments to quantifiably determine the optimal combination of 11 CAMs for a given MetaCAM experiment. A new method denoted Cumulative Residual Effect (CRE) is proposed to summarize large-scale ensemble-based experiments. We also present adaptive thresholding and demonstrate how it can be applied to individual CAMs to improve their performance, measured using pixel perturbation method Remove and Debias (ROAD). Lastly, we show that MetaCAM outperforms existing CAMs and refines the most salient regions of images used for model predictions. In a specific example, MetaCAM improved ROAD performance to 0.393 compared to 11 individual CAMs with ranges from -0.101 to 0.172, demonstrating the importance of combining CAMs through an ensembling method and adaptive thresholding.

Submitted 31 July, 2023; originally announced July 2023.

Comments: 9 pages
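
Illustration (not from the paper): the MetaCAM abstract above describes combining CAMs by the consensus of their top-k% most activated pixels. The sketch below shows one way such a consensus could be computed. The function names, the vote threshold `min_votes`, and the tie handling are all assumptions made for this example; the abstract does not specify the authors' exact rule.

```python
import math


def top_k_mask(cam, k_percent):
    """Mark the top k% highest-valued pixels of one activation map.

    `cam` is a 2D list of floats. Pixels tied with the cut-off value are
    all kept, so slightly more than k% may be selected (an assumption).
    """
    flat = sorted((v for row in cam for v in row), reverse=True)
    n_keep = max(1, math.ceil(len(flat) * k_percent / 100))
    threshold = flat[n_keep - 1]
    return [[v >= threshold for v in row] for row in cam]


def consensus_mask(cams, k_percent, min_votes):
    """Keep a pixel when at least `min_votes` maps place it in their top k%.

    `min_votes` is a hypothetical parameter introduced for this sketch.
    """
    masks = [top_k_mask(cam, k_percent) for cam in cams]
    rows, cols = len(cams[0]), len(cams[0][0])
    return [
        [sum(m[r][c] for m in masks) >= min_votes for c in range(cols)]
        for r in range(rows)
    ]


# Three toy 2x2 "CAMs"; two of them agree on the top-left pixel.
cam_a = [[0.9, 0.1], [0.2, 0.3]]
cam_b = [[0.8, 0.7], [0.1, 0.0]]
cam_c = [[0.1, 0.9], [0.2, 0.3]]
result = consensus_mask([cam_a, cam_b, cam_c], k_percent=25, min_votes=2)
print(result)
```

With these toy inputs only the top-left pixel gets two votes, so it is the only pixel kept. A real ensemble would operate on full-resolution maps and might weight the component CAMs rather than count votes.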

arXiv:2307.14440 [pdf, other]

Controllable Generation of Dialogue Acts for Dialogue Systems via Few-Shot Response Generation and Ranking

Authors: Angela Ramirez, Karik Agarwal, Juraj Juraska, Utkarsh Garg, Marilyn A. Walker

Abstract: Dialogue systems need to produce responses that realize multiple types of dialogue acts (DAs) with high semantic fidelity. In the past, natural language generators (NLGs) for dialogue were trained on large parallel corpora that map from a domain-specific DA and its semantic attributes to an output utterance. Recent work shows that pretrained language models (LLMs) offer new possibilities for controllable NLG using prompt-based learning. Here we develop a novel few-shot overgenerate-and-rank approach that achieves the controlled generation of DAs. We compare eight few-shot prompt styles that include a novel method of generating from textual pseudo-references using a textual style transfer approach. We develop six automatic ranking functions that identify outputs with both the correct DA and high semantic accuracy at generation time. We test our approach on three domains and four LLMs. To our knowledge, this is the first work on NLG for dialogue that automatically ranks outputs using both DA and attribute accuracy. For completeness, we compare our results to fine-tuned few-shot models trained with 5 to 100 instances per DA. Our results show that several prompt settings achieve perfect DA accuracy, and near perfect semantic accuracy (99.81%) and perform better than few-shot fine-tuning.

Submitted 26 July, 2023; originally announced July 2023.

Comments: To Appear in SIGDIAL 2023. Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2023

arXiv:2306.10051 [pdf, other]

TOBY: A Tool for Exploring Data in Academic Survey Papers

Authors: Tathagata Chakraborti, Jungkoo Kang, Christian Muise, Sarath Sreedharan, Michael Walker, Daniel Szafir, Tom Williams

Abstract: This paper describes TOBY, a visualization tool that helps a user explore the contents of an academic survey paper. The visualization consists of four components: a hierarchical view of taxonomic data in the survey, a document similarity view in the space of taxonomic classes, a network view of citations, and a new paper recommendation tool. In this paper, we will discuss these features in the context of three separate deployments of the tool.

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2303.04953 [pdf, other]

Let's Get Personal: Personal Questions Improve SocialBot Performance in the Alexa Prize

Authors: Kevin K. Bowden, Marilyn Walker

Abstract: There has been an increased focus on creating conversational open-domain dialogue systems in the spoken dialogue community. Unlike traditional dialogue systems, these conversational systems cannot assume any specific information need or domain restrictions, i.e., the only inherent goal is to converse with the user on an unknown set of topics. While massive improvements in Natural Language Understanding (NLU) and the growth of available knowledge resources can partially support a robust conversation, these conversations generally lack the rapport between two humans that know each other. We developed a robust open-domain conversational system, Athena, that real Amazon Echo users access and evaluate at scale in the context of the Alexa Prize competition. We experiment with methods intended to increase intimacy between Athena and the user by heuristically developing a rule-based user model that personalizes both the current and subsequent conversations and evaluating specific personal opinion question strategies in A/B studies. Our results show a statistically significant positive impact on perceived conversation quality and length when employing these strategies.

Submitted 8 March, 2023; originally announced March 2023.

Comments: Won Best Paper at IWSDS '23

arXiv:2302.12944 [pdf, other]

Dependency Dialogue Acts -- Annotation Scheme and Case Study

Authors: Jon Z. Cai, Brendan King, Margaret Perkoff, Shiran Dudy, Jie Cao, Marie Grace, Natalia Wojarnik, Ananya Ganesh, James H. Martin, Martha Palmer, Marilyn Walker, Jeffrey Flanigan

Abstract: In this paper, we introduce Dependency Dialogue Acts (DDA), a novel framework for capturing the structure of speaker-intentions in multi-party dialogues. DDA combines and adapts features from existing dialogue annotation frameworks, and emphasizes the multi-relational response structure of dialogues in addition to the dialogue acts and rhetorical relations. It represents the functional, discourse, and response structure in multi-party multi-threaded conversations. A few key features distinguish DDA from existing dialogue annotation frameworks such as SWBD-DAMSL and the ISO 24617-2 standard. First, DDA prioritizes the relational structure of the dialogue units and the dialog context, annotating both dialog acts and rhetorical relations as response relations to particular utterances. Second, DDA embraces overloading in dialogues, encouraging annotators to specify multiple response relations and dialog acts for each dialog unit. Lastly, DDA places an emphasis on adequately capturing how a speaker is using the full dialog context to plan and organize their speech. With these features, DDA is highly expressive and recall-oriented with regard to conversation dynamics between multiple speakers. In what follows, we present the DDA annotation framework and case studies annotating DDA structures in multi-party, multi-threaded conversations.

Submitted 24 February, 2023; originally announced February 2023.

Comments: The 13th International Workshop on Spoken Dialogue Systems Technology

Journal ref: The 13th International Workshop on Spoken Dialogue Systems Technology 2023

arXiv:2302.04424 [pdf, other]

A Transformer-based Response Evaluator for Open-Domain Spoken Conversation

Authors: Vrindavan Harrison, Rishi Rajasekaran, Marilyn Walker

Abstract: Many open-domain dialogue systems rely on multiple response generators, any of which can contribute a response to the dialogue in a particular context. Thus the ability to compare potential responses and then select the best plays an important role in ensuring a dialogue system is coherent and engaging. Dialogue coherence goes beyond simply remaining on topic -- some trivia may be on topic and engaging when mentioned out of the blue, but may not be coherent and grounded in the context of the conversation. We carry out experiments on response selection in the Athena system, an Alexa Prize SocialBot that has dedicated content and multiple topic-specific response generators for a large number of topics. First, we collect a corpus of Athena conversations with live human traffic, where potential responses from all enabled response generators are logged and subsequently annotated for response quality. We compare several off-the-shelf response ranking methods for open-domain dialogue to Athena-Heuristic, a heuristic response ranker that was field-tested in Athena during the third Alexa Prize competition. We also compare these to a transformer-based response ranker we call Athena-RR, that we train on our Athena conversations. Athena-RR uses both the conversational context and the dialogue state to rank the potential responses. We find that Athena-RR with a Recall@1 of 70.79% outperforms Athena-Heuristic and all of the off-the-shelf rankers by a large margin. We then conduct a live A/B study comparing Athena-Heuristic to Athena-RR in 6,358 conversations with Alexa users. We show that Athena-RR leads to significantly longer conversations that receive significantly higher user ratings than the heuristic rule-based ranker.

Submitted 8 February, 2023; originally announced February 2023.

Comments: To Appear in International Workshop on Spoken Dialogue Technology, 2023
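
Illustration (not from the paper): the abstract above reports ranker quality as Recall@1. A minimal sketch of how that metric is commonly computed for response selection follows, assuming each dialogue context comes with a list of candidate scores and a set of candidate indices annotated as acceptable. The data layout and the function name `recall_at_1` are assumptions for this example, and the paper's exact definition may differ.

```python
def recall_at_1(examples):
    """Fraction of contexts whose top-scored candidate is an acceptable one.

    `examples` is a list of (scores, acceptable) pairs, where `scores` is a
    list of ranker scores (one per candidate response) and `acceptable` is
    the set of candidate indices annotated as good responses.
    """
    hits = 0
    for scores, acceptable in examples:
        # Index of the highest-scoring candidate (first one wins on ties).
        best = max(range(len(scores)), key=scores.__getitem__)
        if best in acceptable:
            hits += 1
    return hits / len(examples)


# Four toy contexts; the ranker picks an acceptable response in three.
toy_examples = [
    ([0.2, 0.9, 0.1], {1}),
    ([0.5, 0.4], {1}),
    ([0.3, 0.7], {0, 1}),
    ([0.6, 0.1], {0}),
]
score = recall_at_1(toy_examples)
print(score)  # 0.75
```

Under this reading, a Recall@1 of 70.79% would mean the ranker's first choice was an annotated-good response in roughly seven of every ten contexts.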

arXiv:2302.03848 [pdf, other]

Controlling Personality Style in Dialogue with Zero-Shot Prompt-Based Learning

Authors: Angela Ramirez, Mamon Alsalihy, Kartik Aggarwal, Cecilia Li, Liren Wu, Marilyn Walker

Abstract: Prompt-based or in-context learning has achieved high zero-shot performance on many natural language generation (NLG) tasks. Here we explore the performance of prompt-based learning for simultaneously controlling the personality and the semantic accuracy of an NLG for task-oriented dialogue. We experiment with prompt-based learning on the PERSONAGE restaurant recommendation corpus to generate semantically and stylistically-controlled text for 5 different Big-5 personality types: agreeable, disagreeable, conscientious, unconscientious, and extravert. We test two different classes of discrete prompts to generate utterances for a particular personality style: (1) prompts that demonstrate generating directly from a meaning representation that includes a personality specification; and (2) prompts that rely on first converting the meaning representation to a textual pseudo-reference, and then using the pseudo-reference in a textual style transfer (TST) prompt. In each case, we show that we can vastly improve performance by over-generating outputs and ranking them, testing several ranking functions based on automatic metrics for semantic accuracy, personality-match, and fluency. We also test whether NLG personality demonstrations from the restaurant domain can be used with meaning representations for the video game domain to generate personality stylized utterances about video games. Our findings show that the TST prompts produce the highest semantic accuracy (78.46% for restaurants and 87.6% for video games) and personality accuracy (100% for restaurants and 97% for video games). Our results on transferring personality style to video game utterances are surprisingly good. To our knowledge, there is no previous work testing the application of prompt-based learning to simultaneously controlling both style and semantic accuracy in NLG.

Submitted 7 February, 2023; originally announced February 2023.

Comments: To appear at International Workshop on Spoken Dialogue Systems Technology, 2023

arXiv:2301.13372 [pdf, other]

Improving Open-Domain Dialogue Evaluation with a Causal Inference Model

Authors: Cat P. Le, Luke Dai, Michael Johnston, Yang Liu, Marilyn Walker, Reza Ghanadan

Abstract: Effective evaluation methods remain a significant challenge for research on open-domain conversational dialogue systems. Explicit satisfaction ratings can be elicited from users, but users often do not provide ratings when asked, and those they give can be highly subjective. Post-hoc ratings by experts are an alternative, but these can be both expensive and complex to collect. Here, we explore the creation of automated methods for predicting both expert and user ratings of open-domain dialogues. We compare four different approaches. First, we train a baseline model using an end-to-end transformer to predict ratings directly from the raw dialogue text. The other three methods are variants of a two-stage approach in which we first extract interpretable features at the turn level that capture, among other aspects, user dialogue behaviors indicating contradiction, repetition, disinterest, compliments, or criticism. We project these features to the dialogue level and train a dialogue-level MLP regression model, a dialogue-level LSTM, and a novel causal inference model called counterfactual-LSTM (CF-LSTM) to predict ratings. The proposed CF-LSTM is a sequential model over turn-level features which predicts ratings using multiple regressors depending on hypotheses derived from the turn-level features. As a causal inference model, CF-LSTM aims to learn the underlying causes of a specific event, such as a low rating. We also bin the user ratings and perform classification experiments with all four models. In evaluation experiments on conversational data from the Alexa Prize SocialBot, we show that the CF-LSTM achieves the best performance for predicting dialogue ratings and classification.

Submitted 30 January, 2023; originally announced January 2023.

Comments: Accepted as a conference paper at IWSDS 2023

arXiv:2207.05643 [pdf, other]

SafeDrones: Real-Time Reliability Evaluation of UAVs using Executable Digital Dependable Identities

Authors: Koorosh Aslansefat, Panagiota Nikolaou, Martin Walker, Mohammed Naveed Akram, Ioannis Sorokos, Jan Reich, Panayiotis Kolios, Maria K. Michael, Theocharis Theocharides, Georgios Ellinas, Daniel Schneider, Yiannis Papadopoulos

Abstract: The use of Unmanned Aerial Vehicles (UAVs) offers many advantages across a variety of applications. However, safety assurance is a key barrier to widespread usage, especially given the unpredictable operational and environmental factors experienced by UAVs, which are hard to capture solely at design-time. This paper proposes a new reliability modeling approach called SafeDrones to help address this issue by enabling runtime reliability and risk assessment of UAVs. It is a prototype instantiation of the Executable Digital Dependable Identity (EDDI) concept, which aims to create a model-based solution for real-time, data-driven dependability assurance for multi-robot systems. By providing real-time reliability estimates, SafeDrones allows UAVs to update their missions accordingly in an adaptive manner.

Submitted 12 July, 2022; originally announced July 2022.

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2202.11249 [pdf, other]

Virtual, Augmented, and Mixed Reality for Human-Robot Interaction: A Survey and Virtual Design Element Taxonomy

Authors: Michael Walker, Thao Phung, Tathagata Chakraborti, Tom Williams, Daniel Szafir

Abstract: Virtual, Augmented, and Mixed Reality for Human-Robot Interaction (VAM-HRI) has been gaining considerable attention in research in recent years. However, the HRI community lacks a set of shared terminology and framework for characterizing aspects of mixed reality interfaces, presenting serious problems for future research. Therefore, it is important to have a common set of terms and concepts that can be used to precisely describe and organize the diverse array of work being done within the field. In this paper, we present a novel taxonomic framework for different types of VAM-HRI interfaces, composed of four main categories of virtual design elements (VDEs). We present and justify our taxonomy and explain how its elements have been developed over the last 30 years as well as the current directions VAM-HRI is headed in the coming decade.

Submitted 22 February, 2022; originally announced February 2022.

Comments: Explore contents at ibm.biz/vam-hri

arXiv:2111.02519 [pdf, other]

Athena 2.0: Contextualized Dialogue Management for an Alexa Prize SocialBot

Authors: Juraj Juraska, Kevin K. Bowden, Lena Reed, Vrindavan Harrison, Wen Cui, Omkar Patil, Rishi Rajasekaran, Angela Ramirez, Cecilia Li, Eduardo Zamora, Phillip Lee, Jeshwanth Bheemanpally, Rohan Pandey, Adwait Ratnaparkhi, Marilyn Walker

Abstract: Athena 2.0 is an Alexa Prize SocialBot that has been a finalist in the last two Alexa Prize Grand Challenges. One reason for Athena's success is its novel dialogue management strategy, which allows it to dynamically construct dialogues and responses from component modules, leading to novel conversations with every interaction. Here we describe Athena's system design and performance in the Alexa Prize during the 20/21 competition. A live demo of Athena as well as video recordings will provoke discussion on the state of the art in conversational AI.

Submitted 3 November, 2021; originally announced November 2021.

Comments: Accepted to EMNLP 2021 System Demonstrations

arXiv:2110.11164 [pdf, other]

Modeling Performance in Open-Domain Dialogue with PARADISE

Authors: Marilyn Walker, Colin Harmon, James Graupera, Davan Harrison, Steve Whittaker

Abstract: There has recently been an explosion of work on spoken dialogue systems, along with an increased interest in open-domain systems that engage in casual conversations on popular topics such as movies, books and music. These systems aim to socially engage, entertain, and even empathize with their users. Since the achievement of such social goals is hard to measure, recent research has used dialogue length or human ratings as evaluation metrics, and developed methods for automatically calculating novel metrics, such as coherence, consistency, relevance and engagement. Here we develop a PARADISE model for predicting the performance of Athena, a dialogue system that has participated in thousands of conversations with real users, while competing as a finalist in the Alexa Prize. We use both user ratings and dialogue length as metrics for dialogue quality, and experiment with predicting these metrics using automatic features that are both system dependent and independent. Our goal is to learn a general objective function that can be used to optimize the dialogue choices of any Alexa Prize system in real time and evaluate its performance. Our best model for predicting user ratings gets an R$^2$ of .136 with a DistilBert model, and the best model for predicting length with system independent features gets an R$^2$ of .865, suggesting that conversation length may be a more reliable measure for automatic training of dialogue systems.

Submitted 21 October, 2021; originally announced October 2021.

Comments: The 12th International Workshop on Spoken Dialog System Technology, November 2021

arXiv:2110.08094 [pdf, other]

Jurassic is (almost) All You Need: Few-Shot Meaning-to-Text Generation for Open-Domain Dialogue

Authors: Lena Reed, Cecilia Li, Angela Ramirez, Liren Wu, Marilyn Walker

Abstract: One challenge with open-domain dialogue systems is the need to produce truthful, high-quality responses on any topic. We aim to improve the quality and coverage of Athena, an Alexa Prize dialogue system. We experiment with few-shot prompt-based learning, comparing GPT-Neo to Jurassic-1, for the movies, music, TV, sports, and video game domains, both within and cross-domain, with different prompt set sizes (2, 3, 10), formats, and meaning representations consisting of either sets of WikiData KG triples, or dialogue acts. Our evaluation uses BLEURT and human metrics, and shows that with 10-shot prompting, Athena-Jurassic's performance is significantly better for coherence and semantic accuracy. Experiments with 2-shot cross-domain prompts result in a huge performance drop for Athena-GPT-Neo, whose semantic accuracy falls to 0.41, and whose untrue hallucination rate increases to 12%. Experiments with dialogue acts for video games show that with 10-shot prompting, both models learn to control dialogue acts, but Athena-Jurassic has significantly higher coherence, and only 4% untrue hallucinations. Our results suggest that Athena-Jurassic produces high enough quality outputs to be useful in live systems with real users. To our knowledge, these are the first results demonstrating that few-shot semantic prompt-based learning can create NLGs that generalize to new domains, and produce high-quality, semantically-controlled, conversational responses directly from meaning representations.

Submitted 10 November, 2021; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: Final Conference Proceedings version

Journal ref: The 12th International Workshop on Spoken Dialog System Technology, IWSDS 2021

arXiv:2109.07043 [pdf, other]

Attention Is Indeed All You Need: Semantically Attention-Guided Decoding for Data-to-Text NLG

Authors: Juraj Juraska, Marilyn Walker

Abstract: Ever since neural models were adopted in data-to-text language generation, they have invariably been reliant on extrinsic components to improve their semantic accuracy, because the models normally do not exhibit the ability to generate text that reliably mentions all of the information provided in the input. In this paper, we propose a novel decoding method that extracts interpretable information from encoder-decoder models' cross-attention, and uses it to infer which attributes are mentioned in the generated text, which is subsequently used to rescore beam hypotheses. Using this decoding method with T5 and BART, we show on three datasets its ability to dramatically reduce semantic errors in the generated outputs, while maintaining their state-of-the-art quality.

Submitted 14 September, 2021; originally announced September 2021.

Comments: Accepted to INLG 2021

arXiv:2108.03477 [pdf, other]

doi 10.1109/MRA.2021.3138383

A Tool for Organizing Key Characteristics of Virtual, Augmented, and Mixed Reality for Human-Robot Interaction Systems: Synthesizing VAM-HRI Trends and Takeaways

Authors: Thomas R. Groechel, Michael E. Walker, Christine T. Chang, Eric Rosen, Jessica Zosa Forde

Abstract: Frameworks have begun to emerge to categorize Virtual, Augmented, and Mixed Reality (VAM) technologies that provide immersive, intuitive interfaces to facilitate Human-Robot Interaction. These frameworks, however, fail to capture key characteristics of the growing subfield of VAM-HRI and can be difficult to consistently apply due to continuous scales. This work builds upon these prior frameworks through the creation of a Tool for Organizing Key Characteristics of VAM-HRI Systems (TOKCS). TOKCS discretizes the continuous scales used within prior works for more consistent classification and adds additional characteristics related to a robot's internal model, anchor locations, manipulability, and the system's software and hardware. To showcase the tool's capability, TOKCS is applied to the ten papers from the fourth VAM-HRI workshop and examined for key trends and takeaways. These trends highlight the expressive capability of TOKCS while also helping frame newer trends and future work recommendations for VAM-HRI research.

Submitted 10 February, 2022; v1 submitted 7 August, 2021; originally announced August 2021.

Comments: Accepted to Robotics and Automation Magazine Special Issue on Extended Reality in Robotics

arXiv:2011.10683 [pdf, other]

Athena: Constructing Dialogues Dynamically with Discourse Constraints

Authors: Vrindavan Harrison, Juraj Juraska, Wen Cui, Lena Reed, Kevin K. Bowden, Jiaqi Wu, Brian Schwarzmann, Abteen Ebrahimi, Rishi Rajasekaran, Nikhil Varghese, Max Wechsler-Azen, Steve Whittaker, Jeffrey Flanigan, Marilyn Walker

Abstract: This report describes Athena, a dialogue system for spoken conversation on popular topics and current events. We develop a flexible topic-agnostic approach to dialogue management that dynamically configures dialogue based on general principles of entity and topic coherence. Athena's dialogue manager uses a contract-based method where discourse constraints are dispatched to clusters of response generators. This allows Athena to procure responses from dynamic sources, such as knowledge graph traversals and feature-based on-the-fly response retrieval methods. After describing the dialogue system architecture, we perform an analysis of conversations that Athena participated in during the 2019 Alexa Prize Competition. We conclude with a report on several user studies we carried out to better understand how individual user characteristics affect system ratings.

Submitted 20 November, 2020; originally announced November 2020.

Comments: 3rd Proceedings of Alexa Prize (Alexa Prize 2019)

arXiv:2010.00150 [pdf, other]

Learning from Mistakes: Combining Ontologies via Self-Training for Dialogue Generation

Authors: Lena Reed, Vrindavan Harrison, Shereen Oraby, Dilek Hakkani-Tur, Marilyn Walker

Abstract: Natural language generators (NLGs) for task-oriented dialogue typically take a meaning representation (MR) as input. They are trained end-to-end with a corpus of MR/utterance pairs, where the MRs cover a specific set of dialogue acts and domain attributes. Creation of such datasets is labor-intensive and time-consuming. Therefore, dialogue systems for new domain ontologies would benefit from using data for pre-existing ontologies. Here we explore, for the first time, whether it is possible to train an NLG for a new larger ontology using existing training sets for the restaurant domain, where each set is based on a different ontology. We create a new, larger combined ontology, and then train an NLG to produce utterances covering it. For example, if one dataset has attributes for family-friendly and rating information, and the other has attributes for decor and service, our aim is an NLG for the combined ontology that can produce utterances that realize values for family-friendly, rating, decor and service. Initial experiments with a baseline neural sequence-to-sequence model show that this task is surprisingly challenging. We then develop a novel self-training method that identifies (errorful) model outputs, automatically constructs a corrected MR input to form a new (MR, utterance) training pair, and then repeatedly adds these new instances back into the training data. We then test the resulting model on a new test set. The result is a self-trained model whose performance is an absolute 75.4% improvement over the baseline model.
We also report a human qualitative evaluation of the final model showing that it achieves high naturalness, semantic coherence and grammaticality.

Submitted 30 September, 2020; originally announced October 2020.

Comments: main paper 9 pages, 3 pages references, 2 pages supplementary material

arXiv:2006.09554 [pdf, other]

Isometric Graph Neural Networks

Authors: Matthew Walker, Bo Yan, Yiou Xiao, Yafei Wang, Ayan Acharya

Abstract: Many tasks that rely on representations of nodes in graphs would benefit if those representations were faithful to distances between nodes in the graph. Geometric techniques to extract such representations have poor scaling over large graph size, and recent advances in Graph Neural Network (GNN) algorithms have limited ability to reflect graph distance information beyond the first degree neighborhood. To enable this highly desired capability, we propose a technique to learn Isometric Graph Neural Networks (IGNN), which requires changing the input representation space and loss function to enable any GNN algorithm to generate representations that reflect distances between nodes. We experiment with the isometric technique on several GNN architectures for modeling multiple prediction tasks on multiple datasets. In addition to an improvement in AUC-ROC as high as 43% in these experiments, we observe a consistent and substantial improvement as high as 400% in Kendall's Tau (KT), a measure that directly reflects distance information, demonstrating that the learned embeddings do account for graph distances.

Submitted 16 June, 2020; originally announced June 2020.

arXiv:2002.12228 [pdf, other]

Exploiting Colorimetry for Fidelity in Data Visualization

Authors: M. J. Waters, J. M. Walker, C. T. Nelson, D. Joester, J. M. Rondinelli

Abstract: Advances in multimodal characterization methods fuel a generation of increasingly immense hyper-dimensional datasets. Color mapping is employed for conveying higher dimensional data in two-dimensional (2D) representations for human consumption without relying on multiple projections. How one constructs these color maps, however, critically affects how accurately one perceives data. For simple scalar fields, perceptually uniform color maps and color selection have been shown to improve data readability and interpretation across research fields. Here we review core concepts underlying the design of perceptually uniform color maps and extend the concepts from scalar fields to two-dimensional vector fields and three-component composition fields frequently found in materials-chemistry research to enable high-fidelity visualization. We develop the software tools PAPUC and CMPUC to enable researchers to utilize these colorimetry principles and employ perceptually uniform color spaces for rigorously meaningful color mapping of higher dimensional data representations. Last, we demonstrate how these approaches deliver immediate improvements in data readability and interpretation in microscopies and spectroscopies routinely used in discerning materials structure, chemistry, and properties.

Submitted 27 February, 2020; originally announced February 2020.

arXiv:1912.06981 [pdf, ps, other]

Local Parametric Surface Approximation With Automatic Order Selection From Position Data

Authors: Michael R. Walker II

Abstract: Acquiring an anatomical map from position data is important for medical applications where catheters interact with soft tissues. To improve autonomous navigation in these settings, we require information beyond nonparametric maps typically available. We present an algorithm for local surface approximation from position data with automatic surface order selection. The traditional surface fitting objective function is derived from a Bayesian perspective. Posterior probabilities from the occupancy map are incorporated as weights on points selected for surface fitting. Our novel iterative algorithm incorporates surface order selection using the Bayesian information criterion. Simulations demonstrate the ability to automatically select surface order consistent with the latent surface in the presence of noise. Results on human procedure data are also presented.

Submitted 10 July, 2020; v1 submitted 15 December, 2019; originally announced December 2019.

Comments: Accepted for publication in the 2020 International Symposium on Medical Robotics (ISMR)

arXiv:1911.05465 [pdf, other]

Relation Learning on Social Networks with Multi-Modal Graph Edge Variational Autoencoders

Authors: Carl Yang, Jieyu Zhang, Haonan Wang, Sha Li, Myungwan Kim, Matt Walker, Yiou Xiao, Jiawei Han

Abstract: While node semantics have been extensively explored in social networks, little research attention has been paid to profile edge semantics, i.e., social relations. Ideal edge semantics should not only show that two users are connected, but also why they know each other and what they share in common. However, relations in social networks are often hard to profile, due to noisy multi-modal signals and limited user-generated ground-truth labels. In this work, we aim to develop a unified and principled framework that can profile user relations as edge semantics in social networks by integrating multi-modal signals in the presence of noisy and incomplete data. Our framework is also flexible towards limited or missing supervision. Specifically, we assume a latent distribution of multiple relations underlying each user link, and learn them with multi-modal graph edge variational autoencoders. We encode the network data with a graph convolutional network, and decode arbitrary signals with multiple reconstruction networks. Extensive experiments and case studies on two public DBLP author networks and two internal LinkedIn member networks demonstrate the superior effectiveness and efficiency of our proposed model.

Submitted 4 November, 2019; originally announced November 2019.

Comments: To appear in WSDM 2020

arXiv:1910.13000 [pdf, other]

Human-centered Control of a Growing Soft Robot for Object Manipulation

Authors: Fabio Stroppa, Ming Luo, Giada Gerboni, Margaret M. Coad, Julie M. Walker, Allison M. Okamura

Abstract: We present a user-friendly interface to teleoperate a soft robot manipulator in a complex environment. Key components of the system include a manipulator with a grasping end-effector that grows via tip eversion, gesture-based control, and haptic display to the operator for feedback and guidance. In the initial work, the operator uses the soft robot to build a tower of blocks, and future works will extend this to shared autonomy scenarios in which the human operator and robot intelligence are both necessary for task completion.

Submitted 28 October, 2019; originally announced October 2019.

arXiv:1910.12129 [pdf, other]

ViGGO: A Video Game Corpus for Data-To-Text Generation in Open-Domain Conversation

Authors: Juraj Juraska, Kevin K. Bowden, Marilyn Walker

Abstract: The uptake of deep learning in natural language generation (NLG) led to the release of both small and relatively large parallel corpora for training neural models. The existing data-to-text datasets are, however, aimed at task-oriented dialogue systems, and often thus limited in diversity and versatility. They are typically crowdsourced, with much of the noise left in them. Moreover, current neural NLG models do not take full advantage of large training data, and due to their strong generalizing properties produce sentences that look template-like regardless. We therefore present a new corpus of 7K samples, which (1) is clean despite being crowdsourced, (2) has utterances of 9 generalizable and conversational dialogue act types, making it more suitable for open-domain dialogue systems, and (3) explores the domain of video games, which is new to dialogue systems despite having excellent potential for supporting rich conversations.

Submitted 26 October, 2019; originally announced October 2019.

Comments: Accepted to INLG 2019

arXiv:1910.10542 [pdf, other]

doi 10.1007/978-3-030-32486-5_15

Deep generative model-driven multimodal prostate segmentation in radiotherapy

Authors: Kibrom Berihu Girum, Gilles Créhange, Raabid Hussain, Paul Michael Walker, Alain Lalande

Abstract: Deep learning has shown unprecedented success in a variety of applications, such as computer vision and medical image analysis. However, there is still potential to improve segmentation in multimodal images by embedding prior knowledge via learning-based shape modeling and registration to learn the modality invariant anatomical structure of organs. For example, in radiotherapy, automatic prostate segmentation is essential in prostate cancer diagnosis, therapy, and post-therapy assessment from T2-weighted MR or CT images. In this paper, we present a fully automatic deep generative model-driven multimodal prostate segmentation method using convolutional neural network (DGMNet). The novelty of our method comes with its embedded generative neural network for learning-based shape modeling and its ability to adapt for different imaging modalities via learning-based registration. The proposed method includes a multi-task learning framework that combines a convolutional feature extraction and an embedded regression and classification based shape modeling. This enables the network to predict the deformable shape of an organ. We show that generative neural network-based shape modeling trained on a reliable contrast imaging modality (such as MRI) can be directly applied to low contrast imaging modality (such as CT) to achieve accurate prostate segmentation. The method was evaluated on MRI and CT datasets acquired from different clinical centers with large variations in contrast and scanning protocols.
Experimental results reveal that our method can be used to automatically and accurately segment the prostate gland in different imaging modalities.

Submitted 23 October, 2019; originally announced October 2019.

Comments: 8 pages, camera ready paper, accepted for Artificial Intelligence in Radiation Therapy (AIRT), in conjunction with MICCAI 2019

arXiv:1910.09290 [pdf, other]

A full scale atmospheric flight experimental research environment for the Mars helicopter

Authors: J. Pablo Afman, Eric Feron, Mitchell Walker

Abstract: We propose to develop a full-accuracy flight test environment for the Mars helicopter and related Mars-atmospheric vehicles. The experiment would use reduced-g atmospheric flights with an aircraft that houses a properly sized vacuum chamber.

Submitted 16 September, 2019; originally announced October 2019.

Comments: 8 pages

arXiv:1909.01584 [pdf, other]

Discovering Hypernymy in Text-Rich Heterogeneous Information Network by Exploiting Context Granularity

Authors: Yu Shi, Jiaming Shen, Yuchen Li, Naijing Zhang, Xinwei He, Zhengzhi Lou, Qi Zhu, Matthew Walker, Myunghwan Kim, Jiawei Han

Abstract: Text-rich heterogeneous information networks (text-rich HINs) are ubiquitous in real-world applications. Hypernymy, also known as is-a relation or subclass-of relation, lies at the core of many knowledge graphs and benefits many downstream applications. Existing methods of hypernymy discovery either leverage textual patterns to extract explicitly mentioned hypernym-hyponym pairs, or learn a distributional representation for each term of interest based on its context. These approaches rely on statistical signals from the textual corpus, and their effectiveness would therefore be hindered when the signals from the corpus are not sufficient for all terms of interest. In this work, we propose to discover hypernymy in text-rich HINs, which can introduce additional high-quality signals. We develop a new framework, named HyperMine, that exploits multi-granular contexts and combines signals from both text and network without human labeled data. HyperMine extends the definition of context to the scenario of text-rich HIN. For example, we can define typed nodes and communities as contexts. These contexts encode signals of different granularities and we feed them into a hypernymy inference model. HyperMine learns this model using weak supervision acquired based on high-precision textual patterns. Extensive experiments on two large real-world datasets demonstrate the effectiveness of HyperMine and the utility of modeling context granularity. We further show a case study that a high-quality taxonomy can be generated solely based on the hypernymy discovered by HyperMine.

Submitted 4 September, 2019; originally announced September 2019.

Comments: CIKM 2019

arXiv:1908.04832 [pdf, other]

doi 10.1145/3342775.3342792

Entertaining and Opinionated but Too Controlling: A Large-Scale User Study of an Open Domain Alexa Prize System

Authors: Kevin K. Bowden, Jiaqi Wu, Wen Cui, Juraj Juraska, Vrindavan Harrison, Brian Schwarzmann, Nicholas Santer, Steve Whittaker, Marilyn Walker

Abstract: Conversational systems typically focus on functional tasks such as scheduling appointments or creating todo lists. Instead we design and evaluate SlugBot (SB), one of 8 semifinalists in the 2018 Alexa Prize, whose goal is to support casual open-domain social interaction. This novel application requires both broad topic coverage and engaging interactive skills. We developed a new technical approach to meet this demanding situation by crowd-sourcing novel content and introducing playful conversational strategies based on storytelling and games. We collected over 10,000 conversations during August 2018 as part of the Alexa Prize competition. We also conducted an in-lab follow-up qualitative evaluation. Overall users found SB moderately engaging; conversations averaged 3.6 minutes and involved 26 user turns. However, users reacted very differently to different conversation subtypes. Storytelling and games were evaluated positively; these were seen as entertaining with predictable interactive structure. They also led users to impute personality and intelligence to SB. In contrast, search and general Chit-Chat induced coverage problems; here users found it hard to infer what topics SB could understand, with these conversations seen as being too system-driven. Theoretical and design implications suggest a move away from conversational systems that simply provide factual information. Future systems should be designed to have their own opinions with personal stories to share, and SB provides an example of how we might achieve this.

Submitted 13 August, 2019; originally announced August 2019.

Comments: To appear in 1st International Conference on Conversational User Interfaces (CUI 2019)

arXiv:1907.10658 [pdf, other]

doi 10.13140/RG.2.2.33543.96166

SlugBot: Developing a Computational Model and Framework of a Novel Dialogue Genre

Authors: Kevin K. Bowden, Jiaqi Wu, Wen Cui, Juraj Juraska, Vrindavan Harrison, Brian Schwarzmann, Nick Santer, Marilyn Walker

Abstract: One of the most interesting aspects of the Amazon Alexa Prize competition is that the framing of the competition requires the development of new computational models of dialogue and its structure. Traditional computational models of dialogue are of two types: (1) task-oriented dialogue, supported by AI planning models, or simplified planning models consisting of frames with slots to be filled; or (2) search-oriented dialogue where every user turn is treated as a search query that may elaborate and extend current search results. Alexa Prize dialogue systems such as SlugBot must support conversational capabilities that go beyond what these traditional models can do. Moreover, while traditional dialogue systems rely on theoretical computational models, there are no existing computational theories that circumscribe the expected system and user behaviors in the intended conversational genre of the Alexa Prize Bots. This paper describes how UCSC's SlugBot team has combined the development of a novel computational theoretical model, Discourse Relation Dialogue Model, with its implementation in a modular system in order to test and refine it. We highlight how our novel dialogue model has led us to create a novel ontological resource, UniSlug, and how the structure of UniSlug determines how we curate and structure content so that our dialogue manager implements and tests our novel computational dialogue model.

Submitted 22 July, 2019; originally announced July 2019.

Comments: arXiv admin note: text overlap with arXiv:1801.01531

arXiv:1907.09527 [pdf, other]

Maximizing Stylistic Control and Semantic Accuracy in NLG: Personality Variation and Discourse Contrast

Authors: Vrindavan Harrison, Lena Reed, Shereen Oraby, Marilyn Walker

Abstract: Neural generation methods for task-oriented dialogue typically generate from a meaning representation that is populated using a database of domain information, such as a table of data describing a restaurant. While earlier work focused solely on the semantic fidelity of outputs, recent work has started to explore methods for controlling the style of the generated text while simultaneously achieving semantic accuracy. Here we experiment with two stylistic benchmark tasks, generating language that exhibits variation in personality, and generating discourse contrast. We report a huge performance improvement in both stylistic control and semantic accuracy over the state of the art on both of these benchmarks. We test several different models and show that putting stylistic conditioning in the decoder and eliminating the semantic re-ranker used in earlier models results in more than 15 points higher BLEU for Personality, with a reduction of semantic error to near zero. We also report an improvement from .75 to .81 in controlling contrast and a reduction in semantic error from 16% to 2%.

Submitted 22 July, 2019; originally announced July 2019.

arXiv:1907.03975 [pdf, ps, other]

Implicit Discourse Relation Identification for Open-domain Dialogues

Authors: Mingyu Derek Ma, Kevin K. Bowden, Jiaqi Wu, Wen Cui, Marilyn Walker

Abstract: Discourse relation identification has been an active area of research for many years, and the challenge of identifying implicit relations remains largely an unsolved task, especially in the context of an open-domain dialogue system. Previous work primarily relies on corpora of formal text which is inherently non-dialogic, i.e., news and journals. This data, however, is not suitable to handle the nuances of informal dialogue, nor is it capable of navigating the plethora of valid topics present in open-domain dialogue. In this paper, we designed a novel discourse relation identification pipeline specifically tuned for open-domain dialogue systems. We first propose a method to automatically extract the implicit discourse relation argument pairs and labels from a dataset of dialogic turns, resulting in a novel corpus of discourse relation pairs; the first of its kind to attempt to identify the discourse relations connecting the dialogic turns in open-domain discourse. Moreover, we have taken the first steps to leverage the dialogue features unique to our task to further improve the identification of such relations by performing feature ablation and incorporating dialogue features to enhance the state-of-the-art model.

Submitted 8 July, 2019; originally announced July 2019.

Comments: To appear in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL2019)

arXiv:1906.01334 [pdf, other]

Curate and Generate: A Corpus and Method for Joint Control of Semantics and Style in Neural NLG

Authors: Shereen Oraby, Vrindavan Harrison, Abteen Ebrahimi, Marilyn Walker

Abstract: Neural natural language generation (NNLG) from structured meaning representations has become increasingly popular in recent years. While we have seen progress with generating syntactically correct utterances that preserve semantics, various shortcomings of NNLG systems are clear: new tasks require new training data which is not available or straightforward to acquire, and model outputs are simple and may be dull and repetitive. This paper addresses these two critical challenges in NNLG by: (1) scalably (and at no cost) creating training datasets of parallel meaning representations and reference texts with rich style markup by using data from freely available and naturally descriptive user reviews, and (2) systematically exploring how the style markup enables joint control of semantic and stylistic aspects of neural model output. We present YelpNLG, a corpus of 300,000 rich, parallel meaning representations and highly stylistically varied reference texts spanning different restaurant attributes, and describe a novel methodology that can be scalably reused to generate NLG datasets for other domains. The experiments show that the models control important aspects, including lexical choice of adjectives, output length, and sentiment, allowing the models to successfully hit multiple style targets without sacrificing semantics.

Submitted 14 June, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

Comments: To appear at ACL 19. 9 content pages, 3 appendix pages

arXiv:1904.00510 [pdf]

doi 10.31256/HSMR2019.9

How to enhance learning of robotic surgery gestures? A tactile cue saliency investigation for 3D hand guidance

Authors: Gustavo D. Gil, Julie M. Walker, Nabil Zemiti, Allison M. Okamura, Philippe Poignet

Abstract: The current generation of surgeons requires extensive training in teleoperation to develop specific dexterous skills, which are independent of medical knowledge. Training curricula progress from manipulation tasks to simulated surgical tasks but are limited in time. To tackle this, we propose to integrate surgical robotic training together with Haptic Feedback (HF) to improve skill acquisition. This paper presents the initial but promising results of our haptic device designed to support the training of surgical gestures. Our ongoing work concerns integrating the HF into the RAVEN II platform.

Submitted 19 July, 2019; v1 submitted 31 March, 2019; originally announced April 2019.

Comments: HSMR: 12th Hamlyn Symposium on Medical Robotics (London, 24th-26th June 2019)

arXiv:1903.12271 [pdf]

In Search of Meaning: Lessons, Resources and Next Steps for Computational Analysis of Financial Discourse

Authors: Mahmoud El-Haj, Paul Rayson, Martin Walker, Steven Young, Vasiliki Simaki

Abstract: We critically assess mainstream accounting and finance research applying methods from computational linguistics (CL) to study financial discourse. We also review common themes and innovations in the literature and assess the incremental contributions of work applying CL methods over manual content analysis. Key conclusions emerging from our analysis are: (a) accounting and finance research is behind the curve in terms of CL methods generally and word sense disambiguation in particular; (b) implementation issues mean the proposed benefits of CL are often less pronounced than proponents suggest; (c) structural issues limit practical relevance; and (d) CL methods and high quality manual analysis represent complementary approaches to analyzing financial discourse. We describe four CL tools that have yet to gain traction in mainstream AF research but which we believe offer promising ways to enhance the study of meaning in financial discourse. The four tools are named entity recognition (NER), summarization, semantics and corpus linguistics.

Submitted 28 March, 2019; originally announced March 2019.

Comments: 70 pages, 18 pages of references, Journal Article

arXiv:1903.03150 [pdf, other]

Holdable Haptic Device for 4-DOF Motion Guidance

Authors: Julie M. Walker, Nabil Zemiti, Philippe Poignet, Allison M. Okamura

Abstract: Hand-held haptic devices can allow for greater freedom of motion and larger workspaces than traditional grounded haptic devices. They can also provide more compelling haptic sensations to the users' fingertips than many wearable haptic devices because reaction forces can be distributed over a larger area of skin far away from the stimulation site. This paper presents a hand-held kinesthetic gripper that provides guidance cues in four degrees of freedom (DOF). 2-DOF tangential forces on the thumb and index finger combine to create cues to translate or rotate the hand. We demonstrate the device's capabilities in a three-part user study. First, users moved their hands in response to haptic cues before receiving instruction or training. Then, they trained on cues in eight directions in a forced-choice task. Finally, they repeated the first part, now knowing what each cue intended to convey. Users were able to discriminate each cue over 90% of the time. Users moved correctly in response to the guidance cues both before and after the training and indicated that the cues were easy to follow. The results show promise for holdable kinesthetic devices in haptic feedback and guidance for applications such as virtual reality, medical training, and teleoperation.

Submitted 7 March, 2019; originally announced March 2019.

Comments: Submitted to IEEE World Haptics Conference 2019

arXiv:1902.06024 [pdf, other]

CruzAffect at AffCon 2019 Shared Task: A feature-rich approach to characterize happiness

Authors: Jiaqi Wu, Ryan Compton, Geetanjali Rakshit, Marilyn Walker, Pranav Anand, Steve Whittaker

Abstract: We present our system, CruzAffect, for the CL-Aff Shared Task 2019. CruzAffect consists of several types of robust and efficient models for affective classification tasks. We utilize both traditional classifiers, such as XGBoosted Forest, as well as a deep learning Convolutional Neural Networks (CNN) classifier. We explore rich feature sets such as syntactic features, emotional features, and profile features, and utilize several sentiment lexicons, to discover essential indicators of social involvement and control that a subject might exercise in their happy moments, as described in textual snippets from the HappyDB database. The data comes with a labeled set (10K), and a larger unlabeled set (70K). We therefore use supervised methods on the 10K dataset, and a bootstrapped semi-supervised approach for the 70K. We evaluate these models for binary classification of agency and social labels (Task 1), as well as multi-class prediction for concepts labels (Task 2). We obtain promising results on the held-out data, suggesting that the proposed feature sets effectively represent the data for affective classification tasks. We also build concepts models that discover general themes recurring in happy moments. Our results indicate that generic characteristics are shared between the classes of agency, social and concepts, suggesting it should be possible to build general models for affective classification tasks.

Submitted 15 February, 2019; originally announced February 2019.

Comments: Workshop on Affective Content Analysis (AffCon) 2019, Workshop of Association for the Advancement of Artificial Intelligence (AAAI) 2019, Hawaii, USA January 2019

arXiv:1901.11129 [pdf, other]

Generic Connectivity-Based CGRA Mapping via Integer Linear Programming

Authors: Matthew J. P. Walker, Jason H. Anderson

Abstract: Coarse-grained reconfigurable architectures (CGRAs) are programmable logic devices with large coarse-grained ALU-like logic blocks, and multi-bit datapath-style routing. CGRAs often have relatively restricted data routing networks, so they attract CAD mapping tools that use exact methods, such as Integer Linear Programming (ILP). However, tools that target general architectures must use large constraint systems to fully describe an architecture's flexibility, resulting in lengthy run-times. In this paper, we propose to derive connectivity information from an otherwise generic device model, and use this to create simpler ILPs, which we combine in an iterative schedule and retain most of the exactness of a fully-generic ILP approach. This new approach has a speed-up geometric mean of 5.88x when considering benchmarks that do not hit a time-limit of 7.5 hours on the fully-generic ILP, and 37.6x otherwise. This was measured using the set of benchmarks used to originally evaluate the fully-generic approach and several more benchmarks representing computation tasks, over three different CGRA architectures. All run-times of the new approach are less than 20 minutes, with 90th percentile time of 410 seconds. The proposed mapping techniques are integrated into, and evaluated using the open-source CGRA-ME architecture modelling and exploration framework.

Submitted 30 April, 2019; v1 submitted 30 January, 2019; originally announced January 2019.

Comments: 8 pages of content; 8 figures; 3 tables; to appear in FCCM 2019; Uses the CGRA-ME framework at http://cgra-me.ece.utoronto.ca/

arXiv:1901.05599 [pdf, other]

doi 10.1109/IROS.2018.8593883

Virtual-to-Real-World Transfer Learning for Robots on Wilderness Trails

Authors: Michael L. Iuzzolino, Michael E. Walker, Daniel Szafir

Abstract: Robots hold promise in many scenarios involving outdoor use, such as search-and-rescue, wildlife management, and collecting data to improve environment, climate, and weather forecasting. However, autonomous navigation of outdoor trails remains a challenging problem. Recent work has sought to address this issue using deep learning. Although this approach has achieved state-of-the-art results, the deep learning paradigm may be limited due to a reliance on large amounts of annotated training data. Collecting and curating training datasets may not be feasible or practical in many situations, especially as trail conditions may change due to seasonal weather variations, storms, and natural erosion. In this paper, we explore an approach to address this issue through virtual-to-real-world transfer learning using a variety of deep learning models trained to classify the direction of a trail in an image. Our approach utilizes synthetic data gathered from virtual environments for model training, bypassing the need to collect a large amount of real images of the outdoors. We validate our approach in three main ways. First, we demonstrate that our models achieve classification accuracies upwards of 95% on our synthetic data set. Next, we utilize our classification models in the control system of a simulated robot to demonstrate feasibility. Finally, we evaluate our models on real-world trail data and demonstrate the potential of virtual-to-real-world transfer learning.

Submitted 16 January, 2019; originally announced January 2019.

Comments: IROS 2018

Journal ref: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 576-582)

arXiv:1809.05288 [pdf, ps, other]

Characterizing Variation in Crowd-Sourced Data for Training Neural Language Generators to Produce Stylistically Varied Outputs

Authors: Juraj Juraska, Marilyn Walker

Abstract: One of the biggest challenges of end-to-end language generation from meaning representations in dialogue systems is making the outputs more natural and varied. Here we take a large corpus of 50K crowd-sourced utterances in the restaurant domain and develop text analysis methods that systematically characterize types of sentences in the training data. We then automatically label the training data to allow us to conduct two kinds of experiments with a neural generator. First, we test the effect of training the system with different stylistic partitions and quantify the effect of smaller, but more stylistically controlled training data. Second, we propose a method of labeling the style variants during training, and show that we can modify the style of the generated utterances using our stylistic labels. We contrast and compare these methods that can be used with any existing large corpus, showing how they vary in terms of semantic quality and stylistic control.

Submitted 14 September, 2018; originally announced September 2018.

Comments: Accepted to INLG 2018

arXiv:1809.03015 [pdf, other]

Can Neural Generators for Dialogue Learn Sentence Planning and Discourse Structuring?

Authors: Lena Reed, Shereen Oraby, Marilyn Walker

Abstract: Responses in task-oriented dialogue systems often realize multiple propositions whose ultimate form depends on the use of sentence planning and discourse structuring operations. For example, a recommendation may consist of an explicitly evaluative utterance, e.g. Chanpen Thai is the best option, along with content related by the justification discourse relation, e.g. It has great food and service, that combines multiple propositions into a single phrase. While neural generation methods integrate sentence planning and surface realization in one end-to-end learning framework, previous work has not shown that neural generators can: (1) perform common sentence planning and discourse structuring operations; (2) make decisions as to whether to realize content in a single sentence or over multiple sentences; (3) generalize sentence planning and discourse relation operations beyond what was seen in training. We systematically create large training corpora that exhibit particular sentence planning operations and then test neural models to see what they learn. We compare models without explicit latent variables for sentence planning with ones that provide explicit supervision during training. We show that only the models with additional supervision can reproduce sentence planning and discourse operations and generalize to situations unseen in training.

Submitted 1 November, 2018; v1 submitted 9 September, 2018; originally announced September 2018.

Comments: 12 pages, 12 tables, 3 figures, INLG 2018

arXiv:1809.02637 [pdf, other]

Neural Generation of Diverse Questions using Answer Focus, Contextual and Linguistic Features

Authors: Vrindavan Harrison, Marilyn Walker

Abstract: Question Generation is the task of automatically creating questions from textual input. In this work we present a new Attentional Encoder-Decoder Recurrent Neural Network model for automatic question generation. Our model incorporates linguistic features and an additional sentence embedding to capture meaning at both sentence and word levels. The linguistic features are designed to capture information related to named entity recognition, word case, and entity coreference resolution. In addition our model uses a copying mechanism and a special answer signal that enables generation of numerous diverse questions on a given sentence. Our model achieves state of the art results of 19.98 Bleu_4 on a benchmark Question Generation dataset, outperforming all previously published results by a significant margin. A human evaluation also shows that these added features improve the quality of the generated questions.

Submitted 5 October, 2018; v1 submitted 7 September, 2018; originally announced September 2018.

Comments: Accepted to appear at INLG 2018

arXiv:1809.01331 [pdf, other]

Neural MultiVoice Models for Expressing Novel Personalities in Dialog

Authors: Shereen Oraby, Lena Reed, Sharath TS, Shubhangi Tandon, Marilyn Walker

Abstract: Natural language generators for task-oriented dialog should be able to vary the style of the output utterance while still effectively realizing the system dialog actions and their associated semantics. While the use of neural generation for training the response generation component of conversational agents promises to simplify the process of producing high quality responses in new domains, to our knowledge, there has been very little investigation of neural generators for task-oriented dialog that can vary their response style, and we know of no experiments on models that can generate responses that are different in style from those seen during training, while still maintaining semantic fidelity to the input meaning representation. Here, we show that a model that is trained to achieve a single stylistic personality target can produce outputs that combine stylistic targets. We carefully evaluate the multivoice outputs for both semantic fidelity and for similarities to and differences from the linguistic features that characterize the original training style. We show that contrary to our predictions, the learned models do not always simply interpolate model parameters, but rather produce styles that are distinct, and novel from the personalities they were trained on.

Submitted 5 September, 2018; originally announced September 2018.

Comments: Interspeech 2018

arXiv:1806.03889 [pdf, other]

doi 10.3389/fphys.2018.01767

Fractal and Multifractal Properties of Electrographic Recordings of Human Brain Activity: Toward Its Use as a Signal Feature for Machine Learning in Clinical Applications

Authors: Lucas G. S. França, José G. V. Miranda, Marco Leite, Niraj K. Sharma, Matthew C. Walker, Louis Lemieux, Yujiang Wang

Abstract: The brain is a system operating on multiple time scales, and characterisation of dynamics across time scales remains a challenge. One framework to study such dynamics is that of fractal geometry. However, currently there exists no established method for the study of brain dynamics using fractal geometry, due to the many challenges in the conceptual and technical understanding of the methods. We aim to highlight some of the practical challenges of applying fractal geometry to brain dynamics and propose solutions to enable its wider use in neuroscience. Using intracranially recorded EEG and simulated data, we compared monofractal and multifractal methods with regard to their sensitivity to signal variance. We found that both correlate closely with signal variance, thus not offering new information about the signal. However, after applying an epoch-wise standardisation procedure to the signal, we found that multifractal measures could offer non-redundant information compared to signal variance, power and other established EEG signal measures. We also compared different multifractal estimation methods and found that the Chhabra-Jensen algorithm performed best. Finally, we investigated the impact of sampling frequency and epoch length on multifractal properties. Using epileptic seizures as an example event in the EEG, we show that there may be an optimal time scale for detecting temporal changes in multifractal properties around seizures.
The practical issues we highlighted and our suggested solutions should help in developing a robust method for the application of fractal geometry in EEG signals. Our analyses and observations also aid the theoretical understanding of the multifractal properties of the brain and might provide grounds for new discoveries in the study of brain signals. These could be crucial for the understanding of neurological function and for the development of new treatments.

Submitted 11 December, 2018; v1 submitted 11 June, 2018; originally announced June 2018.

Comments: Final version published at Frontiers in Physiology. https://doi.org/10.3389/fphys.2018.01767

Journal ref: França, LGS et al., (2018) Fractal and Multifractal Properties of Electrographic Recordings of Human Brain Activity: Toward Its Use as a Signal Feature for Machine Learning in Clinical Applications. Front. Physiol. 9:1767

arXiv:1805.08352 [pdf, other]

Controlling Personality-Based Stylistic Variation with Neural Natural Language Generators

Authors: Shereen Oraby, Lena Reed, Shubhangi Tandon, T. S. Sharath, Stephanie Lukin, Marilyn Walker

Abstract: Natural language generators for task-oriented dialogue must effectively realize system dialogue actions and their associated semantics. In many applications, it is also desirable for generators to control the style of an utterance. To date, work on task-oriented neural generation has primarily focused on semantic fidelity rather than achieving stylistic goals, while work on style has been done in contexts where it is difficult to measure content preservation. Here we present three different sequence-to-sequence models and carefully test how well they disentangle content and style. We use a statistical generator, Personage, to synthesize a new corpus of over 88,000 restaurant domain utterances whose style varies according to models of personality, giving us total control over both the semantic content and the stylistic variation in the training data. We then vary the amount of explicit stylistic supervision given to the three models. We show that our most explicit model can simultaneously achieve high fidelity to both semantic and stylistic goals: this model adds a context vector of 36 stylistic parameters as input to the hidden state of the encoder at each time step, showing the benefits of explicit stylistic supervision, even when the amount of training data is large.

Submitted 21 May, 2018; originally announced May 2018.

Comments: To appear at SIGDIAL 2018

arXiv:1805.06553 [pdf, other]

A Deep Ensemble Model with Slot Alignment for Sequence-to-Sequence Natural Language Generation

Authors: Juraj Juraska, Panagiotis Karagiannis, Kevin K. Bowden, Marilyn A. Walker

Abstract: Natural language generation lies at the core of generative dialogue systems and conversational agents. We describe an ensemble neural language generator, and present several novel methods for data representation and augmentation that yield improved results in our model. We test the model on three datasets in the restaurant, TV and laptop domains, and report both objective and subjective evaluations of our best model. Using a range of automatic metrics, as well as human evaluators, we show that our approach achieves better results than state-of-the-art models on the same datasets.

Submitted 16 May, 2018; originally announced May 2018.

Comments: Accepted to NAACL 2018

Showing 1–50 of 108 results for author: Walker, M