Showing 1–33 of 33 results for author: Novikova, J

  1. arXiv:2404.01981  [pdf, ps, other]

    cs.LG cs.SD eess.AS

    Zero-Shot Multi-Lingual Speaker Verification in Clinical Trials

    Authors: Ali Akram, Marija Stanojevic, Malikeh Ehghaghi, Jekaterina Novikova

    Abstract: Due to the substantial number of clinicians, patients, and data collection environments involved in clinical trials, gathering data of superior quality poses a significant challenge. In clinical trials, patients are assessed based on their speech data to detect and monitor cognitive and mental health disorders. We propose using these speech recordings to verify the identities of enrolled patients…

    Submitted 5 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  2. arXiv:2306.12444  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    Factors Affecting the Performance of Automated Speaker Verification in Alzheimer's Disease Clinical Trials

    Authors: Malikeh Ehghaghi, Marija Stanojevic, Ali Akram, Jekaterina Novikova

    Abstract: Detecting duplicate patient participation in clinical trials is a major challenge because repeated patients can undermine the credibility and accuracy of the trial's findings and result in significant health and financial risks. Developing accurate automated speaker verification (ASV) models is crucial to verify the identity of enrolled individuals and remove duplicates, but the size and quality o…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted to the 5th Clinical Natural Language Processing Workshop (ClinicalNLP) at ACL 2023

  3. arXiv:2306.12443  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    DEPAC: a Corpus for Depression and Anxiety Detection from Speech

    Authors: Mashrura Tasnim, Malikeh Ehghaghi, Brian Diep, Jekaterina Novikova

    Abstract: Mental distress like depression and anxiety contribute to the largest proportion of the global burden of diseases. Automated diagnosis systems of such disorders, empowered by recent innovations in Artificial Intelligence, can pave the way to reduce the sufferings of the affected individuals. Development of such systems requires information-rich and balanced corpora. In this work, we introduce a no…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted to the Eighth Workshop on Computational Linguistics and Clinical Psychology (CLPsych) at NAACL 2022

  4. arXiv:2302.09214  [pdf, other]

    cs.SD eess.AS stat.ML

    Cost-effective Models for Detecting Depression from Speech

    Authors: Mashrura Tasnim, Jekaterina Novikova

    Abstract: Depression is the most common psychological disorder and is considered as a leading cause of disability and suicide worldwide. An automated system capable of detecting signs of depression in human speech can contribute to ensuring timely and effective mental health care for individuals suffering from the disorder. Developing such automated system requires accurate machine learning models, capable…

    Submitted 17 February, 2023; originally announced February 2023.

    Comments: Accepted to ICMLA 2022

  5. arXiv:2212.14490  [pdf, other]

    cs.SD cs.CL cs.MM eess.AS

    Multi-modal deep learning system for depression and anxiety detection

    Authors: Brian Diep, Marija Stanojevic, Jekaterina Novikova

    Abstract: Traditional screening practices for anxiety and depression pose an impediment to monitoring and treating these conditions effectively. However, recent advances in NLP and speech modelling allow textual, acoustic, and hand-crafted language-based features to jointly form the basis of future mental health screening and condition detection. Speech is a rich and readily available source of insight into…

    Submitted 29 December, 2022; originally announced December 2022.

    Comments: accepted to the PAI4MH workshop at NeurIPS 2022

  6. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  7. arXiv:2210.03303  [pdf, other]

    cs.CL cs.LG

    Data-driven Approach to Differentiating between Depression and Dementia from Noisy Speech and Language Data

    Authors: Malikeh Ehghaghi, Frank Rudzicz, Jekaterina Novikova

    Abstract: A significant number of studies apply acoustic and linguistic characteristics of human speech as prominent markers of dementia and depression. However, studies on discriminating depression from dementia are rare. Co-morbid depression is frequent in dementia and these clinical conditions share many overlapping symptoms, but the ability to distinguish between depression and dementia is essential as…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: W-NUT at COLING 2022

  8. arXiv:2209.05286  [pdf, other]

    cs.CL

    DECK: Behavioral Tests to Improve Interpretability and Generalizability of BERT Models Detecting Depression from Text

    Authors: Jekaterina Novikova, Ksenia Shkaruta

    Abstract: Models that accurately detect depression from text are important tools for addressing the post-pandemic mental health crisis. BERT-based classifiers' promising performance and the off-the-shelf availability make them great candidates for this task. However, these models are known to suffer from performance inconsistencies and poor generalization. In this paper, we introduce the DECK (DEpression Ch…

    Submitted 12 September, 2022; originally announced September 2022.

  9. arXiv:2206.11249  [pdf, other]

    cs.CL cs.AI cs.LG

    GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

    Authors: Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, et al. (52 additional authors not shown)

    Abstract: Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, an…

    Submitted 24 June, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

  10. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  11. arXiv:2203.17110  [pdf, other]

    cs.SD cs.CL eess.AS

    Impact of Environmental Noise on Alzheimer's Disease Detection from Speech: Should You Let a Baby Cry?

    Authors: Jekaterina Novikova

    Abstract: Research related to automatically detecting Alzheimer's disease (AD) is important, given the high prevalence of AD and the high cost of traditional methods. Since AD significantly affects the acoustics of spontaneous speech, speech processing and machine learning (ML) provide promising techniques for reliably detecting AD. However, speech audio may be affected by different types of background nois…

    Submitted 14 September, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: W-NUT at COLING 2022

  12. arXiv:2109.11888  [pdf, other]

    cs.CL

    Robustness and Sensitivity of BERT Models Predicting Alzheimer's Disease from Text

    Authors: Jekaterina Novikova

    Abstract: Understanding robustness and sensitivity of BERT models predicting Alzheimer's disease from text is important for both developing better classification models and for understanding their capabilities and limitations. In this paper, we analyze how a controlled amount of desired and undesired text alterations impacts performance of BERT. We show that BERT is robust to natural linguistic variations i…

    Submitted 25 October, 2021; v1 submitted 24 September, 2021; originally announced September 2021.

    Comments: Accepted to W-NUT @ EMNLP 2021 (upd: correction in Table 3)

  13. arXiv:2106.01555  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Comparing Acoustic-based Approaches for Alzheimer's Disease Detection

    Authors: Aparna Balagopalan, Jekaterina Novikova

    Abstract: Robust strategies for Alzheimer's disease (AD) detection are important, given the high prevalence of AD. In this paper, we study the performance and generalizability of three approaches for AD detection from speech on the recent ADReSSo challenge dataset: 1) using conventional acoustic features 2) using novel pre-trained acoustic embeddings 3) combining acoustic features and embeddings. We find th…

    Submitted 15 September, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: Accepted to INTERSPEECH 2021; update includes corrections to last two rows of Table 2 and corresponding text edits

  14. arXiv:2011.06153  [pdf, ps, other]

    cs.CL cs.LG

    Augmenting BERT Carefully with Underrepresented Linguistic Features

    Authors: Aparna Balagopalan, Jekaterina Novikova

    Abstract: Fine-tuned Bidirectional Encoder Representations from Transformers (BERT)-based sequence classification models have proven to be effective for detecting Alzheimer's Disease (AD) from transcripts of human speech. However, previous research shows it is possible to improve BERT's performance on various tasks by augmenting the model with additional information. In this work, we use probing tasks as in…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2020 - Extended Abstract

  15. arXiv:2010.06579  [pdf, other]

    cs.LG cs.CL

    Fantastic Features and Where to Find Them: Detecting Cognitive Impairment with a Subsequence Classification Guided Approach

    Authors: Benjamin Eyre, Aparna Balagopalan, Jekaterina Novikova

    Abstract: Despite the widely reported success of embedding-based machine learning methods on natural language processing tasks, the use of more easily interpreted engineered features remains common in fields such as cognitive impairment (CI) detection. Manually engineering features from noisy text is time and resource consuming, and can potentially result in features that do not enhance model performance. T…

    Submitted 13 October, 2020; originally announced October 2020.

    Comments: EMNLP Workshop on Noisy User-generated Text (W-NUT 2020)

  16. arXiv:2008.01551  [pdf, other]

    cs.CL cs.LG

    To BERT or Not To BERT: Comparing Speech and Language-based Approaches for Alzheimer's Disease Detection

    Authors: Aparna Balagopalan, Benjamin Eyre, Frank Rudzicz, Jekaterina Novikova

    Abstract: Research related to automatically detecting Alzheimer's disease (AD) is important, given the high prevalence of AD and the high cost of traditional methods. Since AD significantly affects the content and acoustics of spontaneous speech, natural language processing and machine learning provide promising techniques for reliably detecting AD. We compare and contrast the performance of two such approa…

    Submitted 26 July, 2020; originally announced August 2020.

    Comments: accepted to INTERSPEECH 2020

  17. arXiv:1912.04370  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD stat.ML

    Cross-Language Aphasia Detection using Optimal Transport Domain Adaptation

    Authors: Aparna Balagopalan, Jekaterina Novikova, Matthew B. A. McDermott, Bret Nestor, Tristan Naumann, Marzyeh Ghassemi

    Abstract: Multi-language speech datasets are scarce and often have small sample sizes in the medical domain. Robust transfer of linguistic features across languages could improve rates of early diagnosis and therapy for speakers of low-resource languages when detecting health conditions from speech. We utilize out-of-domain, unpaired, single-speaker, healthy speech data for training multiple Optimal Transpo…

    Submitted 4 December, 2019; originally announced December 2019.

    Comments: Accepted to ML4H at NeurIPS 2019

  18. arXiv:1910.00065  [pdf, other]

    cs.CL

    Lexical Features Are More Vulnerable, Syntactic Features Have More Predictive Power

    Authors: Jekaterina Novikova, Aparna Balagopalan, Ksenia Shkaruta, Frank Rudzicz

    Abstract: Understanding the vulnerability of linguistic features extracted from noisy text is important for both developing better health text classification models and for interpreting vulnerabilities of natural language models. In this paper, we investigate how generic language characteristics, such as syntax or the lexicon, are impacted by artificial text alterations. The vulnerability of features is ana…

    Submitted 30 September, 2019; originally announced October 2019.

    Comments: EMNLP Workshop on Noisy User-generated Text (W-NUT 2019)

  19. arXiv:1906.10064  [pdf, other]

    cs.LG cs.AI stat.ML

    Variations on the Chebyshev-Lagrange Activation Function

    Authors: Yuchen Li, Frank Rudzicz, Jekaterina Novikova

    Abstract: We seek to improve the data efficiency of neural networks and present novel implementations of parameterized piece-wise polynomial activation functions. The parameters are the y-coordinates of n+1 Chebyshev nodes per hidden unit and Lagrangian interpolation between the nodes produces the polynomial on [-1, 1]. We show results for different methods of handling inputs outside [-1, 1] on synthetic da…

    Submitted 24 June, 2019; originally announced June 2019.

  20. arXiv:1904.01684  [pdf, other]

    cs.CL

    Impact of ASR on Alzheimer's Disease Detection: All Errors are Equal, but Deletions are More Equal than Others

    Authors: Aparna Balagopalan, Ksenia Shkaruta, Jekaterina Novikova

    Abstract: Automatic Speech Recognition (ASR) is a critical component of any fully-automated speech-based dementia detection model. However, despite years of speech recognition research, little is known about the impact of ASR accuracy on dementia detection. In this paper, we experiment with controlled amounts of artificially generated ASR errors and investigate their influence on dementia detection. We find…

    Submitted 13 October, 2020; v1 submitted 2 April, 2019; originally announced April 2019.

    Comments: EMNLP Workshop on Noisy User-generated Text (W-NUT 2020)

  21. Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge

    Authors: Ondřej Dušek, Jekaterina Novikova, Verena Rieser

    Abstract: This paper provides a comprehensive analysis of the first shared task on End-to-End Natural Language Generation (NLG) and identifies avenues for future research based on the results. This shared task aimed to assess whether recent end-to-end NLG systems can generate more complex output by learning from datasets containing higher lexical richness, syntactic complexity and diverse discourse phenomen…

    Submitted 24 July, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

    Comments: Computer Speech and Language, final accepted manuscript (in press)

    ACM Class: I.2.7

  22. arXiv:1811.12254  [pdf, other]

    cs.LG cs.CL cs.SD eess.AS stat.ML

    The Effect of Heterogeneous Data for Alzheimer's Disease Detection from Speech

    Authors: Aparna Balagopalan, Jekaterina Novikova, Frank Rudzicz, Marzyeh Ghassemi

    Abstract: Speech datasets for identifying Alzheimer's disease (AD) are generally restricted to participants performing a single task, e.g. describing an image shown to them. As a result, models trained on linguistic features derived from such datasets may not be generalizable across tasks. Building on prior work demonstrating that same-task data of healthy participants helps improve AD detection on a single…

    Submitted 29 November, 2018; originally announced November 2018.

    Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

    Report number: ML4H/2018/147

  23. arXiv:1810.01170  [pdf, other]

    cs.CL

    Findings of the E2E NLG Challenge

    Authors: Ondřej Dušek, Jekaterina Novikova, Verena Rieser

    Abstract: This paper summarises the experimental setup and results of the first shared task on end-to-end (E2E) natural language generation (NLG) in spoken dialogue systems. Recent end-to-end generation systems are promising since they reduce the need for data annotation. However, they are currently limited to small, delexicalised datasets. The E2E NLG shared task aims to assess whether these novel approach…

    Submitted 2 October, 2018; originally announced October 2018.

    Comments: Accepted to INLG 2018

    Journal ref: Proceedings of the 11th International Conference on Natural Language Generation, pages 322-328, Tilburg, The Netherlands, November 2018

  24. arXiv:1808.06570  [pdf, other]

    cs.CL cs.AI

    Detecting cognitive impairments by agreeing on interpretations of linguistic features

    Authors: Zining Zhu, Jekaterina Novikova, Frank Rudzicz

    Abstract: Linguistic features have shown promising applications for detecting various cognitive impairments. To improve detection accuracies, increasing the amount of data or the number of linguistic features have been two applicable approaches. However, acquiring additional clinical data can be expensive, and hand-crafting features is burdensome. In this paper, we take a third approach, proposing Consensus…

    Submitted 27 March, 2019; v1 submitted 20 August, 2018; originally announced August 2018.

    Comments: NAACL 2019

  25. arXiv:1807.07217  [pdf, other]

    cs.LG stat.ML

    Deconfounding age effects with fair representation learning when assessing dementia

    Authors: Zining Zhu, Jekaterina Novikova, Frank Rudzicz

    Abstract: One of the most prevalent symptoms among the elderly population, dementia, can be detected by classifiers trained on linguistic features extracted from narrative transcripts. However, these linguistic features are impacted in a similar but different fashion by the normal aging process. Aging is therefore a confounding factor, whose effects have been hard for machine learning classifiers (especiall…

    Submitted 7 September, 2019; v1 submitted 18 July, 2018; originally announced July 2018.

    Comments: 9 pages, 2 figures

  26. arXiv:1805.09366  [pdf, other]

    cs.LG cs.MM cs.SD eess.AS eess.SP stat.ML

    Semi-supervised classification by reaching consensus among modalities

    Authors: Zining Zhu, Jekaterina Novikova, Frank Rudzicz

    Abstract: Deep learning has demonstrated abilities to learn complex structures, but they can be restricted by available data. Recently, Consensus Networks (CNs) were proposed to alleviate data sparsity by utilizing features from multiple modalities, but they too have been limited by the size of labeled data. In this paper, we extend CN to Transductive Consensus Networks (TCNs), suitable for semi-supervised…

    Submitted 19 November, 2018; v1 submitted 23 May, 2018; originally announced May 2018.

    Comments: NIPS IRASL Workshop 2018

  27. RankME: Reliable Human Ratings for Natural Language Generation

    Authors: Jekaterina Novikova, Ondřej Dušek, Verena Rieser

    Abstract: Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings. While previous research tends to attribute this problem to individual user preferences, we show that the quality of human judgements can also be improved by experimental design. We present a novel rank-based magnitude estimation method (RankME), which combines the use of continuous scales and relat…

    Submitted 15 March, 2018; originally announced March 2018.

    Comments: Accepted to NAACL 2018 (The 2018 Conference of the North American Chapter of the Association for Computational Linguistics)

    Journal ref: Proceedings of NAACL-HLT 2018, pages 72-78, New Orleans, Louisiana, June 1-6, 2018

  28. arXiv:1708.01759  [pdf, other]

    cs.CL

    Referenceless Quality Estimation for Natural Language Generation

    Authors: Ondřej Dušek, Jekaterina Novikova, Verena Rieser

    Abstract: Traditional automatic evaluation measures for natural language generation (NLG) use costly human-authored references to estimate the quality of a system output. In this paper, we propose a referenceless quality estimation (QE) approach based on recurrent neural networks, which predicts a quality score for a NLG system output by comparing it to the source meaning representation only. Our method out…

    Submitted 5 August, 2017; originally announced August 2017.

    Comments: Accepted as a regular paper to 1st Workshop on Learning to Generate Natural Language (LGNL), Sydney, 10 August 2017

    ACM Class: I.2.7

  29. Why We Need New Evaluation Metrics for NLG

    Authors: Jekaterina Novikova, Ondřej Dušek, Amanda Cercas Curry, Verena Rieser

    Abstract: The majority of NLG evaluation relies on automatic metrics, such as BLEU. In this paper, we motivate the need for novel, system- and data-independent automatic evaluation methods: We investigate a wide range of metrics, including state-of-the-art word-based and novel grammar-based ones, and demonstrate that they only weakly reflect human judgements of system outputs as generated by data-driven, e…

    Submitted 21 July, 2017; originally announced July 2017.

    Comments: accepted to EMNLP 2017

    Journal ref: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2231-2242, Copenhagen, Denmark, September 7-11, 2017

  30. arXiv:1706.09433  [pdf, ps, other]

    cs.CL

    Data-driven Natural Language Generation: Paving the Road to Success

    Authors: Jekaterina Novikova, Ondřej Dušek, Verena Rieser

    Abstract: We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) The lack of reliable automatic evaluation metrics for NLG, and (b) The scarcity of high quality in-domain corpora. We address the first problem by thoroughly analysing current evaluation metrics and motivating the need for a new, more r…

    Submitted 28 June, 2017; originally announced June 2017.

    Comments: WiNLP workshop at ACL 2017

  31. arXiv:1706.09254  [pdf, other]

    cs.CL

    The E2E Dataset: New Challenges For End-to-End Generation

    Authors: Jekaterina Novikova, Ondřej Dušek, Verena Rieser

    Abstract: This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from t…

    Submitted 6 July, 2017; v1 submitted 28 June, 2017; originally announced June 2017.

    Comments: Accepted as a short paper for SIGDIAL 2017 (final submission including supplementary material)

    ACM Class: I.2.7

    Journal ref: Proceedings of the SIGDIAL 2017 Conference, pages 201-206, Saarbrücken, Germany, 15-17 August 2017

  32. arXiv:1706.02757  [pdf, other]

    cs.RO cs.CL cs.HC

    Sympathy Begins with a Smile, Intelligence Begins with a Word: Use of Multimodal Features in Spoken Human-Robot Interaction

    Authors: Jekaterina Novikova, Christian Dondrup, Ioannis Papaioannou, Oliver Lemon

    Abstract: Recognition of social signals, from human facial expressions or prosody of speech, is a popular research topic in human-robot interaction studies. There is also a long line of research in the spoken dialogue community that investigates user satisfaction in relation to dialogue characteristics. However, very little research relates a combination of multimodal social signals and language features de…

    Submitted 8 June, 2017; originally announced June 2017.

    Comments: Robo-NLP workshop at ACL 2017. 9 pages, 5 figures, 6 tables

  33. arXiv:1608.00339  [pdf, other]

    cs.CL

    Crowd-sourcing NLG Data: Pictures Elicit Better Data

    Authors: Jekaterina Novikova, Oliver Lemon, Verena Rieser

    Abstract: Recent advances in corpus-based Natural Language Generation (NLG) hold the promise of being easily portable across domains, but require costly training data, consisting of meaning representations (MRs) paired with Natural Language (NL) utterances. In this work, we propose a novel framework for crowdsourcing high quality NLG training data, using automatic quality control measures and evaluating dif…

    Submitted 1 August, 2016; originally announced August 2016.

    Comments: The 9th International Natural Language Generation conference INLG, 2016. 10 pages, 2 figures, 3 tables