-
Generative Query Reformulation Using Ensemble Prompting, Document Fusion, and Relevance Feedback
Authors:
Kaustubh D. Dhole,
Ramraj Chandradevan,
Eugene Agichtein
Abstract:
Query Reformulation (QR) is a set of techniques used to transform a user's original search query to a text that better aligns with the user's intent and improves their search experience. Recently, zero-shot QR has been a promising approach due to its ability to exploit knowledge inherent in large language models. Inspired by the success of ensemble prompting strategies which have benefited other t…
▽ More
Query Reformulation (QR) is a set of techniques used to transform a user's original search query to a text that better aligns with the user's intent and improves their search experience. Recently, zero-shot QR has been a promising approach due to its ability to exploit knowledge inherent in large language models. Inspired by the success of ensemble prompting strategies which have benefited other tasks, we investigate if they can improve query reformulation. In this context, we propose two ensemble-based prompting techniques, GenQREnsemble and GenQRFusion which leverage paraphrases of a zero-shot instruction to generate multiple sets of keywords to improve retrieval performance ultimately. We further introduce their post-retrieval variants to incorporate relevance feedback from a variety of sources, including an oracle simulating a human user and a "critic" LLM. We demonstrate that an ensemble of query reformulations can improve retrieval effectiveness by up to 18% on nDCG@10 in pre-retrieval settings and 9% on post-retrieval settings on multiple benchmarks, outperforming all previously reported SOTA results. We perform subsequent analyses to investigate the effects of feedback documents, incorporate domain-specific instructions, filter reformulations, and generate fluent reformulations that might be more beneficial to human searchers. Together, the techniques and the results presented in this paper establish a new state of the art in automated query reformulation for retrieval and suggest promising directions for future research.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
DUQGen: Effective Unsupervised Domain Adaptation of Neural Rankers by Diversifying Synthetic Query Generation
Authors:
Ramraj Chandradevan,
Kaustubh D. Dhole,
Eugene Agichtein
Abstract:
State-of-the-art neural rankers pre-trained on large task-specific training data such as MS-MARCO, have been shown to exhibit strong performance on various ranking tasks without domain adaptation, also called zero-shot. However, zero-shot neural ranking may be sub-optimal, as it does not take advantage of the target domain information. Unfortunately, acquiring sufficiently large and high quality t…
▽ More
State-of-the-art neural rankers pre-trained on large task-specific training data such as MS-MARCO, have been shown to exhibit strong performance on various ranking tasks without domain adaptation, also called zero-shot. However, zero-shot neural ranking may be sub-optimal, as it does not take advantage of the target domain information. Unfortunately, acquiring sufficiently large and high quality target training data to improve a modern neural ranker can be costly and time-consuming. To address this problem, we propose a new approach to unsupervised domain adaptation for ranking, DUQGen, which addresses a critical gap in prior literature, namely how to automatically generate both effective and diverse synthetic training data to fine tune a modern neural ranker for a new domain. Specifically, DUQGen produces a more effective representation of the target domain by identifying clusters of similar documents; and generates a more diverse training dataset by probabilistic sampling over the resulting document clusters. Our extensive experiments, over the standard BEIR collection, demonstrate that DUQGen consistently outperforms all zero-shot baselines and substantially outperforms the SOTA baselines on 16 out of 18 datasets, for an average of 4% relative improvement across all datasets. We complement our results with a thorough analysis for more in-depth understanding of the proposed method's performance and to identify promising areas for further improvements.
△ Less
Submitted 3 April, 2024;
originally announced April 2024.
-
QueryExplorer: An Interactive Query Generation Assistant for Search and Exploration
Authors:
Kaustubh D. Dhole,
Shivam Bajaj,
Ramraj Chandradevan,
Eugene Agichtein
Abstract:
Formulating effective search queries remains a challenging task, particularly when users lack expertise in a specific domain or are not proficient in the language of the content. Providing example documents of interest might be easier for a user. However, such query-by-example scenarios are prone to concept drift, and the retrieval effectiveness is highly sensitive to the query generation method,…
▽ More
Formulating effective search queries remains a challenging task, particularly when users lack expertise in a specific domain or are not proficient in the language of the content. Providing example documents of interest might be easier for a user. However, such query-by-example scenarios are prone to concept drift, and the retrieval effectiveness is highly sensitive to the query generation method, without a clear way to incorporate user feedback. To enable exploration and to support Human-In-The-Loop experiments we propose QueryExplorer -- an interactive query generation, reformulation, and retrieval interface with support for HuggingFace generation models and PyTerrier's retrieval pipelines and datasets, and extensive logging of human feedback. To allow users to create and modify effective queries, our demo supports complementary approaches of using LLMs interactively, assisting the user with edits and feedback at multiple stages of the query formulation process. With support for recording fine-grained interactions and user annotations, QueryExplorer can serve as a valuable experimental and research platform for annotation, qualitative evaluation, and conducting Human-in-the-Loop (HITL) experiments for complex search tasks where users struggle to formulate queries.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
KAUCUS: Knowledge Augmented User Simulators for Training Language Model Assistants
Authors:
Kaustubh D. Dhole
Abstract:
An effective multi-turn instruction-following assistant can be developed by creating a simulator that can generate useful interaction data. Apart from relying on its intrinsic weights, an ideal user simulator should also be able to bootstrap external knowledge rapidly in its raw form to simulate the multifarious diversity of text available over the internet. Previous user simulators generally lack…
▽ More
An effective multi-turn instruction-following assistant can be developed by creating a simulator that can generate useful interaction data. Apart from relying on its intrinsic weights, an ideal user simulator should also be able to bootstrap external knowledge rapidly in its raw form to simulate the multifarious diversity of text available over the internet. Previous user simulators generally lacked diversity, were mostly closed domain, and necessitated rigid schema making them inefficient to rapidly scale to incorporate external knowledge. In this regard, we introduce, Kaucus, a Knowledge-Augmented User Simulator framework, to outline a process of creating diverse user simulators, that can seamlessly exploit external knowledge as well as benefit downstream assistant model training. Through two GPT-J based simulators viz., a Retrieval Augmented Simulator and a Summary Controlled Simulator we generate diverse simulator-assistant interactions. Through reward and preference model-based evaluations, we find that these interactions serve as useful training data and create more helpful downstream assistants. We also find that incorporating knowledge through retrieval augmentation or summary control helps create better assistants.
△ Less
Submitted 29 January, 2024;
originally announced January 2024.
-
An Interactive Query Generation Assistant using LLM-based Prompt Modification and User Feedback
Authors:
Kaustubh D. Dhole,
Ramraj Chandradevan,
Eugene Agichtein
Abstract:
While search is the predominant method of accessing information, formulating effective queries remains a challenging task, especially for situations where the users are not familiar with a domain, or searching for documents in other languages, or looking for complex information such as events, which are not easily expressible as queries. Providing example documents or passages of interest, might b…
▽ More
While search is the predominant method of accessing information, formulating effective queries remains a challenging task, especially for situations where the users are not familiar with a domain, or searching for documents in other languages, or looking for complex information such as events, which are not easily expressible as queries. Providing example documents or passages of interest, might be easier for a user, however, such query-by-example scenarios are prone to concept drift, and are highly sensitive to the query generation method. This demo illustrates complementary approaches of using LLMs interactively, assisting and enabling the user to provide edits and feedback at all stages of the query formulation process. The proposed Query Generation Assistant is a novel search interface which supports automatic and interactive query generation over a mono-linguial or multi-lingual document collection. Specifically, the proposed assistive interface enables the users to refine the queries generated by different LLMs, to provide feedback on the retrieved documents or passages, and is able to incorporate the users' feedback as prompts to generate more effective queries. The proposed interface is a valuable experimental tool for exploring fine-tuning and prompting of LLMs for query generation to qualitatively evaluate the effectiveness of retrieval and ranking models, and for conducting Human-in-the-Loop (HITL) experiments for complex search tasks where users struggle to formulate queries without such assistance.
△ Less
Submitted 18 November, 2023;
originally announced November 2023.
-
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
Authors:
Samuel Cahyawijaya,
Holy Lovenia,
Alham Fikri Aji,
Genta Indra Winata,
Bryan Wilie,
Rahmad Mahendra,
Christian Wibisono,
Ade Romadhony,
Karissa Vincentio,
Fajri Koto,
Jennifer Santoso,
David Moeljadi,
Cahya Wirawan,
Frederikus Hudi,
Ivan Halim Parmonangan,
Ika Alfina,
Muhammad Satrio Wicaksono,
Ilham Firdausi Putra,
Samsul Rahmadani,
Yulianti Oenang,
Ali Akbar Septiandri,
James Jaya,
Kaustubh D. Dhole,
Arie Ardiyanti Suryani,
Rifki Afina Putri
, et al. (22 additional authors not shown)
Abstract:
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple exp…
▽ More
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd brings the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are under-represented despite being widely spoken.
△ Less
Submitted 21 July, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Lessons from Digital India for the Right to Internet Access
Authors:
Kaustubh D. Dhole
Abstract:
With only 65% of Indian houses having access to the Internet, digital India faces a significant Internet divide across gender and city types. Rendering essential services inaccessible to almost a third of the population necessitates not only provisioning a fundamental right to Internet access but taking specific constructive steps to assure its simple, affordable and safe accessibility. Establishi…
▽ More
With only 65% of Indian houses having access to the Internet, digital India faces a significant Internet divide across gender and city types. Rendering essential services inaccessible to almost a third of the population necessitates not only provisioning a fundamental right to Internet access but taking specific constructive steps to assure its simple, affordable and safe accessibility. Establishing such a right would also pave way for other ancillary rights required for data privacy, protection from Internet's possible harms and the requirement to be treated fairly. We first discuss two arguments on the universal right to Internet access; from Merten Reglitz, a senior lecturer on Global Ethics and from Vincent Cerf, one of the founding creators of the Internet who has had a profound influence on the field. We specifically argue why Internet access should be treated as a fundamental right. We discuss the learnings from India, contextualizing them with the global debate and argue for establishing Internet access as a fundamental right in India and elsewhere in the form of government legislation to eliminate Internet divide.
△ Less
Submitted 12 November, 2022;
originally announced November 2022.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…
▽ More
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
△ Less
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
A Bird's-Eye Tutorial of Graph Attention Architectures
Authors:
Kaustubh D. Dhole,
Carl Yang
Abstract:
Graph Neural Networks (GNNs) have shown tremendous strides in performance for graph-structured problems especially in the domains of natural language processing, computer vision and recommender systems. Inspired by the success of the transformer architecture, there has been an ever-growing body of work on attention variants of GNNs attempting to advance the state of the art in many of these proble…
▽ More
Graph Neural Networks (GNNs) have shown tremendous strides in performance for graph-structured problems especially in the domains of natural language processing, computer vision and recommender systems. Inspired by the success of the transformer architecture, there has been an ever-growing body of work on attention variants of GNNs attempting to advance the state of the art in many of these problems. Incorporating "attention" into graph mining has been viewed as a way to overcome the noisiness, heterogenity and complexity associated with graph-structured data as well as to encode soft-inductive bias. It is hence crucial and advantageous to study these variants from a bird's-eye view to assess their strengths and weaknesses. We provide a systematic and focused tutorial centered around attention based GNNs in a hope to benefit researchers dealing with graph-structured problems. Our tutorial looks at GNN variants from the point of view of the attention function and iteratively builds the reader's understanding of different graph attention variants.
△ Less
Submitted 6 June, 2022;
originally announced June 2022.
-
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Authors:
Kaustubh D. Dhole,
Varun Gangal,
Sebastian Gehrmann,
Aadesh Gupta,
Zhenhao Li,
Saad Mahamood,
Abinaya Mahendiran,
Simon Mille,
Ashish Shrivastava,
Samson Tan,
Tongshuang Wu,
Jascha Sohl-Dickstein,
Jinho D. Choi,
Eduard Hovy,
Ondrej Dusek,
Sebastian Ruder,
Sajant Anand,
Nagender Aneja,
Rabin Banjade,
Lisa Barthe,
Hanna Behnke,
Ian Berlot-Attwell,
Connor Boyle,
Caroline Brun,
Marco Antonio Sobrevilla Cabezudo
, et al. (101 additional authors not shown)
Abstract:
Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data split…
▽ More
Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).
△ Less
Submitted 11 October, 2022; v1 submitted 5 December, 2021;
originally announced December 2021.
-
CANDLE: Decomposing Conditional and Conjunctive Queries for Task-Oriented Dialogue Systems
Authors:
Aadesh Gupta,
Kaustubh D. Dhole,
Rahul Tarway,
Swetha Prabhakar,
Ashish Shrivastava
Abstract:
Domain-specific dialogue systems generally determine user intents by relying on sentence level classifiers that mainly focus on single action sentences. Such classifiers are not designed to effectively handle complex queries composed of conditional and sequential clauses that represent multiple actions. We attempt to decompose such queries into smaller single action subqueries that are reasonable…
▽ More
Domain-specific dialogue systems generally determine user intents by relying on sentence level classifiers that mainly focus on single action sentences. Such classifiers are not designed to effectively handle complex queries composed of conditional and sequential clauses that represent multiple actions. We attempt to decompose such queries into smaller single action subqueries that are reasonable for intent classifiers to understand in a dialogue pipeline. We release, CANDLE(Conditional & AND type Expressions), a dataset consisting of 4282 utterances manually tagged with conditional and sequential labels, and demonstrates this decomposition by training two baseline taggers.
△ Less
Submitted 23 November, 2022; v1 submitted 8 July, 2021;
originally announced July 2021.
-
Automatic Construction of Evaluation Suites for Natural Language Generation Datasets
Authors:
Simon Mille,
Kaustubh D. Dhole,
Saad Mahamood,
Laura Perez-Beltrachini,
Varun Gangal,
Mihir Kale,
Emiel van Miltenburg,
Sebastian Gehrmann
Abstract:
Machine learning approaches applied to NLP are often evaluated by summarizing their performance in a single number, for example accuracy. Since most test sets are constructed as an i.i.d. sample from the overall data, this approach overly simplifies the complexity of language and encourages overfitting to the head of the data distribution. As such, rare language phenomena or text about underrepres…
▽ More
Machine learning approaches applied to NLP are often evaluated by summarizing their performance in a single number, for example accuracy. Since most test sets are constructed as an i.i.d. sample from the overall data, this approach overly simplifies the complexity of language and encourages overfitting to the head of the data distribution. As such, rare language phenomena or text about underrepresented groups are not equally included in the evaluation. To encourage more in-depth model analyses, researchers have proposed the use of multiple test sets, also called challenge sets, that assess specific capabilities of a model. In this paper, we develop a framework based on this idea which is able to generate controlled perturbations and identify subsets in text-to-scalar, text-to-text, or data-to-text settings. By applying this framework to the GEM generation benchmark, we propose an evaluation suite made of 80 challenge sets, demonstrate the kinds of analyses that it enables and shed light onto the limits of current generation models.
△ Less
Submitted 16 June, 2021;
originally announced June 2021.
-
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Authors:
Sebastian Gehrmann,
Tosin Adewumi,
Karmanya Aggarwal,
Pawan Sasanka Ammanamanchi,
Aremu Anuoluwapo,
Antoine Bosselut,
Khyathi Raghavi Chandu,
Miruna Clinciu,
Dipanjan Das,
Kaustubh D. Dhole,
Wanyu Du,
Esin Durmus,
Ondřej Dušek,
Chris Emezue,
Varun Gangal,
Cristina Garbacea,
Tatsunori Hashimoto,
Yufang Hou,
Yacine Jernite,
Harsh Jhamtani,
Yangfeng Ji,
Shailza Jolly,
Mihir Kale,
Dhruv Kumar,
Faisal Ladhak
, et al. (31 additional authors not shown)
Abstract:
We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it…
▽ More
We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for which we are organizing a shared task at our ACL 2021 Workshop and to which we invite the entire NLG community to participate.
△ Less
Submitted 1 April, 2021; v1 submitted 2 February, 2021;
originally announced February 2021.
-
Resolving Intent Ambiguities by Retrieving Discriminative Clarifying Questions
Authors:
Kaustubh D. Dhole
Abstract:
Task oriented Dialogue Systems generally employ intent detection systems in order to map user queries to a set of pre-defined intents. However, user queries appearing in natural language can be easily ambiguous and hence such a direct mapping might not be straightforward harming intent detection and eventually the overall performance of a dialogue system. Moreover, acquiring domain-specific clarif…
▽ More
Task oriented Dialogue Systems generally employ intent detection systems in order to map user queries to a set of pre-defined intents. However, user queries appearing in natural language can be easily ambiguous and hence such a direct mapping might not be straightforward harming intent detection and eventually the overall performance of a dialogue system. Moreover, acquiring domain-specific clarification questions is costly. In order to disambiguate queries which are ambiguous between two intents, we propose a novel method of generating discriminative questions using a simple rule based system which can take advantage of any question generation system without requiring annotated data of clarification questions. Our approach aims at discrimination between two intents but can be easily extended to clarification over multiple intents. Seeking clarification from the user to classify user intents not only helps understand the user intent effectively, but also reduces the roboticity of the conversation and makes the interaction considerably natural.
△ Less
Submitted 17 August, 2020;
originally announced August 2020.
-
Benchmarking BioRelEx for Entity Tagging and Relation Extraction
Authors:
Abhinav Bhatt,
Kaustubh D. Dhole
Abstract:
Extracting relationships and interactions between different biological entities is still an extremely challenging problem but has not received much attention as much as extraction in other generic domains. In addition to the lack of annotated data, low benchmarking is still a major reason for slow progress. In order to fill this gap, we compare multiple existing entity and relation extraction mode…
▽ More
Extracting relationships and interactions between different biological entities is still an extremely challenging problem but has not received much attention as much as extraction in other generic domains. In addition to the lack of annotated data, low benchmarking is still a major reason for slow progress. In order to fill this gap, we compare multiple existing entity and relation extraction models over a recently introduced public dataset, BioRelEx of sentences annotated with biological entities and relations. Our straightforward benchmarking shows that span-based multi-task architectures like DYGIE show 4.9% and 6% absolute improvements in entity tagging and relation extraction respectively over the previous state-of-art and that incorporating domain-specific information like embeddings pre-trained over related domains boosts performance.
△ Less
Submitted 31 May, 2020;
originally announced June 2020.
-
Syn-QG: Syntactic and Shallow Semantic Rules for Question Generation
Authors:
Kaustubh D. Dhole,
Christopher D. Manning
Abstract:
Question Generation (QG) is fundamentally a simple syntactic transformation; however, many aspects of semantics influence what questions are good to form. We implement this observation by developing SynQG, a set of transparent syntactic rules leveraging universal dependencies, shallow semantic parsing, lexical resources, and custom rules which transform declarative sentences into question-answer p…
▽ More
Question Generation (QG) is fundamentally a simple syntactic transformation; however, many aspects of semantics influence what questions are good to form. We implement this observation by developing SynQG, a set of transparent syntactic rules leveraging universal dependencies, shallow semantic parsing, lexical resources, and custom rules which transform declarative sentences into question-answer pairs. We utilize PropBank argument descriptions and VerbNet state predicates to incorporate shallow semantic content, which helps generate questions of a descriptive nature and produce inferential and semantically richer questions than existing systems. In order to improve syntactic fluency and eliminate grammatically incorrect questions, we employ back-translation over the output of these syntactic rules. A set of crowd-sourced evaluations shows that our system can generate a larger number of highly grammatical and relevant questions than previous QG systems and that back-translation drastically improves grammaticality at a slight cost of generating irrelevant questions.
△ Less
Submitted 28 November, 2022; v1 submitted 18 April, 2020;
originally announced April 2020.