Showing 1–33 of 33 results for author: Hagen, M

  1. Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR

    Authors: Nandan Thakur, Luiz Bonifacio, Maik Fröbe, Alexander Bondarenko, Ehsan Kamalloo, Martin Potthast, Matthias Hagen, Jimmy Lin

    Abstract: The zero-shot effectiveness of neural retrieval models is often evaluated on the BEIR benchmark -- a combination of different IR evaluation datasets. Interestingly, previous studies found that particularly on the BEIR subset Touché 2020, an argument retrieval task, neural retrieval models are considerably less effective than BM25. Still, so far, no further investigation has been conducted on what…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: SIGIR 2024 (Resource & Reproducibility Track)

  2. arXiv:2405.07920  [pdf, other]

    cs.IR

    A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking

    Authors: Ferdinand Schlatt, Maik Fröbe, Harrisen Scells, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Benno Stein, Martin Potthast, Matthias Hagen

    Abstract: Cross-encoders distilled from large language models (LLMs) are often more effective re-rankers than cross-encoders fine-tuned on manually labeled data. However, the distilled models usually do not reach their teacher LLM's effectiveness. To investigate whether best practices for fine-tuning cross-encoders on manually labeled data (e.g., hard-negative sampling, deep sampling, and listwise loss func…

    Submitted 16 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  3. arXiv:2404.06912  [pdf, other]

    cs.IR

    Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders

    Authors: Ferdinand Schlatt, Maik Fröbe, Harrisen Scells, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Benno Stein, Martin Potthast, Matthias Hagen

    Abstract: Existing cross-encoder re-rankers can be categorized as pointwise, pairwise, or listwise models. Pair- and listwise models allow passage interactions, which usually makes them more effective than pointwise models but also less efficient and less robust to input order permutations. To enable efficient permutation-invariant passage interactions during re-ranking, we propose a new cross-encoder archi…

    Submitted 16 June, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  4. arXiv:2403.17564  [pdf, other]

    cs.CL

    Task-Oriented Paraphrase Analytics

    Authors: Marcel Gohsen, Matthias Hagen, Martin Potthast, Benno Stein

    Abstract: Since paraphrasing is an ill-defined task, the term "paraphrasing" covers text transformation tasks with different characteristics. Consequently, existing paraphrasing studies have applied quite different (explicit and implicit) criteria as to when a pair of texts is to be considered a paraphrase, all of which amount to postulating a certain level of semantic or lexical similarity. In this paper,…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  5. arXiv:2403.07654  [pdf, other]

    cs.IR

    Analyzing Adversarial Attacks on Sequence-to-Sequence Relevance Models

    Authors: Andrew Parry, Maik Fröbe, Sean MacAvaney, Martin Potthast, Matthias Hagen

    Abstract: Modern sequence-to-sequence relevance models like monoT5 can effectively capture complex textual interactions between queries and documents through cross-encoding. However, the use of natural language tokens in prompts, such as Query, Document, and Relevant for monoT5, opens an attack vector for malicious documents to manipulate their relevance score through prompt injection, e.g., by adding targe…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 13 pages, 3 figures, Accepted at ECIR 2024 as a Full Paper

  6. Detecting Generated Native Ads in Conversational Search

    Authors: Sebastian Schmidt, Ines Zelch, Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast

    Abstract: Conversational search engines such as YouChat and Microsoft Copilot use large language models (LLMs) to generate responses to queries. It is only a small step to also let the same technology insert ads within the generated responses - instead of separately placing ads next to a response. Inserted ads would be reminiscent of native advertising and product placement, both of which are very effective…

    Submitted 30 April, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: WWW'24 Short Papers Track; 4 pages

  7. arXiv:2401.14446  [pdf, other]

    cs.CY cs.AI cs.CR

    Black-Box Access is Insufficient for Rigorous AI Audits

    Authors: Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell

    Abstract: External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workin…

    Submitted 29 May, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: FAccT 2024

    Journal ref: The 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT '24), June 3-6, 2024, Rio de Janeiro, Brazil

  8. Investigating the Effects of Sparse Attention on Cross-Encoders

    Authors: Ferdinand Schlatt, Maik Fröbe, Matthias Hagen

    Abstract: Cross-encoders are effective passage and document re-rankers but less efficient than other neural or classic retrieval models. A few previous studies have applied windowed self-attention to make cross-encoders more efficient. However, these studies did not investigate the potential and limits of different attention patterns or window sizes. We close this gap and systematically analyze how token in…

    Submitted 20 March, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

    Comments: Accepted at ECIR'24

  9. Evaluating Generative Ad Hoc Information Retrieval

    Authors: Lukas Gienapp, Harrisen Scells, Niklas Deckers, Janek Bevendorff, Shuai Wang, Johannes Kiesel, Shahbaz Syed, Maik Fröbe, Guido Zuccon, Benno Stein, Matthias Hagen, Martin Potthast

    Abstract: Recent advances in large language models have enabled the development of viable generative retrieval systems. Instead of a traditional document ranking, generative retrieval systems often directly return a grounded generated text as a response to a query. Quantifying the utility of the textual responses is essential for appropriately evaluating such generative ad hoc retrieval. Yet, the establishe…

    Submitted 22 May, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: 14 pages, 6 figures, 1 table. Published at SIGIR'24 perspective paper track

  10. arXiv:2310.04892  [pdf, other]

    cs.IR

    Commercialized Generative AI: A Critical Study of the Feasibility and Ethics of Generating Native Advertising Using Large Language Models in Conversational Web Search

    Authors: Ines Zelch, Matthias Hagen, Martin Potthast

    Abstract: How will generative AI pay for itself? Unless charging users for access, selling advertising is the only alternative. Especially in the multi-billion dollar web search market with ads as the main source of revenue, the introduction of a subscription model seems unlikely. The recent disruption of search by generative large language models could thus ultimately be accompanied by generated ads. Our c…

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: Presented at OSSYM 2023

  11. The Information Retrieval Experiment Platform

    Authors: Maik Fröbe, Jan Heinrich Reimer, Sean MacAvaney, Niklas Deckers, Simon Reich, Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast

    Abstract: We integrate ir_datasets, ir_measures, and PyTerrier with TIRA in the Information Retrieval Experiment Platform (TIREx) to promote more standardized, reproducible, scalable, and even blinded retrieval experiments. Standardization is achieved when a retrieval approach implements PyTerrier's interfaces and the input and output of an experiment are compatible with ir_datasets and ir_measures. However…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: 11 pages. To be published in the proceedings of SIGIR 2023

  12. Perspectives on Large Language Models for Relevance Judgment

    Authors: Guglielmo Faggioli, Laura Dietz, Charles Clarke, Gianluca Demartini, Matthias Hagen, Claudia Hauff, Noriko Kando, Evangelos Kanoulas, Martin Potthast, Benno Stein, Henning Wachsmuth

    Abstract: When asked, large language models (LLMs) like ChatGPT claim that they can assist with relevance judgments, but it is not clear whether automated judgments can reliably be used in evaluations of retrieval systems. In this perspectives paper, we discuss possible ways for LLMs to support relevance judgments along with concerns and issues that arise. We devise a human-machine collaboration spectrum th…

    Submitted 18 November, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    ACM Class: H.3.3

  13. The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives

    Authors: Jan Heinrich Reimer, Sebastian Schmidt, Maik Fröbe, Lukas Gienapp, Harrisen Scells, Benno Stein, Matthias Hagen, Martin Potthast

    Abstract: The Archive Query Log (AQL) is a previously unused, comprehensive query log collected at the Internet Archive over the last 25 years. Its first version includes 356 million queries, 166 million search result pages, and 1.7 billion search results across 550 search providers. Although many query logs have been studied in the literature, the search providers that own them generally do not publish the…

    Submitted 31 July, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

    Comments: SIGIR 2023 resource paper, 13 pages

  14. arXiv:2301.11030  [pdf, other]

    cs.CL

    Paraphrase Acquisition from Image Captions

    Authors: Marcel Gohsen, Matthias Hagen, Martin Potthast, Benno Stein

    Abstract: We propose to use image captions from the Web as a previously underutilized resource for paraphrases (i.e., texts with the same "message") and to create and analyze a corresponding dataset. When an image is reused on the Web, an original caption is often assigned. We hypothesize that different captions for the same image naturally form a set of mutual paraphrases. To demonstrate the suitability of…

    Submitted 15 February, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

  15. A Comprehensive Review of Modern Object Segmentation Approaches

    Authors: Yuanbo Wang, Unaiza Ahsan, Hanyan Li, Matthew Hagen

    Abstract: Image segmentation is the task of associating pixels in an image with their respective object class labels. It has a wide range of applications in many industries including healthcare, transportation, robotics, fashion, home improvement, and tourism. Many deep learning-based approaches have been developed for image-level object recognition and pixel-level scene understanding -- with the latter requir…

    Submitted 13 January, 2023; originally announced January 2023.

    Comments: 173 pages, 49 figures, published in Foundations and Trends in Computer Graphics and Vision on 10/4/22. Authors retain copyright

    ACM Class: I.4.6

    Journal ref: Foundations and Trends in Computer Graphics and Vision: Vol. 13: No. 2-3, pp 111-283

  16. Sparse Pairwise Re-ranking with Pre-trained Transformers

    Authors: Lukas Gienapp, Maik Fröbe, Matthias Hagen, Martin Potthast

    Abstract: Pairwise re-ranking models predict which of two documents is more relevant to a query and then aggregate a final ranking from such preferences. This is often more effective than pointwise re-ranking models that directly predict a relevance value for each document. However, the high inference overhead of pairwise models limits their practical application: usually, for a set of $k$ documents to be r…

    Submitted 10 July, 2022; originally announced July 2022.

    Comments: Accepted at ICTIR 2022

  17. arXiv:2206.14759  [pdf, other]

    cs.IR

    How Train-Test Leakage Affects Zero-shot Retrieval

    Authors: Maik Fröbe, Christopher Akiki, Martin Potthast, Matthias Hagen

    Abstract: Neural retrieval models are often trained on (subsets of) the millions of queries of the MS MARCO / ORCAS datasets and then tested on the 250 Robust04 queries or other TREC benchmarks with often only 50 queries. In such setups, many of the few test queries can be very similar to queries from the huge training data -- in fact, 69% of the Robust04 queries have near-duplicates in MS MARCO / ORCAS. We…

    Submitted 30 August, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

    Comments: To appear at the 29th International Symposium on String Processing and Information Retrieval (SPIRE 2022)

  18. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  19. arXiv:2203.10282  [pdf, other]

    cs.CL

    Clickbait Spoiling via Question Answering and Passage Retrieval

    Authors: Matthias Hagen, Maik Fröbe, Artur Jurk, Martin Potthast

    Abstract: We introduce and study the task of clickbait spoiling: generating a short text that satisfies the curiosity induced by a clickbait post. Clickbait links to a web page and advertises its contents by arousing curiosity instead of providing an informative summary. Our contributions are approaches to classify the type of spoiler needed (i.e., a phrase or a passage), and to generate appropriate spoiler…

    Submitted 19 March, 2022; originally announced March 2022.

    Comments: Accepted at ACL 2022

  20. arXiv:2111.10864  [pdf, other]

    cs.IR

    The Impact of Main Content Extraction on Near-Duplicate Detection

    Authors: Maik Fröbe, Matthias Hagen, Janek Bevendorff, Michael Völske, Benno Stein, Christopher Schröder, Robby Wagner, Lukas Gienapp, Martin Potthast

    Abstract: Commercial web search engines employ near-duplicate detection to ensure that users see each relevant result only once, albeit the underlying web crawls typically include (near-)duplicates of many web pages. We revisit the risks and potential of near-duplicates with an information retrieval focus, motivating that current efforts toward an open and independent European web search infrastructure shou…

    Submitted 21 November, 2021; originally announced November 2021.

  21. arXiv:2107.00893  [pdf, other]

    cs.DL cs.NI cs.SI

    Web Archive Analytics

    Authors: Michael Völske, Janek Bevendorff, Johannes Kiesel, Benno Stein, Maik Fröbe, Matthias Hagen, Martin Potthast

    Abstract: Web archive analytics is the exploitation of publicly accessible web pages and their evolution for research purposes -- to the extent organizationally possible for researchers. In order to better understand the complexity of this task, the first part of this paper puts the entirety of the world's captured, created, and replicated data (the "Global Datasphere") in relation to other important data s…

    Submitted 2 July, 2021; originally announced July 2021.

    Comments: 12 pages, 5 figures. Published in the proceedings of INFORMATIK 2020

    Journal ref: INFORMATIK 2020. Gesellschaft für Informatik, Bonn. (pp. 61-72)

  22. Towards Axiomatic Explanations for Neural Ranking Models

    Authors: Michael Völske, Alexander Bondarenko, Maik Fröbe, Matthias Hagen, Benno Stein, Jaspreet Singh, Avishek Anand

    Abstract: Recently, neural networks have been successfully employed to improve upon state-of-the-art performance in ad-hoc retrieval tasks via machine-learned ranking functions. While neural retrieval models grow in complexity and impact, little is understood about their correspondence with well-studied IR principles. Recent work on interpretability in machine learning has provided tools and techniques to u…

    Submitted 11 July, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: 10 pages, 2 figures. Published in the proceedings of ICTIR 2021

  23. arXiv:2105.08581  [pdf, other]

    cs.IR cs.CL

    Query Interpretations from Entity-Linked Segmentations

    Authors: Vaibhav Kasturia, Marcel Gohsen, Matthias Hagen

    Abstract: Web search queries can be ambiguous: is "source of the nile" meant to find information on the actual river or on a board game of that name? We tackle this problem by deriving entity-based query interpretations: given some query, the task is to derive all reasonable ways of linking suitable parts of the query to semantically compatible entities in a background knowledge base. Our suggested approach…

    Submitted 5 January, 2022; v1 submitted 18 May, 2021; originally announced May 2021.

    Comments: Accepted at WSDM 2022

  24. arXiv:2103.06743  [pdf, other]

    cs.CR cs.AR

    Practical Encrypted Computing for IoT Clients

    Authors: McKenzie van der Hagen, Brandon Lucia

    Abstract: Privacy and energy are primary concerns for sensor devices that offload compute to a potentially untrusted edge server or cloud. Homomorphic Encryption (HE) enables offload processing of encrypted data. HE offload processing retains data privacy, but is limited by the need for frequent communication between the client device and the offload server. Existing client-aided encrypted computing systems…

    Submitted 11 March, 2021; originally announced March 2021.

    Comments: 13 pages

  25. arXiv:2005.14714  [pdf, other]

    cs.CL

    The Importance of Suppressing Domain Style in Authorship Analysis

    Authors: Sebastian Bischoff, Niklas Deckers, Marcel Schliebs, Ben Thies, Matthias Hagen, Efstathios Stamatatos, Benno Stein, Martin Potthast

    Abstract: The prerequisite of many approaches to authorship analysis is a representation of writing style. But despite decades of research, it still remains unclear to what extent commonly used and widely accepted representations like character trigram frequencies actually represent an author's writing style, in contrast to more domain-specific style components or even topic. We address this shortcoming for…

    Submitted 29 May, 2020; originally announced May 2020.

  26. arXiv:2005.08658  [pdf, other]

    cs.IR cs.CL cs.HC

    Conversational Search -- A Report from Dagstuhl Seminar 19461

    Authors: Avishek Anand, Lawrence Cavedon, Matthias Hagen, Hideo Joho, Mark Sanderson, Benno Stein

    Abstract: Dagstuhl Seminar 19461 "Conversational Search" was held on 10-15 November 2019. 44 researchers in Information Retrieval and Web Search, Natural Language Processing, Human Computer Interaction, and Dialogue Systems were invited to share the latest developments in the area of Conversational Search and discuss its research agenda and future directions. A 5-day program of the seminar consisted of six i…

    Submitted 18 May, 2020; originally announced May 2020.

    Comments: contains arXiv:2001.06910, arXiv:2001.02912

  27. Abstractive Snippet Generation

    Authors: Wei-Fan Chen, Shahbaz Syed, Benno Stein, Matthias Hagen, Martin Potthast

    Abstract: An abstractive snippet is an originally created piece of text to summarize a web page on a search engine results page. Compared to the conventional extractive snippets, which are generated by extracting phrases and sentences verbatim from a web page, abstractive snippets circumvent copyright issues; even more interesting is the fact that they open the door for personalization. Abstractive snippets…

    Submitted 15 March, 2020; v1 submitted 25 February, 2020; originally announced February 2020.

    Comments: Accepted by WWW 2020

  28. arXiv:2001.06910  [pdf, ps, other]

    cs.IR

    Common Conversational Community Prototype: Scholarly Conversational Assistant

    Authors: Krisztian Balog, Lucie Flekova, Matthias Hagen, Rosie Jones, Martin Potthast, Filip Radlinski, Mark Sanderson, Svitlana Vakulenko, Hamed Zamani

    Abstract: This paper discusses the potential for creating academic resources (tools, data, and evaluation approaches) to support research in conversational search, by focusing on realistic information needs and conversational interactions. Specifically, we propose to develop and operate a prototype conversational search system for scholarly activities. This Scholarly Conversational Assistant would serve as…

    Submitted 19 January, 2020; originally announced January 2020.

  29. Answering Comparative Questions: Better than Ten-Blue-Links?

    Authors: Matthias Schildwächter, Alexander Bondarenko, Julian Zenker, Matthias Hagen, Chris Biemann, Alexander Panchenko

    Abstract: We present CAM (comparative argumentative machine), a novel open-domain IR system to argumentatively compare objects with respect to information extracted from the Common Crawl. In a user study, the participants obtained 15% more accurate answers using CAM compared to a "traditional" keyword-based search and were 20% faster in finding the answer to comparative questions.

    Submitted 15 January, 2019; originally announced January 2019.

    Comments: In Proceedings of the 2019 Conference on Human Information Interaction and Retrieval (CHIIR '19), March 10-14, 2019, Glasgow, United Kingdom

  30. arXiv:1812.10847  [pdf, other]

    cs.CL cs.IR

    The Clickbait Challenge 2017: Towards a Regression Model for Clickbait Strength

    Authors: Martin Potthast, Tim Gollub, Matthias Hagen, Benno Stein

    Abstract: Clickbait has grown to become a nuisance to social media users and social media operators alike. Malicious content publishers misuse social media to manipulate as many users as possible to visit their websites using clickbait messages. Machine learning technology may help to handle this problem, giving rise to automatic clickbait detection. To accelerate progress in this direction, we organized th…

    Submitted 27 December, 2018; originally announced December 2018.

  31. arXiv:1812.09221  [pdf, other]

    cs.IR

    Wikipedia Text Reuse: Within and Without

    Authors: Milad Alshomary, Michael Völske, Tristan Licht, Henning Wachsmuth, Benno Stein, Matthias Hagen, Martin Potthast

    Abstract: We study text reuse related to Wikipedia at scale by compiling the first corpus of text reuse cases within Wikipedia as well as without (i.e., reuse of Wikipedia text in a sample of the Common Crawl). To discover reuse beyond verbatim copy and paste, we employ state-of-the-art text reuse detection technology, scaling it for the first time to process the entire Wikipedia as part of a distributed re…

    Submitted 21 December, 2018; originally announced December 2018.

    Comments: accepted at ECIR 2019

  32. arXiv:1809.06152  [pdf, other]

    cs.CL

    Categorizing Comparative Sentences

    Authors: Alexander Panchenko, Alexander Bondarenko, Mirco Franzek, Matthias Hagen, Chris Biemann

    Abstract: We tackle the tasks of automatically identifying comparative sentences and categorizing the intended preference (e.g., "Python has better NLP libraries than MATLAB" => (Python, better, MATLAB)). To this end, we manually annotate 7,199 sentences for 217 distinct target item pairs from several domains (27% of the sentences contain an oriented comparison in the sense of "better" or "worse"). A gradien…

    Submitted 8 July, 2019; v1 submitted 17 September, 2018; originally announced September 2018.

    Comments: In Proceedings of the 6th Workshop on Argument Mining (ArgMining'2019) August 1st, collocated with ACL 2019 in Florence, Italy

  33. arXiv:1802.01191  [pdf, other]

    cs.CL

    Heuristic Feature Selection for Clickbait Detection

    Authors: Matti Wiegmann, Michael Völske, Benno Stein, Matthias Hagen, Martin Potthast

    Abstract: We study feature selection as a means to optimize the baseline clickbait detector employed at the Clickbait Challenge 2017. The challenge's task is to score the "clickbaitiness" of a given Twitter tweet on a scale from 0 (no clickbait) to 1 (strong clickbait). Unlike most other approaches submitted to the challenge, the baseline approach is based on manual feature engineering and does not compete…

    Submitted 4 February, 2018; originally announced February 2018.

    Comments: Clickbait Challenge 2017