Showing 1–31 of 31 results for author: Shakeri, S

  1. arXiv:2404.18416  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Capabilities of Gemini Models in Medicine

    Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, et al. (42 additional authors not shown)

    Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  2. arXiv:2404.10019  [pdf, other]

    astro-ph.IM astro-ph.GA astro-ph.HE cs.AI cs.LG physics.data-an

    Can AI Understand Our Universe? Test of Fine-Tuning GPT by Astrophysical Data

    Authors: Yu Wang, Shu-Rui Zhang, Aidin Momtaz, Rahim Moradi, Fatemeh Rastegarnia, Narek Sahakyan, Soroush Shakeri, Liang Li

    Abstract: ChatGPT has been the most talked-about concept in recent months, captivating both professionals and the general public alike, and has sparked discussions about the changes that artificial intelligence (AI) will bring to the world. As physicists and astrophysicists, we are curious whether scientific data can be correctly analyzed by large language models (LLMs) and yield accurate physics. In this…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 27 pages, 7 figures. Comments welcome

  3. arXiv:2403.08295  [pdf, other]

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  4. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  5. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  6. arXiv:2306.04009  [pdf, other]

    cs.CL cs.AI

    Triggering Multi-Hop Reasoning for Question Answering in Language Models using Soft Prompts and Random Walks

    Authors: Kanishka Misra, Cicero Nogueira dos Santos, Siamak Shakeri

    Abstract: Despite readily memorizing world knowledge about entities, pre-trained language models (LMs) struggle to compose together two or more facts to perform multi-hop reasoning in question-answering tasks. In this work, we propose techniques that improve upon this limitation by relying on random walks over structured knowledge graphs. Specifically, we use soft prompts to guide LMs to chain together thei…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: Findings of ACL 2023

  7. arXiv:2306.00008  [pdf, other]

    cs.LG cs.CL

    Brainformers: Trading Simplicity for Efficiency

    Authors: Yanqi Zhou, Nan Du, Yanping Huang, Daiyi Peng, Chang Lan, Da Huang, Siamak Shakeri, David So, Andrew Dai, Yifeng Lu, Zhifeng Chen, Quoc Le, Claire Cui, James Laudon, Jeff Dean

    Abstract: Transformers are central to recent successes in natural language processing and computer vision. Transformers have a mostly uniform backbone where layers alternate between feed-forward and self-attention in order to build a deep network. Here we investigate this design choice and find that more complex blocks that have different permutations of layer primitives can be more efficient. Using this in…

    Submitted 25 April, 2024; v1 submitted 29 May, 2023; originally announced June 2023.

  8. arXiv:2305.18565  [pdf, other]

    cs.CV cs.CL cs.LG

    PaLI-X: On Scaling up a Multilingual Vision and Language Model

    Authors: Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, et al. (18 additional authors not shown)

    Abstract: We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture. Our model achieves new levels of performance on a wide range of varied and complex tasks, including multiple image-based captioning and question-answering tasks, image-based document understanding and few-sh…

    Submitted 29 May, 2023; originally announced May 2023.

  9. arXiv:2305.10403  [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  10. arXiv:2302.13408  [pdf, other]

    cs.CV cs.AI cs.LG

    Generative Models for 3D Point Clouds

    Authors: Lingjie Kong, Pankaj Rajak, Siamak Shakeri

    Abstract: Point clouds are rich geometric data structures whose three-dimensional structure offers an excellent domain for understanding representation learning and generative modeling in 3D space. In this work, we aim to improve the performance of point cloud latent-space generative models by experimenting with transformer encoders, latent-space flow models, and autoregressive decoders. We analy…

    Submitted 26 February, 2023; originally announced February 2023.

  11. arXiv:2302.05578  [pdf, ps, other]

    cs.CL cs.AI

    Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models

    Authors: Renat Aksitov, Chung-Ching Chang, David Reitter, Siamak Shakeri, Yunhsuan Sung

    Abstract: Despite recent progress, it has been difficult to prevent semantic hallucinations in generative Large Language Models. One common solution to this is augmenting LLMs with a retrieval system and making sure that the generated output is attributable to the retrieved information. Given this new added constraint, it is plausible to expect that the overall quality of the output will be affected, for ex…

    Submitted 14 February, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

  12. arXiv:2301.11719  [pdf, other]

    cs.CL cs.AI cs.LG

    The Exploration of Knowledge-Preserving Prompts for Document Summarisation

    Authors: Chen Chen, Wei Emma Zhang, Alireza Seyed Shakeri, Makhmoor Fiza

    Abstract: Despite substantial progress in document summarisation techniques, factual inconsistencies between the generated summaries and the original texts still occur from time to time. This study explores the possibility of adopting prompts to incorporate factual knowledge into generated summaries. We specifically study prefix-tuning that uses a set of trainable continuous prefix prompts togethe…

    Submitted 16 May, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  13. arXiv:2210.17155  [pdf, other]

    cs.DC

    Mahiru: a federated, policy-driven data processing and exchange system

    Authors: Lourens E. Veen, Sara Shakeri, Paola Grosso

    Abstract: Secure, privacy-preserving sharing of scientific or business data is currently a popular topic for research and development, both in academia and outside of it. Systems have been proposed for sharing individual facts about individuals and sharing entire data sets, for sharing data through trusted third parties, for obfuscating sensitive data by anonymisation and homomorphic encryption, for distrib…

    Submitted 31 October, 2022; originally announced October 2022.

    Comments: 15 pages, 5 figures

    ACM Class: H.3.5; D.4.6

  14. arXiv:2210.11399  [pdf, other]

    cs.CL cs.AI cs.LG

    Transcending Scaling Laws with 0.1% Extra Compute

    Authors: Yi Tay, Jason Wei, Hyung Won Chung, Vinh Q. Tran, David R. So, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc V. Le, Mostafa Dehghani

    Abstract: Scaling language models improves performance but comes with significant computational costs. This paper proposes UL2R, a method that substantially improves existing language models and their scaling curves with a relatively tiny amount of extra compute. The key idea is to continue training a state-of-the-art large language model (e.g., PaLM) on a few more steps with UL2's mixture-of-denoiser objec…

    Submitted 16 November, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

    Comments: V2 has updated references/related work

  15. arXiv:2210.04726  [pdf, other]

    cs.CL cs.AI cs.LG

    Knowledge Prompts: Injecting World Knowledge into Language Models through Soft Prompts

    Authors: Cicero Nogueira dos Santos, Zhe Dong, Daniel Cer, John Nham, Siamak Shakeri, Jianmo Ni, Yun-hsuan Sung

    Abstract: Soft prompts have been recently proposed as a tool for adapting large frozen language models (LMs) to new tasks. In this work, we repurpose soft prompts to the task of injecting world knowledge into LMs. We introduce a method to train soft prompts via self-supervised learning on data from knowledge bases. The resulting soft knowledge prompts (KPs) are task independent and work as an external memor…

    Submitted 10 October, 2022; originally announced October 2022.

  16. arXiv:2208.05577  [pdf, other]

    cs.CL

    Reducing Retraining by Recycling Parameter-Efficient Prompts

    Authors: Brian Lester, Joshua Yurtsever, Siamak Shakeri, Noah Constant

    Abstract: Parameter-efficient methods are able to use a single frozen pre-trained large language model (LLM) to perform many tasks by learning task-specific soft prompts that modulate model behavior when concatenated to the input text. However, these learned prompts are tightly coupled to a given frozen model -- if the model is updated, corresponding new prompts need to be obtained. In this work, we propose…

    Submitted 10 August, 2022; originally announced August 2022.

  17. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  18. arXiv:2205.12416  [pdf, other]

    cs.CL

    Counterfactual Data Augmentation improves Factuality of Abstractive Summarization

    Authors: Dheeraj Rajagopal, Siamak Shakeri, Cicero Nogueira dos Santos, Eduard Hovy, Chung-Ching Chang

    Abstract: Abstractive summarization systems based on pretrained language models often generate coherent but factually inconsistent sentences. In this paper, we present a counterfactual data augmentation approach where we augment data with perturbed summaries that increase the training data diversity. Specifically, we present three augmentation approaches based on replacing (i) entities from other and the sa…

    Submitted 24 May, 2022; originally announced May 2022.

  19. arXiv:2205.05131  [pdf, other]

    cs.CL

    UL2: Unifying Language Learning Paradigms

    Authors: Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Siamak Shakeri, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Denny Zhou, Neil Houlsby, Donald Metzler

    Abstract: Existing pre-trained models are generally geared towards a particular class of problems. To date, there still seems to be no consensus on what the right architecture and pre-training setup should be. This paper presents a unified framework for pre-training models that are universally effective across datasets and setups. We begin by disentangling architectural archetypes with pre-training objectiv…

    Submitted 28 February, 2023; v1 submitted 10 May, 2022; originally announced May 2022.

    Comments: Updated Q1 2023 with Flan-UL2 20B release! :)

  20. arXiv:2110.08426  [pdf, other]

    cs.CL

    EncT5: A Framework for Fine-tuning T5 as Non-autoregressive Models

    Authors: Frederick Liu, Terry Huang, Shihang Lyu, Siamak Shakeri, Hongkun Yu, Jing Li

    Abstract: Pre-trained encoder-decoder transformer architectures have become increasingly popular recently with the advent of T5 models. T5 has also become more favorable over other architectures like BERT due to the amount of data that it is pre-trained on, increased scale of model parameter sizes and easy applicability to a diverse set of tasks due to the generative nature of the model. While being able to…

    Submitted 23 October, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Update multi-label and structured prediction results

  21. arXiv:2012.06154  [pdf, other]

    cs.CL cs.AI

    ParsiNLU: A Suite of Language Understanding Challenges for Persian

    Authors: Daniel Khashabi, Arman Cohan, Siamak Shakeri, Pedram Hosseini, Pouya Pezeshkpour, Malihe Alikhani, Moin Aminnaseri, Marzieh Bitaab, Faeze Brahman, Sarik Ghazarian, Mozhdeh Gheini, Arman Kabiri, Rabeeh Karimi Mahabadi, Omid Memarrast, Ahmadreza Mosallanezhad, Erfan Noury, Shahab Raji, Mohammad Sadegh Rasooli, Sepideh Sadeghi, Erfan Sadeqi Azer, Niloofar Safi Samghabadi, Mahsa Shafaei, Saber Sheybani, Ali Tazarv, Yadollah Yaghoobzadeh

    Abstract: Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, the majority of this progress remains concentrated on resource-rich languages like English. This work focuses on the Persian language, one of the widely spoken languages in the world, and yet there are few NLU datasets available for this rich language. The availability of high-quality evaluat…

    Submitted 13 July, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

    Comments: To appear in Transactions of the Association for Computational Linguistics (TACL), 2021

  22. arXiv:2010.12688  [pdf, other]

    cs.CL

    Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training

    Authors: Oshin Agarwal, Heming Ge, Siamak Shakeri, Rami Al-Rfou

    Abstract: Prior work on Data-To-Text Generation, the task of converting knowledge graph (KG) triples into natural text, focused on domain-specific benchmark datasets. In this paper, however, we verbalize the entire English Wikidata KG, and discuss the unique challenges associated with a broad, open-domain, large-scale verbalization. We further show that verbalizing a comprehensive, encyclopedic KG like Wiki…

    Submitted 13 March, 2021; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: Accepted at NAACL 2021

  23. arXiv:2010.12008  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards Zero-Shot Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension

    Authors: Siamak Shakeri, Noah Constant, Mihir Sanjay Kale, Linting Xue

    Abstract: We propose a simple method to generate multilingual question and answer pairs on a large scale through the use of a single generative model. These synthetic samples can be used to improve the zero-shot performance of multilingual QA models on target languages. Our proposed multi-task training of the generative model only requires the labeled training samples in English, thus removing the need for…

    Submitted 28 May, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

  24. arXiv:2010.06028  [pdf, other]

    cs.CL

    End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems

    Authors: Siamak Shakeri, Cicero Nogueira dos Santos, Henry Zhu, Patrick Ng, Feng Nan, Zhiguo Wang, Ramesh Nallapati, Bing Xiang

    Abstract: We propose an end-to-end approach for synthetic QA data generation. Our model comprises a single transformer-based encoder-decoder network that is trained end-to-end to generate both answers and questions. In a nutshell, we feed a passage to the encoder and ask the decoder to generate a question and an answer token-by-token. The likelihood produced in the generation process is used as a filtering…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  25. arXiv:2009.10270  [pdf, other]

    cs.IR

    Embedding-based Zero-shot Retrieval through Query Generation

    Authors: Davis Liang, Peng Xu, Siamak Shakeri, Cicero Nogueira dos Santos, Ramesh Nallapati, Zhiheng Huang, Bing Xiang

    Abstract: Passage retrieval addresses the problem of locating relevant passages, usually from a large corpus, given a query. In practice, lexical term-matching algorithms like BM25 are popular choices for retrieval owing to their efficiency. However, term-based matching algorithms often miss relevant passages that have no lexical overlap with the query and cannot be finetuned to downstream datasets. In this…

    Submitted 21 September, 2020; originally announced September 2020.

  26. arXiv:2001.04828  [pdf, other]

    cs.IR cs.LG stat.ML

    TableQnA: Answering List Intent Queries With Web Tables

    Authors: Kaushik Chakrabarti, Zhimin Chen, Siamak Shakeri, Guihong Cao, Surajit Chaudhuri

    Abstract: The web contains a vast corpus of HTML tables. They can be used to provide direct answers to many web queries. We focus on answering two classes of queries with those tables: those seeking lists of entities (e.g., 'cities in california') and those seeking superlative entities (e.g., 'largest city in california'). The main challenge is to achieve high precision with significant coverage. Existing a…

    Submitted 9 January, 2020; originally announced January 2020.

  27. arXiv:2001.03272  [pdf, other]

    cs.IR cs.LG

    Open Domain Question Answering Using Web Tables

    Authors: Kaushik Chakrabarti, Zhimin Chen, Siamak Shakeri, Guihong Cao

    Abstract: Tables extracted from web documents can be used to directly answer many web search queries. Previous works on question answering (QA) using web tables have focused on factoid queries, i.e., those answerable with a short string like person name or a number. However, many queries answerable using tables are non-factoid in nature. In this paper, we develop an open-domain QA approach using web tables…

    Submitted 9 January, 2020; originally announced January 2020.

  28. arXiv:1911.11952  [pdf, ps, other]

    cs.LG cs.CL stat.ML

    Label Dependent Deep Variational Paraphrase Generation

    Authors: Siamak Shakeri, Abhinav Sethy

    Abstract: Generating paraphrases that are lexically similar but semantically different is a challenging task. Paraphrases of this form can be used to augment data sets for various NLP tasks such as machine reading comprehension and question answering with non-trivial negative examples. In this article, we propose a deep variational model to generate paraphrases conditioned on a label that specifies whether…

    Submitted 26 November, 2019; originally announced November 2019.

  29. arXiv:1911.11065  [pdf, ps, other]

    cs.IR cs.CL cs.LG

    Knowledge Distillation in Document Retrieval

    Authors: Siamak Shakeri, Abhinav Sethy, Cheng Cheng

    Abstract: Complex deep learning models now achieve state-of-the-art performance for many document retrieval tasks. The best models process the query or claim jointly with the document. However, for fast, scalable search it is desirable to have document embeddings which are independent of the claim. In this paper we show that knowledge distillation can be used to encourage a model that generates claim independ…

    Submitted 11 November, 2019; originally announced November 2019.

    Comments: Published at Amazon Machine Learning Conference (AMLC) 2019

  30. arXiv:1810.12464  [pdf, other]

    cs.LG stat.ML

    Differentiable Greedy Networks

    Authors: Thomas Powers, Rasool Fakoor, Siamak Shakeri, Abhinav Sethy, Amanjit Kainth, Abdel-rahman Mohamed, Ruhi Sarikaya

    Abstract: Optimal selection of a subset of items from a given set is a hard problem that requires combinatorial optimization. In this paper, we propose a subset selection algorithm that is trainable with gradient-based methods yet achieves near-optimal performance via submodular optimization. We focus on the task of identifying a relevant set of sentences for claim verification in the context of the FEVER t…

    Submitted 29 October, 2018; originally announced October 2018.

    Comments: Work in progress and under review

  31. arXiv:1810.00679  [pdf, other]

    cs.IR cs.CL cs.LG stat.ML

    Direct optimization of F-measure for retrieval-based personal question answering

    Authors: Rasool Fakoor, Amanjit Kainth, Siamak Shakeri, Christopher Winestock, Abdel-rahman Mohamed, Ruhi Sarikaya

    Abstract: Recent advances in spoken language technologies and the introduction of many customer-facing products have given rise to a wide customer reliance on smart personal assistants for many of their daily tasks. In this paper, we present a system to reduce users' cognitive load by extending personal assistants with long-term personal memory where users can store and retrieve by voice, arbitrary pieces…

    Submitted 27 September, 2018; originally announced October 2018.

    Comments: Accepted at SLT 2018