Viresa: an AI-powered virtual assistant for scientists

status: under development

Web app: https://share.streamlit.io/m-tari/arxiv_interface

(Note: the article classification part is not yet implemented in the web app.)

Overview

A common task for scientists is extracting knowledge from scientific articles: Which area of research does an article belong to? What is the best one-line summary of its content? Which existing articles are relevant to it? We built a tool to answer these questions!

Background and Motivation

arXiv is a collaboratively funded, community-supported resource founded by Paul Ginsparg in 1991 and maintained and operated by Cornell University [1]. It promotes open scientific collaboration and progress by providing tools and content for scientists all over the world. By combining this rich dataset of scholarly articles with powerful machine learning techniques, we hope to discover insights about scientific work and to build simple tools for exploring the dataset: trend analysis, paper recommender engines, category prediction, and more.

Goals

To build a multi-label classification model that automatically tags article summaries, a model that generates titles for those summaries, and a recommender that suggests similar articles to the user.
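As an illustration of the "similar articles" goal, here is a minimal sketch of a content-based recommender that ranks abstracts by TF-IDF cosine similarity. This is one common approach and not necessarily the method used in this repository; the function names (`tokenize`, `tfidf_vectors`, `cosine`, `recommend`) and the sample abstracts are hypothetical and exist only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so terms found in every document keep a positive weight.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(query_index, docs, top_k=2):
    """Return indices of the top_k documents most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[query_index], v), i)
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    # Highest similarity first; ties are broken by the lower index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:top_k]]


# Made-up abstracts, for illustration only.
abstracts = [
    "We study quantum entanglement in spin chains.",
    "Entanglement entropy of quantum spin chains at criticality.",
    "A convolutional neural network for image classification.",
    "Deep neural network models for image recognition.",
]
print(recommend(0, abstracts, top_k=1))  # → [1]
```

On 1.7M+ abstracts a pairwise comparison like this would be too slow; a sparse matrix representation or an approximate nearest-neighbor index would be the usual next step.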

Datasets

The dataset used in this project is the metadata file of the arXiv dataset provided for the Kaggle classification and title generation challenges. It contains 1.7M+ scholarly papers across STEM, with relevant features such as article titles, authors, categories, abstracts, and more.
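The following sketch shows one way the metadata could be prepared for multi-label tagging. It assumes the file is in JSON Lines format (one JSON object per line) with `title`, `abstract`, and a space-separated `categories` field; that layout is an assumption about the Kaggle file, not something stated in this README. The helpers `parse_records` and `binarize` are hypothetical names, and the sample records are made up.

```python
import json


def parse_records(lines):
    """Yield a dict with title, abstract, and a list of categories per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield {
            # Collapse line breaks and repeated spaces inside text fields.
            "title": " ".join(record["title"].split()),
            "abstract": " ".join(record["abstract"].split()),
            # Assumption: categories look like "quant-ph cond-mat.stat-mech".
            "categories": record["categories"].split(),
        }


def binarize(label_lists):
    """Turn lists of labels into 0/1 indicator rows, one column per class."""
    classes = sorted({c for labels in label_lists for c in labels})
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for labels in label_lists:
        row = [0] * len(classes)
        for c in labels:
            row[index[c]] = 1
        rows.append(row)
    return classes, rows


# Made-up records standing in for lines of the metadata file.
sample_lines = [
    json.dumps({
        "title": "Spin chains\n  revisited",
        "abstract": "We study spin chains.",
        "categories": "quant-ph cond-mat.stat-mech",
    }),
    json.dumps({
        "title": "Image models",
        "abstract": "We classify images.",
        "categories": "cs.CV",
    }),
]

records = list(parse_records(sample_lines))
classes, targets = binarize([r["categories"] for r in records])
print(classes)  # → ['cond-mat.stat-mech', 'cs.CV', 'quant-ph']
print(targets)  # → [[1, 0, 1], [0, 1, 0]]
```

Because `parse_records` accepts any iterable of lines, an open file handle can be passed in place of `sample_lines`, which lets the file be streamed instead of loaded into memory at once.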

Milestones

~~Build a classical machine learning classifier for automatic tagging of articles~~
~~Use transformers for title generation~~
~~Build a recommender system for similar articles~~
Apply RNNs for title generation and compare the performance with transformers
Try different word embeddings (Word2Vec, GloVe, etc.)

References

[1] https://arxiv.org/
