
A collection of NLP projects I have worked on to deepen my experience in this research field.


naetherm/NLP


NLP

This repository is a collection of machine learning and deep learning approaches to different tasks in the field of NLP (Natural Language Processing).

Note

Some of the code, e.g. the BERT models for entity tagging and POS tagging, uses TensorFlow <= 1.15. I am currently upgrading it to 2.x.

Content

All models were trained and evaluated on the Tatoeba dataset.

There are the following implementations:

  • Baseline implementation using the Python langdetect module (00_langdetect.py)
  • Character N-Gram implementation (01_nsec_langdetect.py)
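
To illustrate the idea behind a character n-gram language detector, here is a minimal sketch using only the standard library. It is not the code in 01_nsec_langdetect.py; the function names (`char_ngrams`, `train_profiles`, `detect`), the trigram size, and the toy sentences are all assumptions made for this example. Each language gets a count profile of its character trigrams, and a new text is assigned to the language whose profile gives its trigrams the highest relative frequency.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text, padded with spaces."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_profiles(samples, n=3):
    """Build one n-gram count profile per language from (language, sentence) pairs."""
    profiles = {}
    for lang, sentence in samples:
        profiles.setdefault(lang, Counter()).update(char_ngrams(sentence, n))
    return profiles


def detect(text, profiles, n=3):
    """Pick the language whose profile assigns the most relative mass to text's n-grams."""
    grams = Counter(char_ngrams(text, n))

    def score(lang):
        profile = profiles[lang]
        total = sum(profile.values())
        return sum(count * profile[gram] / total for gram, count in grams.items())

    return max(profiles, key=score)


# Toy training data for illustration only (not taken from Tatoeba).
samples = [
    ("en", "the cat is on the table"),
    ("en", "where is the train station"),
    ("de", "die katze ist auf dem tisch"),
    ("de", "wo ist der bahnhof"),
]
profiles = train_profiles(samples)
english_guess = detect("the dog is in the house", profiles)
german_guess = detect("der hund ist im haus", profiles)
print(english_guess, german_guess)
```

With a real corpus, the `samples` list would be replaced by (language, sentence) pairs read from the dataset, and a smoothed or log-probability score would usually be more robust than this raw overlap.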

All models were trained and evaluated on the CoNLL POS dataset.

There are the following implementations:

  • Basic BERT language model approach (10_bert.py)

All models were trained and evaluated on the CoNLL POS dataset.

There are the following implementations:

  • Basic BERT language model approach (10_bert.py)

All models were trained and evaluated on the IMDB dataset.

There are the following implementations:

  • LSTM model (01_lstm.py)
  • Bidirectional LSTM model (02_bilstm.py)

All models were trained on the first 30,000 lines of the English OSCAR corpus.

There are the following implementations:

  • LSTM based model (01_lstm.py)
  • Bidirectional LSTM based model (02_bilstm.py)
  • CNN based model (03_cnn.py)
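
Restricting training to the first lines of a large corpus file can be done without loading the whole file. The following is a small sketch of one way to do it; the helper name `read_first_lines` and the throwaway demo file are assumptions for illustration, not code from this repository.

```python
import os
import tempfile
from itertools import islice


def read_first_lines(path, limit=30_000, encoding="utf-8"):
    """Return at most `limit` lines from a text file, without trailing newlines."""
    with open(path, encoding=encoding) as handle:
        # islice stops after `limit` lines, so the rest of the file is never read.
        return [line.rstrip("\n") for line in islice(handle, limit)]


# Demo on a throwaway five-line file; a real run would point at the corpus file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("\n".join(f"line {i}" for i in range(5)) + "\n")

demo_lines = read_first_lines(tmp.name, limit=3)
os.remove(tmp.name)
print(demo_lines)
```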

All models in the chatbot section were trained on the Cornell Movie Dialog Corpus. The required files from the corpus are already included in the repository.

There are the following implementations:

  • Basic RNN model (01_seq2seq_rnn.py)
  • LSTM model (02_seq2seq_lstm.py)
  • GRU model (03_seq2seq_gru.py)
  • Bidirectional Basic RNN model (04_seq2seq_birnn.py)
  • Bidirectional LSTM model (05_seq2seq_bilstm.py)
  • Bidirectional GRU model (06_seq2seq_bigru.py)
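
Before a seq2seq model can be trained on dialog data, the corpus has to be turned into (input, response) pairs. The sketch below shows one possible way to do that, assuming the usual layout of the Cornell corpus files, where fields are separated by ` +++$+++ ` and each conversation row ends with a list of line ids. The function name `build_pairs` and the sample rows are made up for this example and do not come from the repository or the corpus.

```python
import ast

# Field separator assumed from the Cornell Movie Dialog Corpus file layout.
SEP = " +++$+++ "


def build_pairs(line_rows, conversation_rows):
    """Turn raw line and conversation rows into (input, response) text pairs."""
    id_to_text = {}
    for row in line_rows:
        fields = row.split(SEP)
        if len(fields) == 5:  # line id, user id, movie id, character, text
            id_to_text[fields[0]] = fields[4].strip()

    pairs = []
    for row in conversation_rows:
        fields = row.split(SEP)
        if len(fields) != 4:  # user id, user id, movie id, list of line ids
            continue
        line_ids = ast.literal_eval(fields[3])
        # Each utterance is paired with the one that follows it.
        for first, second in zip(line_ids, line_ids[1:]):
            if first in id_to_text and second in id_to_text:
                pairs.append((id_to_text[first], id_to_text[second]))
    return pairs


# Invented rows that only mimic the assumed format.
line_rows = [
    "L1 +++$+++ u0 +++$+++ m0 +++$+++ ANNA +++$+++ Hello there.",
    "L2 +++$+++ u1 +++$+++ m0 +++$+++ BEN +++$+++ Hi, how are you?",
    "L3 +++$+++ u0 +++$+++ m0 +++$+++ ANNA +++$+++ Fine, thanks.",
]
conversation_rows = ["u0 +++$+++ u1 +++$+++ m0 +++$+++ ['L1', 'L2', 'L3']"]

pairs = build_pairs(line_rows, conversation_rows)
print(pairs)
```

The resulting pairs would still need tokenization and padding before they can be fed to any of the encoder-decoder models listed above.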