arv-anshul/yt-watch-history-v2

YouTube Watch History Analyser - V2

Analyse your YouTube Watch History using Machine Learning, plot graphs, and more.

Working

API (Backend)

  • Used FastAPI to create backend APIs that interact with a MongoDB database.
  • Used YouTube Data API v3 to fetch details about the videos you have watched.
  • Used Docker to containerize the FastAPI application.
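
As a rough illustration of the YouTube Data API v3 step above, here is a minimal sketch that only builds the request URLs for the `videos.list` endpoint and leaves the HTTP call to the caller. The function name `build_video_requests`, the chosen `part` values and the batch size of 50 IDs per request are assumptions for this example and are not taken from this repository.

```python
from urllib.parse import urlencode

# Endpoint of the YouTube Data API v3 `videos.list` method.
VIDEOS_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"
# Assumed cap on video IDs per request (hypothetical constant for this sketch).
MAX_IDS_PER_REQUEST = 50


def build_video_requests(video_ids, api_key, parts=("snippet", "contentDetails")):
    """Return one request URL per batch of unique video IDs."""
    unique_ids = list(dict.fromkeys(video_ids))  # de-duplicate, keep order
    urls = []
    for start in range(0, len(unique_ids), MAX_IDS_PER_REQUEST):
        batch = unique_ids[start : start + MAX_IDS_PER_REQUEST]
        query = urlencode(
            {"part": ",".join(parts), "id": ",".join(batch), "key": api_key}
        )
        urls.append(f"{VIDEOS_ENDPOINT}?{query}")
    return urls
```

Batching matters because a watch history usually contains many repeated and many distinct videos; de-duplicating first keeps the number of API calls (and the quota used) down.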

ML (Models and API)

Models

  1. Video's Content Type Predictor
    • Multiclass Classification Problem
    • Uses Video's title and tags to predict Content Type
    • Planning to add the video's categoryId and duration as features, once it is clear that they improve predictions
  2. Channel Recommender System
    • Recommender System
    • Uses the titles and tags of each channel's videos to calculate similarity
    • Uses TfidfVectorizer for text-to-vector conversion
    • Uses the user's channel subscription data to recommend channels
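
The recommender idea above can be sketched roughly as follows. The project uses scikit-learn's TfidfVectorizer; to stay dependency-free, this sketch hand-rolls a simplified TF-IDF (whitespace tokenisation, smoothed IDF) and ranks unsubscribed channels by cosine similarity to the subscribed ones. The function names and the sample channels are hypothetical, and the real pipeline may differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each channel name to a sparse TF-IDF vector (term -> weight)."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for words in tokens.values() for term in set(words))
    vectors = {}
    for name, words in tokens.items():
        counts = Counter(words)
        vectors[name] = {
            # Term count times a smoothed inverse document frequency.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(channel_texts, subscribed, top_n=3):
    """Rank unsubscribed channels by their best similarity to a subscribed one."""
    vectors = tfidf_vectors(channel_texts)
    known = [vectors[name] for name in subscribed if name in vectors]
    scores = {
        name: max((cosine(vec, other) for other in known), default=0.0)
        for name, vec in vectors.items()
        if name not in subscribed
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical data: each channel is the concatenated titles and tags of its videos.
channels = {
    "PyData Talks": "python pandas tutorial data analysis python",
    "ML Explained": "machine learning python tutorial data",
    "Home Kitchen": "pasta recipe cooking kitchen",
    "Bake It": "baking dessert recipe kitchen",
}
print(recommend(channels, {"PyData Talks"}, top_n=1))  # ['ML Explained']
```

Taking the maximum similarity over the subscribed channels is one possible design choice; averaging the subscribed vectors into a single user profile would be another.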

Important

I'm planning to upload the trained model to the internet. The Docker container then downloads the model from a URL once, and only if it does not already exist in the container.

The model URL is provided through an environment variable (CTT_MODEL_URL). If you want, you can provide the URL of your own model.

This solution may work in the short term 🤞
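
A minimal sketch of the "download once" behaviour described above, using only the standard library. Only the CTT_MODEL_URL variable name comes from this README; the helper name `ensure_model` and the model path in the usage note are hypothetical, and the project's actual loader may work differently.

```python
import os
import shutil
import urllib.request
from pathlib import Path


def ensure_model(model_path, env_var="CTT_MODEL_URL"):
    """Download the model to `model_path` once and reuse it on later calls."""
    model_path = Path(model_path)
    if model_path.exists():
        return model_path  # already present, skip the download

    url = os.environ.get(env_var)
    if not url:
        raise RuntimeError(f"{env_var} is not set and {model_path} does not exist")

    model_path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = model_path.with_name(model_path.name + ".part")
    # Write to a temporary name first so that a failed transfer never leaves
    # a half-written file that looks like a valid model.
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp_path.replace(model_path)
    return model_path
```

Calling something like `ensure_model("models/ctt_model.pkl")` when the ML API starts would fetch the file on the first run and return immediately on later runs, as long as the path sits on storage that survives container restarts.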

API

  • Used FastAPI to serve model.
  • Containerized the FastAPI application and models using Docker.

Frontend

  • Uses Streamlit to create a multipage web application where users can upload the required data and see the analysis.
  • Requires a YouTube API key to fetch video details from the API for advanced analysis.
  • Uses the httpx library to make requests to the "Backend APIs" and "ML APIs".
  • Uses Polars for data manipulation.

Apps Composition

Setup

Clone this GitHub Repository

git clone https://github.com/arv-anshul/yt-watch-history-v2
cd yt-watch-history-v2

Open Docker Desktop and run the command below:

👀 See docker-compose.yaml

docker compose up --build  # Builds the images, then starts the containers (needed for the first run)

Roadmap

  • 🪠 Create an ETL pipeline to train models
  • 📌 Integrate MLflow [1] for ML model monitoring
  • 🛠️ Build the basics from the yt-watch-history project
  • 🎨 Draw diagrams for reference
  • ⛓️ How to integrate a pre-trained ML model
  • 🤖 Build the Channel Recommender System
  • 👷 Better CTT Model pipeline

Footnotes

  1. CampusX is going to launch a free course on MLflow. Nitish Sir announced this in a recent video.