Airflow Data Processing Pipeline for TUL Catalog on Blacklight Data
Updated Jul 19, 2024 - Python
In this project, we developed an ETL pipeline using Apache Airflow to process delivery data and track delayed shipments. The pipeline downloads data from an AWS S3 bucket, cleans it using Spark/Spark SQL to identify missing delivery deadlines, and uploads the cleaned dataset back to S3. This ensures efficient delivery performance tracking.
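The repository code is not reproduced here; as a minimal sketch of the cleaning step (the project runs this as Spark/Spark SQL inside an Airflow task — the function name and record layout below are assumptions for illustration):

```python
from datetime import datetime

def flag_missed_deadlines(records):
    """Return only the shipments delivered after their deadline.

    Each record is a dict with ISO-8601 'deadline' and 'delivered_at'
    strings; in the real pipeline an equivalent filter runs as Spark SQL
    over the dataset downloaded from S3 before re-upload.
    """
    late = []
    for rec in records:
        deadline = datetime.fromisoformat(rec["deadline"])
        delivered = datetime.fromisoformat(rec["delivered_at"])
        if delivered > deadline:
            late.append(rec)
    return late

shipments = [
    {"id": 1, "deadline": "2024-07-01T12:00:00", "delivered_at": "2024-07-01T10:30:00"},
    {"id": 2, "deadline": "2024-07-01T12:00:00", "delivered_at": "2024-07-02T09:00:00"},
]
print([r["id"] for r in flag_missed_deadlines(shipments)])  # [2]
```

The filtered output is what the pipeline would write back to S3 for delivery-performance tracking.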
A DAG generator for Apache Airflow that produces press clippings from the Diário Oficial da União (Brazil's Federal Official Gazette).
Real-time Data Warehousing with Airflow: an event-based microservices pipeline.
This repository includes data engineering projects using Apache Airflow. I hope to add more projects using different technologies soon!
The goal of this project is to help newcomers who want to enter the data field by providing a data-driven view of the skills and knowledge most in demand in the market. By collecting and analyzing job and internship postings, the project aims to answer the question: "How do you become a data professional?"
Collaborative and hybrid recommendation systems
This project showcases an ELT pipeline that extracts JSON data, loads it into a PostgreSQL database, applies transformations using Python scripts, saves the transformed data in a CSV file, and shares it through a FastAPI endpoint.
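As an illustrative sketch of the transform-and-export stage of such a pipeline (the field names and the CSV layout below are assumptions, not the repo's actual code):

```python
import csv
import io
import json

def json_to_csv(json_text, fieldnames):
    """Parse a JSON array of objects and render it as CSV text.

    In the described ELT pipeline, rows like these would already be
    loaded into PostgreSQL; the transformed result is written to a CSV
    file and served through a FastAPI endpoint.
    """
    rows = json.loads(json_text)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({k: row.get(k, "") for k in fieldnames})
    return buf.getvalue()

raw = '[{"name": "Ada", "score": 9}, {"name": "Alan", "score": 8}]'
print(json_to_csv(raw, ["name", "score"]))
```

Keeping the transform as a pure function of text in, text out makes it easy to unit-test independently of the database and the API layer.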
Multi-Modal Representational Learning for Social Media Popularity Prediction
A simple DAG for NewsAPI and Apache Airflow: https://airflow.apache.org/
Airflow DAGs for the Manifold (TUL Website) application
Airflow DAGs for PA Digital aggregation processes
Orchestrate your Databricks notebooks in Airflow and execute them as Databricks Workflows
Status BI python DAGs for Airflow
EEA Crawler contains the tasks (DAGs) used by Apache Airflow to index content from various EEA-Eionet websites into a central Elasticsearch index (the "content hub").
Automated MLflow experiment logging and prediction for Iris flower classification.
A dbt data pipeline capstone project.
Using yfinance, we grab minute-by-minute BTC-USD data, dump it into PostgreSQL, and link Excel via ODBC for quick analysis!
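A minimal sketch of the load step, assuming a hypothetical `btc_minute` table and yfinance-style OHLCV fields (the real project would execute this with a PostgreSQL driver such as psycopg2):

```python
def minute_bar_insert(table, bar):
    """Build a parameterized INSERT for one OHLCV minute bar.

    Returning the SQL and parameters separately keeps the statement
    safe to run with any DB-API driver (e.g. cursor.execute(sql, params)).
    """
    columns = ["ts", "open", "high", "low", "close", "volume"]
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    params = tuple(bar[c] for c in columns)
    return sql, params

sql, params = minute_bar_insert(
    "btc_minute",
    {"ts": "2024-07-19 14:30:00", "open": 66500.0, "high": 66520.5,
     "low": 66480.0, "close": 66510.2, "volume": 12.7},
)
print(sql)
```

Once the bars are in PostgreSQL, Excel can query the same table over the ODBC link for quick analysis.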
A Python script extracts data from Zillow and stores it in an initial S3 bucket. Lambda functions then handle the flow: copying the data to a processing bucket and transforming it from JSON to CSV. The final CSV lands in another S3 bucket, ready to be loaded into Amazon Redshift for in-depth analysis, with QuickSight providing visualizations.
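A small sketch of how each Lambda stage in a flow like this locates the new file: the event shape is AWS's standard S3 notification payload, while the bucket and key names in the example are made up.

```python
def extract_s3_object(event):
    """Return (bucket, key) from an S3 put-notification event.

    Both the copy and the JSON-to-CSV Lambdas would start this way: the
    trigger event says which object just landed, and boto3 calls (not
    shown) then read it and write the result to the next bucket.
    """
    record = event["Records"][0]
    return record["s3"]["bucket"]["name"], record["s3"]["object"]["key"]

sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "zillow-raw-data"},
                "object": {"key": "listings/2024-07-19.json"}}}
    ]
}
print(extract_s3_object(sample_event))  # ('zillow-raw-data', 'listings/2024-07-19.json')
```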