
airflow-dags

Here are 261 public repositories matching this topic...

In this project, we developed an ETL pipeline using Apache Airflow to process delivery data and track delayed shipments. The pipeline downloads data from an AWS S3 bucket, cleans it using Spark/Spark SQL to identify missing delivery deadlines, and uploads the cleaned dataset back to S3. This ensures efficient delivery performance tracking.

  • Updated Jul 17, 2024
  • Jupyter Notebook
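The cleaning step described above (flagging shipments that missed their delivery deadline) can be sketched in plain Python. This is a minimal stand-in for the Spark/Spark SQL logic; the record layout and field names (`id`, `deadline`, `delivered`) are assumptions, not the repository's actual schema.

```python
from datetime import datetime

# Hypothetical shipment records; the real pipeline reads these from S3
# into Spark DataFrames rather than Python dicts.
shipments = [
    {"id": 1, "deadline": "2024-07-01", "delivered": "2024-06-30"},
    {"id": 2, "deadline": "2024-07-01", "delivered": "2024-07-03"},
    {"id": 3, "deadline": "2024-07-05", "delivered": None},  # still in transit
]

def is_delayed(row, as_of="2024-07-10"):
    """A shipment counts as delayed if it was delivered after its
    deadline, or is still undelivered once the deadline has passed."""
    deadline = datetime.fromisoformat(row["deadline"])
    if row["delivered"] is None:
        return datetime.fromisoformat(as_of) > deadline
    return datetime.fromisoformat(row["delivered"]) > deadline

delayed_ids = [r["id"] for r in shipments if is_delayed(r)]
print(delayed_ids)  # [2, 3]
```

In the actual DAG, this predicate would be expressed as a Spark SQL filter over the downloaded dataset, and the surviving rows written back to the output S3 bucket.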

The goal of this project is to support beginners who want to break into the data field by providing a data-driven view of the skills and knowledge most in demand in the market. By collecting and analyzing job and internship postings, the project aims to answer the question: "How do you become a data professional?"

  • Updated Jul 10, 2024
  • Python

A Python script extracts data from Zillow and stores it in an initial S3 bucket. Lambda functions then handle the flow: copying the data to a processing bucket and transforming it from JSON to CSV format. The final CSV data lands in another S3 bucket, ready to be loaded into Amazon Redshift for in-depth analysis, with QuickSight used for visualizations.

  • Updated Jun 10, 2024
  • Python
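The JSON-to-CSV transform that the Lambda function performs can be sketched with the standard library alone. The listing fields below (`address`, `price`, `beds`) are illustrative guesses, not the repository's actual Zillow schema, and the real function would read from and write to S3 via boto3 rather than strings.

```python
import csv
import io
import json

# Hypothetical Zillow-style listings as they might arrive in the
# processing bucket; real field names will differ.
raw_json = json.dumps([
    {"address": "123 Main St", "price": 450000, "beds": 3},
    {"address": "456 Oak Ave", "price": 525000, "beds": 4},
])

def json_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text,
    mirroring the Lambda transform step described above."""
    rows = json.loads(json_text)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

csv_text = json_to_csv(raw_json)
print(csv_text.splitlines()[0])  # address,price,beds
```

A CSV laid out this way loads directly into Redshift with a `COPY ... FORMAT CSV` statement, which is presumably why the pipeline converts before the load step.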
