aws-emr
Here are 128 public repositories matching this topic...
A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver
Updated Jun 12, 2024 - Python
The project will use Airflow to orchestrate and manage the data pipeline, creating and terminating a transient EMR cluster to save on cost. Apache Spark will transform the data, and the final dataset will be loaded into Snowflake.
Updated Jun 11, 2024 - Python
An AWS-based solution that uses AWS CloudWatch and a Python-based AWS Lambda function to automatically terminate AWS EMR clusters that have been idle for a specified period of time.
Updated Jun 5, 2024 - Python
Cloud-based AI / ML workflow and data application development framework
Updated Jun 1, 2024 - Python
Analyzing Spark Cluster Performance in Amazon EMR
Updated May 22, 2024 - Python
Updated May 11, 2024 - Jupyter Notebook
Terraform module to create AWS EMR resources 🇺🇦
Updated May 4, 2024 - HCL
Analysis and monitoring system using AWS... Also the comp4442 project
Updated Apr 26, 2024 - Python
This project aims to perform feature extraction, followed by PCA, on large-scale data using Spark in the cloud.
Updated Mar 14, 2024 - Jupyter Notebook
Big data analysis with AWS services, filtering the Wikiticker dataset with Apache Spark on Amazon EMR, storing data in S3, cataloging with AWS Glue, and querying with Amazon Athena. This end-to-end pipeline exemplifies handling and analyzing big data in the cloud.
Updated Mar 7, 2024 - Python
Utilize Apache Spark for ETL processes to prepare data, followed by the construction of a Machine Learning model for Natural Language Processing (NLP) classification. Subsequently, deploy the model within a Gradio web application for seamless interaction.
Updated Feb 2, 2024 - Jupyter Notebook
Technology blog by Siby Abin covering data engineering, AWS, Spark, Python, Airflow, and more.
Updated Jan 13, 2024 - SCSS
A big data project using Hadoop, HBase, and Sqoop to ingest, process, and analyze a large dataset of taxi ride data on an AWS EMR cluster. MapReduce jobs perform a variety of tasks, and the results of each job are exported to an RDS instance for visualization and analysis.
Updated Oct 31, 2023 - Python
Distributed computational problem-solving project, which aims to perform large-scale graph matching using cloud computing technologies. The project allows users to import two directed graphs and analyze the differences between them.
Updated Oct 25, 2023 - Scala
My AWS Playground
Updated Jun 18, 2024 - Python