pnraj/Projects

Data Engineering Projects

Project Description

This repository contains a collection of resources and code for aspiring data engineers. It aims to provide a solid foundation and practical guidance for anyone pursuing a career in the field of data engineering.

PROJECTS

API TO RDS USING LAMBDA WITH SLACK ERROR MONITORING

Project Workflow

  • Using AWS Lambda, data is fetched from an API endpoint, processed, and loaded into AWS RDS at 15-second intervals.
  • Two Lambda functions make up the pipeline. The first Lambda is invoked by an AWS Step Functions state machine, which is in turn triggered every minute by a CloudWatch / EventBridge rule until the rule is disabled.
  • The second Lambda function fetches the API response and loads it into AWS RDS.
  • The AWS Step Functions workflow is defined in ASL (Amazon States Language), which has a JSON-based structure.
  • If any error or database connection problem occurs, a notification is sent to a Slack channel using slack_sdk.
  • All connections between AWS services are governed by IAM roles and policies.
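The second Lambda in the pipeline above might look roughly like this minimal sketch. The API URL, table name, and record fields (`id`, `price`, `ts`) are placeholders, not the project's actual schema, and the third-party imports (`slack_sdk`, `pymysql`) are loaded lazily inside the functions that need them:

```python
import json
import os
import urllib.request


def fetch_api(url):
    # Fetch and parse the JSON payload from the source API.
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read())


def to_rows(records):
    # Flatten API records into tuples matching the RDS table columns.
    # The field names here ("id", "price", "ts") are placeholders.
    return [(r["id"], r["price"], r["ts"]) for r in records]


def notify_slack(message):
    # Post a failure message to a Slack channel via slack_sdk.
    from slack_sdk import WebClient  # lazy import: only needed on error

    client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
    client.chat_postMessage(channel="#pipeline-alerts", text=message)


def lambda_handler(event, context):
    # Second Lambda: fetch the API and load into RDS, reporting any
    # failure (including DB connection errors) to Slack.
    try:
        rows = to_rows(fetch_api(os.environ["API_URL"]))
        import pymysql  # lazy import keeps the testable surface small

        conn = pymysql.connect(
            host=os.environ["RDS_HOST"],
            user=os.environ["RDS_USER"],
            password=os.environ["RDS_PASSWORD"],
            database=os.environ["RDS_DB"],
        )
        try:
            with conn.cursor() as cur:
                cur.executemany(
                    "INSERT INTO prices (id, price, ts) VALUES (%s, %s, %s)",
                    rows,
                )
            conn.commit()
        finally:
            conn.close()
        return {"statusCode": 200, "loaded": len(rows)}
    except Exception as exc:
        notify_slack(f"Pipeline failure: {exc!r}")
        raise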

SPARK-ENABLED EXTRACTION AND LOADING INTO AWS RDS

Project Workflow

  • There are two parts to this project.
  • Part 1 fetches data from SEC.gov as a ZIP archive containing more than 8.5 lakh (850,000) JSON files, around 6 GB after uncompressing.
  • Using Apache Spark (PySpark) on Databricks, the JSON files are converted into PySpark DataFrames, with each JSON file representing a single row in the DataFrame. The DataFrame is then written back out as JSON and uploaded to AWS S3.
  • Part 2 fetches the data from AWS S3, applies the needed transformations, and loads it into an AWS RDS MySQL instance.
  • The data from S3 is converted into a PySpark DataFrame, and only the columns needed for RDS are isolated before upload.
  • Important functions used for the transformations are join, posexplode_outer, udf, concat, to_date, struct, and Row.
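The "one file = one row" step in Part 1 can be sketched without Spark. This stand-in uses only the standard library to show the shape of the transformation, whereas the actual project builds a PySpark DataFrame on Databricks:

```python
import json
from pathlib import Path


def files_to_rows(json_dir):
    # Each JSON file under json_dir becomes one row (a dict), mirroring
    # the "one file = one DataFrame row" layout of the PySpark job.
    rows = []
    for path in sorted(Path(json_dir).glob("*.json")):
        record = json.loads(path.read_text())
        record["source_file"] = path.name  # keep provenance as a column
        rows.append(record)
    return rows


def write_json_lines(rows, out_path):
    # Serialise the rows back to newline-delimited JSON, analogous to
    # the JSON output the project uploads to S3 for Part 2 to consume.
    with open(out_path, "w") as fh:
        for row in rows:
            fh.write(json.dumps(row) + "\n")
```

In the real pipeline the equivalent of `rows` would be fed to `spark.createDataFrame`, and nested fields would then be unpacked with `posexplode_outer` and friends in Part 2.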

YouTube Data Harvesting and Warehousing

Project Workflow

  • Ability to input a YouTube channel ID and retrieve all the relevant data using the Google API.
  • Option to store the data in a MongoDB database as a data lake.
  • Ability to collect data for up to 10 different YouTube channels and store them in the data lake based on user requirements.
  • Option to select a channel name and migrate its data from the data lake to a MySQL database as tables.
  • Ability to search and retrieve data from the SQL database using different search options, including joining tables to get channel details.
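The retrieval and shaping steps could look like this sketch. `fetch_channel` assumes the google-api-python-client package and uses the YouTube Data API v3 `channels.list` endpoint; the document field names chosen for the MongoDB data lake are illustrative:

```python
def fetch_channel(channel_id, api_key):
    # Retrieve channel snippet + statistics with the YouTube Data API v3.
    from googleapiclient.discovery import build  # lazy third-party import

    youtube = build("youtube", "v3", developerKey=api_key)
    request = youtube.channels().list(part="snippet,statistics", id=channel_id)
    return request.execute()


def to_document(response):
    # Shape the API response into the document stored in the MongoDB
    # data lake; the output keys here are illustrative placeholders.
    item = response["items"][0]
    return {
        "channel_id": item["id"],
        "channel_name": item["snippet"]["title"],
        "subscribers": int(item["statistics"]["subscriberCount"]),
        "video_count": int(item["statistics"]["videoCount"]),
    }
```

Keeping `to_document` separate from the API call makes the MongoDB-to-MySQL migration step easy to test, since it only ever sees plain dicts.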

PhonePe Pulse Data Analysis 2018-2022

Project Workflow

  • The PhonePe payment app data is fetched in JSON format from a GitHub repo.
  • The JSON files are separated by quarter (every 3 months) for the years 2018-2022, for every state and district in India.
  • Using the Python os module, a pipeline iterates through each folder, reads the data from the JSON files, and converts it into a pandas DataFrame.
  • The JSON files contain details about transaction amounts and the locations where users made those transactions.
  • From the DataFrame, visualizations are built using Plotly and Streamlit, including geo, bar, line, pie, and area charts.
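The folder-walking pipeline can be sketched as follows. The folder layout (`<state>/<year>/<quarter>.json`) and the key path (`data` → `transactionData` → `paymentInstruments`) are assumptions based on the PhonePe Pulse aggregated-transaction files:

```python
import json
import os


def collect_rows(data_root):
    # Walk the <state>/<year>/<quarter>.json folder tree and build one
    # flat row per transaction category per file. The nested key path
    # follows the PhonePe Pulse aggregated-transaction layout.
    rows = []
    for dirpath, _dirnames, filenames in os.walk(data_root):
        for name in sorted(filenames):
            if not name.endswith(".json"):
                continue
            state = os.path.basename(os.path.dirname(dirpath))
            year = int(os.path.basename(dirpath))
            quarter = int(name.removesuffix(".json"))
            with open(os.path.join(dirpath, name)) as fh:
                payload = json.load(fh)
            for entry in payload["data"]["transactionData"]:
                instrument = entry["paymentInstruments"][0]
                rows.append(
                    {
                        "state": state,
                        "year": year,
                        "quarter": quarter,
                        "type": entry["name"],
                        "count": instrument["count"],
                        "amount": instrument["amount"],
                    }
                )
    return rows
```

The flat `rows` list drops straight into `pd.DataFrame(rows)`, which is what the Plotly/Streamlit charts consume.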

Twitter Scraping

Project Workflow

  • Based on user needs, tweets are extracted and uploaded into MongoDB through a Streamlit-based UI.
  • Users enter the tweet topic or hashtag, a start date, an end date, and the total number of tweets to extract in the app.
  • The app fetches the data using snscrape, converts it into a pandas DataFrame, and displays it in tabular format.
  • After checking the data, users have the option to download it as JSON or CSV, or to upload it into MongoDB.
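The scraping flow can be sketched like this. `build_query` composes the search-operator syntax snscrape passes through to Twitter search, and the tweet attribute names are an assumption (`rawContent` in recent snscrape versions; older versions used `content`):

```python
def build_query(topic, since, until):
    # Compose a search query in the operator syntax snscrape accepts,
    # e.g. "datascience since:2023-01-01 until:2023-02-01".
    return f"{topic} since:{since} until:{until}"


def scrape_tweets(query, limit):
    # Yield up to `limit` tweets as plain dicts, ready for a pandas
    # DataFrame (display) or a MongoDB insert_many (upload).
    import snscrape.modules.twitter as sntwitter  # lazy third-party import

    scraper = sntwitter.TwitterSearchScraper(query)
    for i, tweet in enumerate(scraper.get_items()):
        if i >= limit:
            break
        yield {
            "id": tweet.id,
            "date": tweet.date.isoformat(),
            "user": tweet.user.username,
            "content": tweet.rawContent,  # `tweet.content` in older versions
        }
```

Yielding plain dicts keeps the downstream choices open: `pd.DataFrame(scrape_tweets(q, n))` for the table view, or `collection.insert_many(list(...))` for the MongoDB upload.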

License

This project is licensed under the MIT License. Please review the license file for more details.

Contact

If you have any questions or suggestions regarding this project, feel free to reach out to me at pnrajk@gmail.com.