🔥 Kafka + Spark Structured Streaming
Ingest API data via Airflow, stream through Kafka, process with Spark, write to Cassandra

☁️ AWS Glue + EMR + S3 Pipeline
Cloud data engineering on AWS: Glue ETL, EMR Spark jobs, S3 data lake, Athena querying

⚡ Streaming Data Processing
Kafka → PySpark → Elasticsearch & MinIO. Full streaming pipeline with Airflow orchestration

🧩 Airflow + Kafka + Cassandra + MongoDB
Produce Kafka messages, then consume them and write to both Cassandra and MongoDB via Airflow DAGs

🐳 CSV Extract with Airflow & Docker
Extract CSV to Postgres, transform with Pandas, orchestrate with Airflow, fully Dockerized

📈 Crypto Data Pipeline
Stream crypto data from an API through Kafka with Airflow, store in MySQL, visualize with Metabase

Python
SQL

Spark
Kafka
Flink
Hadoop
Hive

Airflow
AWS MWAA
GCP Cloud Composer

Snowflake
Teradata
GCP BigQuery
AWS RDS & Aurora
AWS Redshift
GCP Cloud SQL
PostgreSQL
MySQL

dbt
RudderStack
Fivetran
Airbyte
Hevo

AWS
GCP
Terraform
Docker
Kubernetes

Looker
Tableau
Metabase
AWS QuickSight
Holistics

Hugging Face
Streamlit
MLflow
FastAPI
LangChain
Langfuse
Qdrant

AWS Step Functions
AWS Glue
AWS Lambda
AWS Kinesis
AWS EC2
AWS S3
AWS Athena

GCP Datastream
GCP Cloud Build
GCP Cloud Storage
GCP Cloud Source Repositories
GCP Cloud Scheduler

Git
Jenkins
CircleCI


