dogukannulu/README.md

Hi 👋, I'm Dogukan Ulu

Senior Data & AI Engineer · Building production-grade pipelines & LLM systems

Medium · X / Twitter · GitHub · GitHub Sponsors


📌 Featured Projects

🔥 Kafka + Spark Structured Streaming
Ingest API data via Airflow, stream through Kafka, process with Spark, write to Cassandra

☁️ AWS Glue + EMR + S3 Pipeline
Cloud data engineering on AWS: Glue ETL, EMR Spark jobs, S3 data lake, Athena querying

⚡ Streaming Data Processing
Kafka → PySpark → Elasticsearch & MinIO. Full streaming pipeline with Airflow orchestration

🧩 Airflow + Kafka + Cassandra + MongoDB
Produce Kafka messages, consume and write to both Cassandra and MongoDB via Airflow DAGs

🐳 CSV Extract with Airflow & Docker
Extract CSV to Postgres, transform with Pandas, orchestrate with Airflow — fully Dockerized

📈 Crypto Data Pipeline
Stream crypto data from API via Kafka with Airflow, store in MySQL, visualize with Metabase


→ View all repositories


📝 Top Articles on Medium

End-to-End Data Engineering
Spark · Kafka · Airflow · Docker · Cassandra · Python
☁️ AWS Cloud Data Engineering
Kinesis · S3 · Glue · Redshift · Spark · Athena
Data Streaming
Airflow · Kafka · MongoDB · Docker · SQL

→ View all articles on Medium


🛠 Tech Stack

💻 Languages


Python · SQL

⚙️ Data Engineering & Processing


Spark · Kafka · Flink · Hadoop · Hive

🧩 Orchestration & Workflow


Airflow · AWS MWAA · GCP Cloud Composer

🗄 Data Warehousing & Databases


Snowflake · Teradata · GCP BigQuery · AWS RDS & Aurora · AWS Redshift · GCP Cloud SQL · PostgreSQL · MySQL

🔄 ELT & Data Integration


dbt · RudderStack · Fivetran · Airbyte · Hevo

☁️ Cloud & Infrastructure


AWS · GCP · Terraform · Docker · Kubernetes

📊 BI & Analytics


Looker · Tableau · Metabase · AWS QuickSight · Holistics

🤖 ML & AI Engineering


Hugging Face · Streamlit · MLflow · FastAPI · LangChain · Langfuse · Qdrant

☁️ Other Cloud Services


AWS Step Functions · AWS Glue · AWS Lambda · AWS Kinesis · AWS EC2 · AWS S3 · AWS Athena · GCP Datastream · GCP Cloud Build · GCP Cloud Storage · GCP Source Repository · GCP Cloud Scheduler

🧑‍💻 Dev & Collaboration


Git · Jenkins · CircleCI

Popular repositories

  1. kafka_spark_structured_streaming Public

    Get data from an API, run a scheduled script with Airflow, send the data to Kafka, consume it with Spark, then write it to Cassandra.

    Python · 146 stars · 62 forks

  2. streaming_data_processing Public

    Create streaming data, send it to Kafka, transform it with PySpark, and write it to Elasticsearch and MinIO.

    Python · 65 stars · 26 forks

  3. airflow_kafka_cassandra_mongodb Public

    Produce Kafka messages, consume them, and write them to Cassandra and MongoDB.

    Python · 43 stars · 25 forks

  4. csv_extract_airflow_docker Public

    Write a CSV file to Postgres, read the table and modify it, then write additional tables to Postgres with Airflow.

    Python · 37 stars · 15 forks

  5. crypto_api_kafka_airflow_streaming Public

    Get crypto data from an API and stream it to Kafka with Airflow. Write the data to MySQL and visualize it with Metabase.

    Python · 17 stars · 5 forks

  6. aws_end_to_end_streaming_pipeline Public

    An AWS Data Engineering End-to-End Project (Glue, Lambda, Kinesis, Redshift, QuickSight, Athena, EC2, S3)

    Python · 16 stars · 9 forks