Available · Data engineering · On-site · Remote · Freelance

Kelash
Kumar.

Data Engineer · Sukkur, Pakistan  ·  BS Computer Science
Sukkur IBA University  ·  Class of 2026

I ship streaming pipelines — production-grade, observable, and built to survive load.

Selected work

§ 01 / Pipelines
01
Live Streaming Production

Real-Time Crypto Streaming Pipeline

Python · Redpanda · PostgreSQL · dbt · Airflow · Metabase · Docker · Oracle Cloud

A six-stage streaming system pulling live crypto prices from CoinGecko through Redpanda, persisting them in PostgreSQL, transforming with dbt, orchestrating with Airflow, and surfacing them in a Metabase dashboard. Eight containers, end-to-end. Deployed to Oracle Cloud's free-tier ARM VM with CI/CD shipping changes in ninety seconds.

Architecture  /  6 stages · 8 containers · end-to-end · Dockerized
01 Source CoinGecko REST API · 10 crypto assets · 60s polling cadence
02 Ingest Redpanda event broker · Python producer publishes to crypto-prices · consumer commits safely with poison-pill handling
03 Store PostgreSQL raw warehouse · indexed for read · volume-persisted across container restarts
04 Transform dbt three-layer models · staging → intermediate → mart · LAG windows, moving averages, daily summaries
05 Orchestrate Apache Airflow · 5-minute schedule · freshness gates · automatic retries · task-level observability
06 Analyze Metabase six-panel live dashboard · price trends, deltas, top movers, volumes — refreshed in near real-time
14,400/day
Throughput
< 5 min
Freshness
~ 90 s
Deploy
8 svc
Containers
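The ingest stage's poison-pill handling boils down to one rule: a record that can't be decoded is shunted to a dead-letter path and the offset is still committed, so one bad message never stalls the stream. A minimal sketch of that guard, with illustrative names (`decode_or_dlq` is assumed, and an in-memory list stands in for the real dead-letter topic):

```python
import json
from typing import Optional

def decode_or_dlq(raw: bytes, dlq: list) -> Optional[dict]:
    """Parse a crypto-prices message; route undecodable records to the DLQ."""
    try:
        event = json.loads(raw)
        # Minimal shape check: a price tick needs an asset id and a price.
        if "id" not in event or "price" not in event:
            raise ValueError("missing required fields")
        return event
    except ValueError:
        dlq.append(raw)   # keep the poison pill for later inspection
        return None       # caller still commits the offset, so consumption never stalls

dlq: list = []
good = decode_or_dlq(b'{"id": "bitcoin", "price": 64000.5}', dlq)
bad = decode_or_dlq(b"not-json", dlq)
```

Committing the offset even on failure is the design choice that keeps the consumer moving; the trade-off is that bad records must be replayed from the DLQ rather than from the source topic.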
02
CDC Open Source

Change-Data-Capture Pipeline

PostgreSQL · Debezium · Redpanda · Python · dbt · Airflow · Docker

Log-based replication that turns hourly batches into sub-ten-second streams. Six containerized services capture WAL events from a source PostgreSQL via Debezium, stream them through Redpanda, and apply them idempotently to a target database. Bad records flow to a DLQ; dbt builds the analytical layer with full SCD Type 2 history.

Architecture  /  6 services · sub-10s lag · Docker Compose
01 Source Source PostgreSQL :5433 · wal_level=logical · production database
02 Capture Debezium :8083 · pgoutput plugin · streams CDC events from WAL
03 Stream Redpanda :9092 · zero-JVM Kafka API · message broker
04 Consume Python consumer · confluent-kafka · ON CONFLICT upserts to target :5434 · DLQ for bad records
05 Model dbt · 3 models · 9 tests · staging → marts · SCD Type 2 with valid_from / valid_to dimensions
06 Orchestrate Airflow :8085 · cdc_pipeline_monitor DAG · 30-min cadence · check_cdc_lag → snapshot → test → healthy
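The SCD Type 2 bookkeeping in the model layer follows the standard pattern: each attribute change closes the current row's `valid_to` and opens a new current row. A sketch of that logic in plain Python, with illustrative column and function names rather than the project's actual dbt model:

```python
from datetime import datetime

def scd2_upsert(history: list, key: str, attrs: dict, at: datetime) -> None:
    """Record a new version of `key` if its attributes changed; else no-op."""
    for row in history:
        if row["key"] == key and row["valid_to"] is None:
            if row["attrs"] == attrs:
                return              # no change: the current row stays open
            row["valid_to"] = at    # close out the superseded version
    history.append({"key": key, "attrs": attrs, "valid_from": at, "valid_to": None})

h: list = []
t1, t2 = datetime(2025, 1, 1), datetime(2025, 2, 1)
scd2_upsert(h, "cust-1", {"tier": "free"}, t1)
scd2_upsert(h, "cust-1", {"tier": "pro"}, t2)
# h now holds two versions: the first closed at t2, the second still current.
```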
< 10 s
Replication lag
9 / 3
Tests / models
SCD T2
History
At-least-once
Semantics
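At-least-once semantics only work because the apply step is idempotent: the real consumer issues `INSERT ... ON CONFLICT (id) DO UPDATE` against the target Postgres, so redelivered events converge to the same state. A hypothetical in-memory sketch of that property, with a dict keyed by primary key standing in for the target table:

```python
def apply_cdc_event(table: dict, event: dict) -> None:
    """Apply one Debezium-style change event (op codes: c/u/d/r) idempotently."""
    op = event["op"]
    if op in ("c", "u", "r"):            # create, update, snapshot read
        row = event["after"]
        table[row["id"]] = row           # upsert: replaying the same event is a no-op
    elif op == "d":                      # delete
        table.pop(event["before"]["id"], None)  # deleting twice is also a no-op

target: dict = {}
events = [
    {"op": "c", "after": {"id": 1, "name": "alice"}},
    {"op": "u", "after": {"id": 1, "name": "alicia"}},
    {"op": "u", "after": {"id": 1, "name": "alicia"}},  # duplicate delivery
    {"op": "d", "before": {"id": 1}},
]
for e in events:
    apply_cdc_event(target, e)
# target ends empty, and it would end empty under any replay of this sequence.
```

Because every operation is a no-op on repeat, the broker is free to redeliver on consumer restart without corrupting the target.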
03
Lakehouse Quality Gates

Medallion Data Lakehouse

MinIO · DuckDB · Open-Meteo · Great Expectations · Airflow · Parquet · Docker

A three-layer lakehouse on object storage. MinIO holds Bronze, Silver, and Gold buckets; DuckDB reads and transforms Parquet directly from S3-compatible storage via the httpfs extension; Great Expectations gates every layer transition — the pipeline halts when validation fails, so the warehouse never silently corrupts.

Architecture  /  3 layers · 5-task DAG · medallion_weather
01 Lakehouse MinIO S3-compatible store · bronze/ raw · silver/ cleaned · gold/ curated marts
02 Compute DuckDB embedded with httpfs extension · reads Parquet from MinIO · transforms · writes Parquet back
03 Ingest Python script · Open-Meteo public API → raw Parquet in bronze/
04 Govern Great Expectations validation suites at Bronze and Silver · pipeline FAILS if expectations break
05 Refine DuckDB cleans & deduplicates Bronze → partitioned Parquet in silver/
06 Aggregate DuckDB rolls Silver → business marts in gold/city_weekly_summary, regional_daily, city_extremes
07 Orchestrate Airflow 2.9.1 LocalExecutor · DAG: bronze_ingest → bronze_gate → silver_transform → silver_gate → gold_transform
3 layers
Bronze · Silver · Gold
GE gates
Quality validation
Idempotent
Backfills
Parquet
Columnar storage
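The gate contract is the core of the design: a layer transition runs its validation suite, and a single failed expectation raises, fails the Airflow task, and blocks promotion. The real pipeline runs Great Expectations suites; this minimal stand-in (all names illustrative) shows only the halt-on-failure behavior:

```python
def gate(rows: list, checks: dict) -> list:
    """Promote rows to the next layer only if every expectation holds."""
    failures = [name for name, predicate in checks.items()
                if not all(predicate(r) for r in rows)]
    if failures:
        # Raising fails the Airflow task, so nothing downstream runs.
        raise ValueError(f"quality gate failed: {failures}")
    return rows  # promotion happens only on a clean pass

bronze = [
    {"city": "Sukkur", "temp_c": 41.0},
    {"city": "Karachi", "temp_c": 33.5},
]
checks = {
    "temp_in_range": lambda r: -60 <= r["temp_c"] <= 60,
    "city_present": lambda r: bool(r.get("city")),
}
silver = gate(bronze, checks)  # passes: both rows satisfy both checks
```

Halting rather than filtering is deliberate: a gate that quietly drops bad rows hides upstream breakage, while a gate that fails loudly keeps Gold trustworthy.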

Technical stack

§ 02 / Tooling
Languages & Foundations 04
Python SQL Bash Java
Data Engineering & Streaming 07
Apache Airflow dbt Kafka Redpanda Debezium Pandas Great Expectations
Storage & Databases 05
PostgreSQL DuckDB MinIO Apache Parquet MongoDB
DevOps & Cloud 06
Docker Linux GitHub Actions Oracle Cloud pytest Caddy
Visualization & BI 05
Metabase Grafana Plotly Power BI Tableau

About & record

§ 03 / Background
Education 01
2022 — 2026
BS Computer Science
Sukkur IBA University · Pakistan

Database Systems · Data Structures & Algorithms · Operating Systems · Computer Networks · Software Engineering · System Design.

Certifications 03
2026
Google Cloud Data Engineer
Coursera
2025
IBM Data Engineering
Coursera
2025
Google Data Analytics
Coursera
Honors & Recognition 03
2026
National Skill Competency Test — 92.2nd percentile
NSCT · Pakistan

Top marks in Programming, Database, and AI/ML & Data Analytics across a 10-subject national competency exam.

2022
Sindh Talent Hunt Program scholarship
STHP · Government of Sindh

Fully-funded merit scholarship covering BS Computer Science at Sukkur IBA University.

2025 — 2026
Three production pipelines shipped
Personal · Open source

Streaming, CDC, and medallion lakehouse — each fully containerized and orchestrated, all deployed.

§ 04 / Get in touch

Let's build something good.

Open to data engineering roles, freelance work, and serious collaborations — remote, hybrid, or on-site in Pakistan. The fastest way to reach me is email.