Kelash
Open to work · Junior Data Engineer · Analytics Engineer · Remote / On-site
Recruiter Snapshot
CS Graduate · Sukkur IBA · May 2026
3 Production Data Pipelines · GitHub + Cloud
Airflow · dbt · PostgreSQL · Kafka
Dockerized · CI/CD · Oracle Cloud ARM
Open to Internship & Junior DE Roles
Current Focus
Junior Data Engineer Roles
Data Engineering Internships
Airflow & dbt Projects
Real-Time Data Platforms
Remote or On-site · Pakistan

Kelash Kumar

Data Engineer · Sukkur, Pakistan
BS Computer Science · Sukkur IBA University · Class of 2026

I build streaming data pipelines with Airflow, dbt, and Kafka and deploy them on real infrastructure. Three end-to-end systems are on GitHub.

What I can do for your team

§ 00 / Value
Automate Manual Work

Replace manual data pulls and spreadsheet hand-offs with scheduled, tested pipelines that run on their own. Your analysts get fresh data every morning without anyone touching it.

Data You Can Trust

Pipelines validate data at every layer and halt loudly when something breaks, so the warehouse is never silently corrupted. Fewer firefighting incidents, more confidence in dashboards.

Faster Decisions

Replace hourly batch jobs with sub-10-second replication. Business teams see changes as they happen: inventory updates, user events, financial records. Not what happened yesterday.

About & record

§ 01 / Background
Profile 00
2026

CS graduate, May 2026. Three production data systems deployed on Oracle Cloud: streaming, CDC, and medallion lakehouse. Stack: Python, Airflow, dbt, Kafka, PostgreSQL, Docker.

Education
BS Computer Science
Sukkur IBA · 2026
Projects
3 Deployed Pipelines
GitHub + Oracle Cloud
Target Roles
Open to Work
Remote · On-site PK
Education 01
2022–2026
BS Computer Science
Sukkur IBA University  ·  Pakistan
Graduated
May 2026
Relevant Coursework
Database Systems · Data Structures & Algorithms · Operating Systems · Computer Networks · Software Engineering · System Design
Certifications 03
2026 ● Active
IBM Professional

IBM Data Engineering

16-course professional certificate covering SQL, Python, NoSQL, Big Data, and cloud data platforms.

Coursera
2026 ● Active
Snowflake Professional

Snowflake Data Engineering

3-course program covering data ingestion, transformation, pipeline orchestration, DevOps, and observability on Snowflake.

Coursera
2025 ● Active
Google Professional

Google Data Analytics

9-course program covering data cleaning, SQL, R, visualization, and analytical workflow.

Coursera
Kelash Kumar · Data Engineer · 2026
Resume

One-page overview of my stack, three production projects, education, and certifications.

NSCT · Pakistan · 2026
92.2nd percentile
National Skill Competency Test

Top marks in Programming, Database, and AI/ML & Data Analytics across a 10-subject national competency exam.

Programming · Database · AI/ML & Data

Selected work

§ 02 / Selected work
Pipeline 01 Live Streaming

Real-Time Crypto Streaming Pipeline

Problem: analytics teams need near-real-time crypto price data without manual polling. This 7-stage pipeline ingests 10 assets from CoinGecko every 30 seconds, streams them through Redpanda (Kafka-compatible), persists them to PostgreSQL, transforms them with dbt (14 tests), and surfaces the results in a live Metabase dashboard. Airflow orchestrates the pipeline, and a GitHub Actions CI/CD workflow deploys it to Oracle Cloud ARM in about 90 seconds.

Outcome: 14,400 records ingested daily with <5 min freshness and zero manual intervention. 9 containers run on a free-tier VM, live at streaming-pipeline.kelash.me.
Python · Redpanda · PostgreSQL · dbt · Airflow · Metabase · Docker · Oracle Cloud
14.4K
Records/day
<5 min
Freshness
~90 s
Deploy
9 svc
Containers
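
To illustrate the ingest stage described above, here is a minimal sketch of flattening one price poll into per-asset records before they are published to a topic. It is not the project's code: the payload shape (asset id mapped to a dict with a `usd` price), the function name `to_records`, and the output field names are all assumptions made for this example, and no network or broker call is made.

```python
from datetime import datetime, timezone


def to_records(payload, fetched_at):
    """Flatten {"asset": {"usd": price}} into one record per asset.

    Assumed shapes: `payload` maps an asset id to a dict holding a "usd"
    price; `fetched_at` is a timezone-aware datetime for the poll.
    """
    records = []
    for asset, quote in sorted(payload.items()):
        price = quote.get("usd")
        if price is None:
            continue  # skip assets that returned no quote in this poll
        records.append(
            {
                "asset": asset,
                "price_usd": float(price),
                "fetched_at": fetched_at.isoformat(),
            }
        )
    return records


# Hypothetical sample poll with two assets.
sample = {"bitcoin": {"usd": 64000.5}, "ethereum": {"usd": 3100}}
rows = to_records(sample, datetime(2026, 1, 1, tzinfo=timezone.utc))
```

Stamping every record with the poll time is what lets a downstream freshness check compare the newest `fetched_at` against the clock.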
Pipeline 02 CDC Log-Based

Change-Data-Capture Pipeline

Problem: batch replication introduces hourly lag and can't capture row-level deletes. This CDC pipeline uses Debezium to capture WAL events from PostgreSQL, routes them through Redpanda, and upserts idempotently to a target database in under 10 seconds. Bad records route to a dead-letter queue; dbt builds SCD Type 2 history for full audit trails.

Outcome: reduced replication lag from 60 min to <10 sec. Full delete/update history preserved via SCD Type 2. Zero data loss on bad records via dead-letter queue.
PostgreSQL · Debezium · Redpanda · Python · dbt · Airflow · Docker
<10 s
Replication lag
9
dbt tests
3
dbt models
DLQ
Error handling
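
The idempotent upsert and dead-letter routing described above can be sketched in memory as follows. This is an illustration under stated assumptions, not the pipeline's code: the event fields (`id`, `lsn`, `op`, `row`), the function name `apply_events`, and the use of a plain dict as the target are hypothetical. The `c`/`u`/`d` operation codes follow Debezium's convention for create, update, and delete.

```python
def apply_events(events, target, applied_lsn, dead_letters):
    """Apply change events to `target` so that replays have no effect.

    target       : dict of primary key -> current row (stand-in for a table)
    applied_lsn  : dict of primary key -> highest log position applied
    dead_letters : list collecting events that could not be processed
    """
    for event in events:
        try:
            key = event["id"]
            lsn = event["lsn"]
            op = event["op"]
            if op not in ("c", "u", "d"):
                raise ValueError(f"unknown op {op!r}")
            row = None if op == "d" else event["row"]
        except (KeyError, TypeError, ValueError) as exc:
            # Bad record: park it for inspection instead of dropping it
            # or crashing the consumer.
            dead_letters.append({"event": event, "error": repr(exc)})
            continue

        if lsn <= applied_lsn.get(key, -1):
            continue  # already applied; a replayed event is a no-op

        if op == "d":
            target.pop(key, None)  # row-level delete
        else:
            target[key] = row  # insert or overwrite (upsert)
        applied_lsn[key] = lsn


# Hypothetical event stream: create, update, create, delete, malformed.
events = [
    {"id": 1, "lsn": 10, "op": "c", "row": {"id": 1, "qty": 5}},
    {"id": 1, "lsn": 11, "op": "u", "row": {"id": 1, "qty": 7}},
    {"id": 2, "lsn": 12, "op": "c", "row": {"id": 2, "qty": 1}},
    {"id": 2, "lsn": 13, "op": "d"},
    {"lsn": 14, "op": "u", "row": {}},  # malformed: no primary key
]
target, applied_lsn, dead_letters = {}, {}, []
apply_events(events, target, applied_lsn, dead_letters)
apply_events(events, target, applied_lsn, [])  # replay leaves target unchanged
```

Tracking the highest applied log position per key is one simple way to make replays safe. In a real database the same effect is often achieved with an `INSERT ... ON CONFLICT` upsert guarded by a version column.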
Pipeline 03 Lakehouse Quality Gates

Medallion Data Lakehouse

Problem: raw API data lands dirty, with duplicates, schema drift, and nulls in critical fields. This lakehouse ingests weather data from Open-Meteo into MinIO (Bronze), validates and cleans it with Great Expectations quality gates (Silver), and aggregates city-level and regional weekly summaries in DuckDB (Gold). If any layer fails validation, the pipeline halts, so the warehouse is never silently corrupted.

Outcome: 31 Great Expectations checks across 3 layers. Idempotent daily runs with safe backfill. Data quality is enforced at the pipeline level, not discovered after the fact in a dashboard.
MinIO · DuckDB · Open-Meteo · Great Expectations · Airflow · Parquet · Docker
3
Medallion layers
31
GE expectations
Daily
Schedule
Idempotent
Backfills
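
The "halt on failed validation" behaviour described above can be sketched in plain Python. This is an illustration only: the project uses Great Expectations, whose API is deliberately not reproduced here, and the column names, the function `silver_gate`, and the exception `QualityGateError` are hypothetical.

```python
class QualityGateError(Exception):
    """Raised to stop the run before bad rows reach the next layer."""


def silver_gate(rows, required=("city", "date", "temp_c")):
    """Return `rows` unchanged if they pass; raise otherwise.

    Checks two of the problems named above: nulls in critical fields
    and duplicate (city, date) keys.
    """
    failures = []
    seen = set()
    for i, row in enumerate(rows):
        missing = [col for col in required if row.get(col) is None]
        if missing:
            failures.append(f"row {i}: null or missing {missing}")
        key = (row.get("city"), row.get("date"))
        if key in seen:
            failures.append(f"row {i}: duplicate key {key}")
        seen.add(key)
    if failures:
        # Raising here is the "halt loudly" step: nothing is promoted.
        raise QualityGateError("; ".join(failures))
    return rows


# Hypothetical Bronze rows that pass the gate.
clean = [{"city": "Sukkur", "date": "2026-01-01", "temp_c": 18.5}]
promoted = silver_gate(clean)
```

Under an orchestrator such as Airflow, an uncaught exception fails the task, so downstream Gold tasks do not run on unvalidated data.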

Technical stack

§ 03 / Technical stack
Languages & Foundations 04
Python · SQL · Bash · Java
Data Engineering & Streaming 07
Apache Airflow · dbt · Kafka · Redpanda · Debezium · Pandas · Great Expectations
Storage & Databases 05
PostgreSQL · DuckDB · MinIO · Apache Parquet · MongoDB
DevOps & Cloud 05
Docker · Linux · GitHub Actions · Oracle Cloud · pytest
Visualization & BI 03
Metabase · Grafana · Plotly
APIs & Services 02
FastAPI · Caddy
§ 04 / Get in touch

Hiring a junior data engineer?

I'm actively looking for junior data engineering, analytics engineering, or data platform roles: remote, hybrid, or on-site in Pakistan. I bring 3 deployed projects, strong Python/SQL foundations, and hands-on experience with the stack most junior data engineering job descriptions ask for. The fastest way to reach me is email.