Data Engineering

ETL, Data Warehousing, Airflow, dbt, Spark, Data Lakes, Stream Processing & more

15 of 15 topics published

Build data pipelines that scale — ETL, data warehousing, Apache Airflow, dbt, Spark, data lakes, stream processing, and data modeling.

Available Tutorials

Learning Path

flowchart LR
  A[Data Engineering Overview] --> B[ETL Pipelines]
  B --> C[Data Pipelines]
  C --> D[Apache Spark]
  C --> E[Data Warehousing]
  C --> F[Data Lakes]
  D --> G[Apache Airflow]
  F --> H[Data Lakehouse]
  E --> I[Data Modeling]
  I --> J[Advanced Data Modeling]
  G --> K[Data Pipeline Orchestration]
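
The arrows in the learning path form a dependency graph, so one valid reading order can be derived with a topological sort. The sketch below is an illustration only: the `PREREQUISITES` mapping and the `study_order` helper are hypothetical names, transcribed by hand from the diagram, and the code assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical mapping: each topic points to the topics that come before it,
# copied from the arrows in the learning path diagram.
PREREQUISITES = {
    "ETL Pipelines": {"Data Engineering Overview"},
    "Data Pipelines": {"ETL Pipelines"},
    "Apache Spark": {"Data Pipelines"},
    "Data Warehousing": {"Data Pipelines"},
    "Data Lakes": {"Data Pipelines"},
    "Apache Airflow": {"Apache Spark"},
    "Data Lakehouse": {"Data Lakes"},
    "Data Modeling": {"Data Warehousing"},
    "Advanced Data Modeling": {"Data Modeling"},
    "Data Pipeline Orchestration": {"Apache Airflow"},
}


def study_order(prerequisites):
    """Return one valid reading order in which prerequisites come first."""
    return list(TopologicalSorter(prerequisites).static_order())


order = study_order(PREREQUISITES)
for step, topic in enumerate(order, start=1):
    print(f"{step}. {topic}")
```

The first three topics are fixed by the diagram (overview, ETL pipelines, data pipelines). After that the path branches, so several orderings are equally valid and the sort returns just one of them.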
  

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.

Published Topics

Data Engineering Overview — Complete Guide to Pipelines and Architecture

Learn data engineering fundamentals: pipeline architecture, batch vs streaming, modern data stack components, and how data engineers build production data systems.

✓ Live

ETL Pipelines Explained — Extract, Transform, Load with Python Examples

Learn ETL pipeline fundamentals: extract from sources, transform data, and load into warehouses. Includes Python examples, Airflow integration, and a batch vs streaming comparison.

✓ Live

Data Warehousing Explained — Star Schema, Snowflake, and Cloud Warehouses

Learn data warehousing fundamentals: star schema, snowflake schema, fact vs dimension tables, OLAP vs OLTP, and how cloud warehouses like Snowflake work.

✓ Live

Data Lakes Explained — Lakehouse Architecture and Schema-on-Read

Learn data lake fundamentals: raw storage on S3/ADLS, lakehouse architecture with Delta Lake, schema-on-read vs schema-on-write, and data lake vs warehouse trade-offs.

✓ Live

Apache Airflow Guide — DAGs, Operators, and ETL Orchestration

Learn Apache Airflow: DAGs, operators, tasks, and schedulers. Build a complete ETL DAG using PythonOperator and BashOperator, with runnable examples.

✓ Live

dbt Explained — SQL-First Data Transformations with dbt Core

Learn dbt (data build tool): SQL-first transformations, models, tests, sources, and materializations. Build a complete dbt model that transforms raw orders, with examples.

✓ Live

Apache Spark Guide — RDDs, DataFrames, and PySpark Examples

Learn Apache Spark: RDDs vs DataFrames, lazy evaluation, transformations vs actions, Spark SQL. Complete PySpark examples with aggregation and expected output.

✓ Live

Stream Processing Guide — Kafka, Flink, and Real-Time Data Pipelines

Learn stream processing fundamentals: Kafka Streams, Apache Flink, Spark Streaming, event time vs processing time, windowing, and exactly-once semantics with examples.

✓ Live

Building Data Pipelines — End-to-End Design, Monitoring, and Orchestration

Learn to build production data pipelines: end-to-end design, monitoring and alerting, data quality checks, schema evolution, and orchestration with Airflow and Dagster.

✓ Live

Data Modeling Guide — Kimball, Inmon, Star Schema, and Slowly Changing Dimensions

Learn data modeling: Kimball vs Inmon, normalization vs denormalization, star schema, slowly changing dimensions (SCD Type 1/2/3), and designing a customer dimension.

✓ Live

Data Lakehouse Architecture — Delta Lake, Iceberg, and Hudi Explained

Learn data lakehouse architecture: Delta Lake, Apache Iceberg, Apache Hudi, ACID on data lakes, schema evolution, time travel, and merge/upsert operations.

✓ Live

Data Pipeline Orchestration — Airflow, Prefect, and Dagster Guide

Learn data pipeline orchestration: Apache Airflow DAGs, operators, sensors, Prefect, Dagster, CI/CD for data pipelines, data quality checks, monitoring, and alerting.

✓ Live

Advanced Data Modeling — Kimball vs Inmon, SCD Types, Fact Tables

Learn advanced data modeling: Kimball vs Inmon methodologies, slowly changing dimensions (SCD Type 1/2/3), fact tables, conformed dimensions, bridge tables, and junk dimensions.

✓ Live

Data Quality & Testing — Great Expectations, dbt Tests & Automated Validation

Master data quality testing: Great Expectations expectations, dbt generic and singular tests, data profiling, DQ dimensions, automated validation pipelines, and production monitoring.

✓ Live

Real-Time Data Pipelines — Kafka Streams, KSQL, Flink SQL & CDC

Build real-time data pipelines: Kafka Streams DSL, KSQL (now ksqlDB) for streaming SQL, Flink SQL for analytics, Debezium CDC, streaming ingestion patterns, and a Lambda vs Kappa architecture comparison.

✓ Live

All 15 topics in Data Engineering are published.