Data Engineering
ETL, Data Warehousing, Airflow, dbt, Spark, Data Lakes, Stream Processing & more
Build data pipelines that scale — ETL, data warehousing, Apache Airflow, dbt, Spark, data lakes, stream processing, and data modeling.
Available Tutorials
Learning Path
flowchart LR
A[Data Engineering Overview] --> B[ETL Pipelines]
B --> C[Data Pipelines]
C --> D[Apache Spark]
C --> E[Data Warehousing]
C --> F[Data Lakes]
D --> G[Apache Airflow]
F --> H[Data Lakehouse]
E --> I[Data Modeling]
I --> J[Advanced Data Modeling]
G --> K[Data Pipeline Orchestration]
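The learning path above is a directed acyclic graph, the same structure that orchestrators such as Airflow use for task dependencies. As a minimal sketch, assuming Python 3.9+ (for the standard-library `graphlib` module), the edges can be copied from the flowchart and sorted into one valid reading order. The names `EDGES` and `reading_order` are illustrative and not part of the site.

```python
from graphlib import TopologicalSorter

# Edges copied from the flowchart above: (prerequisite, follow-up topic).
EDGES = [
    ("Data Engineering Overview", "ETL Pipelines"),
    ("ETL Pipelines", "Data Pipelines"),
    ("Data Pipelines", "Apache Spark"),
    ("Data Pipelines", "Data Warehousing"),
    ("Data Pipelines", "Data Lakes"),
    ("Apache Spark", "Apache Airflow"),
    ("Data Lakes", "Data Lakehouse"),
    ("Data Warehousing", "Data Modeling"),
    ("Data Modeling", "Advanced Data Modeling"),
    ("Apache Airflow", "Data Pipeline Orchestration"),
]


def reading_order(edges):
    """Return one order in which every prerequisite precedes its follow-up."""
    sorter = TopologicalSorter()
    for prerequisite, topic in edges:
        # add(node, *predecessors): the prerequisite must come before the topic.
        sorter.add(topic, prerequisite)
    return list(sorter.static_order())


order = reading_order(EDGES)
for step, topic in enumerate(order, start=1):
    print(f"{step:2d}. {topic}")
```

Several orders satisfy the graph, because the three branches after Data Pipelines are independent of one another. The sketch prints whichever valid order the sorter produces.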
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
Published Topics
Data Engineering Overview — Complete Guide to Pipelines and Architecture
Learn data engineering fundamentals: pipeline architecture, batch vs streaming, modern data stack components, and how data engineers build production data systems.
ETL Pipelines Explained — Extract, Transform, Load with Python Examples
Learn ETL pipeline fundamentals: Extract from sources, Transform data, Load to warehouses. Includes Python examples, Airflow integration, and batch vs streaming.
Data Warehousing Explained — Star Schema, Snowflake, and Cloud Warehouses
Learn data warehousing fundamentals: star schema, snowflake schema, fact vs dimension tables, OLAP vs OLTP, and how cloud warehouses like Snowflake work.
Data Lakes Explained — Lakehouse Architecture and Schema-on-Read
Learn data lakes fundamentals: raw storage on S3/ADLS, lakehouse architecture with Delta Lake, schema-on-read vs schema-on-write, and data lake vs warehouse trade-offs.
Apache Airflow Guide — DAGs, Operators, and ETL Orchestration
Learn Apache Airflow: DAGs, operators, tasks, schedulers. Build a complete ETL DAG with PythonOperator and BashOperator with runnable examples.
dbt Explained — SQL-First Data Transformations with dbt Core
Learn dbt (data build tool): SQL-first transformations, models, tests, sources, materializations. Build a complete dbt model for transforming raw orders with examples.
Apache Spark Guide — RDDs, DataFrames, and PySpark Examples
Learn Apache Spark: RDDs vs DataFrames, lazy evaluation, transformations vs actions, Spark SQL. Complete PySpark examples with aggregation and expected output.
Stream Processing Guide — Kafka, Flink, and Real-Time Data Pipelines
Learn stream processing fundamentals: Kafka Streams, Apache Flink, Spark Streaming, event time vs processing time, windowing, and exactly-once semantics with examples.
Building Data Pipelines — End-to-End Design, Monitoring, and Orchestration
Learn to build production data pipelines: end-to-end design, monitoring and alerting, data quality checks, schema evolution, orchestration with Airflow and Dagster.
Data Modeling Guide — Kimball, Inmon, Star Schema, and Slowly Changing Dimensions
Learn data modeling: Kimball vs Inmon, normalization vs denormalization, star schema, slowly changing dimensions (SCD Type 1/2/3), and designing a customer dimension.
Data Lakehouse Architecture — Delta Lake, Iceberg, and Hudi Explained
Learn data lakehouse architecture: Delta Lake, Apache Iceberg, Apache Hudi, ACID on data lakes, schema evolution, time travel, and merge/upsert operations.
Data Pipeline Orchestration — Airflow, Prefect, and Dagster Guide
Learn data pipeline orchestration: Apache Airflow DAGs, operators, sensors, Prefect, Dagster, CI/CD for data pipelines, data quality checks, monitoring, and alerting.
Advanced Data Modeling — Kimball vs Inmon, SCD Types, Fact Tables
Learn advanced data modeling: Kimball vs Inmon methodologies, slowly changing dimensions (SCD Type 1/2/3), fact tables, conformed dimensions, bridge tables, and junk dimensions.
Data Quality & Testing — Great Expectations, dbt Tests & Automated Validation
Master data quality testing: expectations in Great Expectations, dbt generic and singular tests, data profiling, data quality dimensions, automated validation pipelines, and production monitoring.
Real-Time Data Pipelines — Kafka Streams, KSQL, Flink SQL & CDC
Build real-time data pipelines: Kafka Streams DSL, KSQL for streaming SQL, Flink SQL for analytics, Debezium CDC, streaming ingestion patterns, and Lambda vs Kappa architecture comparison.
All 15 topics in Data Engineering are published.