The skills a data engineer should learn — in order. From Data Modeling and SQL, through OLAP systems, dbt, and data formats, to processing engines, orchestration, Kafka, and stream processing. A practical roadmap built from real experience.
For only $7/month, you get access to:
Tutorials give toy examples. Videos move too fast. Articles are forgotten two weeks later.
The difference: deep-dive articles with illustrations that make complex internals click, paired with CLI tools you actually run on your laptop.
Right on your laptop, without internet. Like playing a game: read, code, verify, move on.
Learn every crucial Apache Spark internal concept — from RDD basics and transformations to aggregations and joins.
A taste of what's inside — sharp, technical, no fluff.
RDD, architecture, execution modes, planning, scheduling, resource allocation, memory management, cache, and joins. In short: everything about Spark.
How does Parquet organize data? Why the hybrid format? How do read/write processes work? And how does it help with OLAP workloads?
In this article, I sat down and relearned Git. It covers not only the everyday Git commands but also what happens under the hood.
The sheer variety of cloud services can overwhelm a newcomer. If you're a data engineer already swamped by everything there is to learn, entering the cloud without prior experience doubles the load. Here's a vendor-agnostic guide to get started.
Data architecture 101 — warehouse, lake, lakehouse, data mesh. Plus clarifications on Medallion, data modeling, and the Modern Data Stack.
📚
200+ more articles
Deep dives on Spark, data formats, orchestration, cloud, Git, data modeling, and more.
Browse all →
One subscription. Every article, every tool, everything I build next.
billed annually
What's included
Already a subscriber? Activate your GitHub access here.