Tutorial: Build a Space Launch Analytics Platform
Welcome to the RAT tutorial! Over the next 8 parts, you will build a complete data platform that ingests, transforms, validates, and serves space launch data — covering every feature RAT has to offer.
What you will build:
- Bronze pipelines that ingest CSV data from landing zones
- Silver pipelines that join and enrich data using ref()
- Gold pipelines that produce analytics-ready aggregations
- Quality tests that gate bad data before it reaches production
- Python pipelines that use DuckDB + PyArrow for complex logic
- Triggers that automate the entire pipeline chain
- Versioning and data retention for production governance
By the end, you will have a fully automated pipeline DAG that runs from landing zones through the Bronze, Silver, and Gold layers.
Prerequisites
| Requirement | Minimum Version | Check Command |
|---|---|---|
| Docker | 24.0+ | docker --version |
| Docker Compose | 2.20+ (V2 plugin) | docker compose version |
| Make | any | make --version |
| RAM | 4 GB free | — |
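If you would rather check the tool versions in one pass, the table above can be sketched as a small Python script. This is only an illustration and not part of RAT: the helper names (`parse_version`, `meets_minimum`, `check_tools`) are hypothetical, the version parsing assumes each command prints a `MAJOR.MINOR` number somewhere in its output, and the RAM requirement is not checked.

```python
import re
import shutil
import subprocess

# Check commands and minimum versions from the prerequisites table.
# None means any version is accepted (Make).
REQUIREMENTS = {
    "Docker": (["docker", "--version"], (24, 0)),
    "Docker Compose": (["docker", "compose", "version"], (2, 20)),
    "Make": (["make", "--version"], None),
}


def parse_version(text):
    """Return the first MAJOR.MINOR pair found in text, or None."""
    match = re.search(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def meets_minimum(found, minimum):
    """True if the found version satisfies the minimum (None = any version)."""
    if minimum is None:
        return True
    return found is not None and found >= minimum


def check_tools():
    """Run each check command and return {tool name: True/False}."""
    results = {}
    for name, (command, minimum) in REQUIREMENTS.items():
        # Skip the subprocess call entirely if the executable is not on PATH.
        if shutil.which(command[0]) is None:
            results[name] = False
            continue
        try:
            proc = subprocess.run(
                command, capture_output=True, text=True, timeout=10
            )
        except (OSError, subprocess.TimeoutExpired):
            results[name] = False
            continue
        version = parse_version(proc.stdout)
        results[name] = proc.returncode == 0 and meets_minimum(version, minimum)
    return results
```

Calling `check_tools()` returns a dictionary such as `{"Docker": True, ...}`; any `False` entry means the tool is missing, too old, or its check command failed.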
If you have not installed RAT yet, follow the Installation guide first. It takes under 5 minutes.
Tutorial Parts
Each part builds on the previous one. We recommend following them in order.
1. Create a SQL pipeline, preview it, publish it, run it, and query the results. (~15 min)
2. Upload real CSV data through landing zones and ingest it into Bronze pipelines. (~15 min)
3. Create Silver pipelines that join Bronze tables using ref(). See the lineage DAG. (~10 min)
4. Make pipelines incremental. Understand all 6 merge strategies. (~15 min)
5. Gate pipeline merges with SQL quality tests. Catch bad data before it lands. (~10 min)
6. Write Python pipelines using DuckDB and PyArrow for complex transformations. (~10 min)
7. Automate pipeline execution with cron, webhooks, and event-driven triggers. (~15 min)
8. Add a Gold layer, learn versioning and data retention. The full picture. (~15 min)
Data Theme: Space Launches
Throughout this tutorial, you will work with real-world space launch data. The dataset includes 25 missions launched between 2023 and 2024 and 11 launch vehicles from agencies around the world: SpaceX, ESA, ISRO, JAXA, and Roscosmos.
The CSV files are included in the repository at docs/data/:
| File | Rows | Description |
|---|---|---|
| space_launches.csv | 25 | Missions with dates, vehicles, outcomes, orbits, payload masses |
| launch_vehicles.csv | 11 | Rockets with specs (height, thrust, stages, manufacturer) |
This data is small enough to run instantly, yet rich enough to demonstrate joins, incremental loading, quality validation, and aggregation patterns.
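Before starting, it can be useful to confirm that the CSV files match the table above. The following is a minimal sketch using only the Python standard library. It assumes each file is UTF-8 encoded and starts with a header row, which the tutorial does not state explicitly. The function names are hypothetical, and no column names are assumed.

```python
import csv
from pathlib import Path

# Row counts as documented in the table above.
EXPECTED_ROWS = {
    "space_launches.csv": 25,
    "launch_vehicles.csv": 11,
}


def summarize_csv(path):
    """Return (column names, data row count) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # assumption: first line is the header
        row_count = sum(1 for row in reader if row)  # ignore blank lines
    return header, row_count


def check_dataset(data_dir="docs/data"):
    """Compare each file's row count against the documented figure."""
    report = {}
    for filename, expected in EXPECTED_ROWS.items():
        _, actual = summarize_csv(Path(data_dir) / filename)
        report[filename] = actual == expected
    return report
```

Run from the repository root, `check_dataset()` returns a dictionary mapping each filename to `True` when its row count matches the documented value.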
Ready? Let’s build something.