
Tutorial: Build a Space Launch Analytics Platform

Welcome to the RAT tutorial! Over the next 8 parts, you will build a complete data platform that ingests, transforms, validates, and serves space launch data — covering every feature RAT has to offer.

What you will build:

  • Bronze pipelines that ingest CSV data from landing zones
  • Silver pipelines that join and enrich data using ref()
  • Gold pipelines that produce analytics-ready aggregations
  • Quality tests that gate bad data before it reaches production
  • Python pipelines that use DuckDB + PyArrow for complex logic
  • Triggers that automate the entire pipeline chain
  • Versioning and data retention for production governance

By the end, you will have a fully automated pipeline DAG that looks like this:

[Diagram: the full tutorial pipeline DAG]
Prerequisites

| Requirement | Minimum Version | Check Command |
|---|---|---|
| Docker | 24.0+ | `docker --version` |
| Docker Compose | 2.20+ (V2 plugin) | `docker compose version` |
| Make | any | `make --version` |
| RAM | 4 GB free | |

If you have not installed RAT yet, follow the Installation guide first. It takes under 5 minutes.
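If you prefer to verify everything in one pass, the check commands from the table can be combined into a small script. This is a convenience sketch, not part of RAT; it reports missing tools instead of aborting:

```shell
# Run the prerequisite checks from the table above in one pass.
# Prints "NOT FOUND" for absent tools rather than exiting with an error.
check() {
  if command -v "$1" >/dev/null 2>&1; then
    printf '%s: %s\n' "$1" "$("$@" 2>/dev/null | head -n 1)"
  else
    printf '%s: NOT FOUND\n' "$1"
  fi
}
REPORT="$(check docker --version; check make --version)"
# Compose V2 ships as a docker plugin, so it needs its own check.
if docker compose version >/dev/null 2>&1; then
  REPORT="$REPORT
docker compose: OK"
else
  REPORT="$REPORT
docker compose: NOT FOUND"
fi
printf '%s\n' "$REPORT"
```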


Tutorial Parts

Each part builds on the previous one. We recommend following them in order.


Data Theme: Space Launches

Throughout this tutorial, you will work with real-world space launch data. The dataset includes 25 missions launched in 2023–2024 and 11 launch vehicles from agencies and companies around the world — SpaceX, ESA, ISRO, JAXA, and Roscosmos.

The CSV files are included in the repository at docs/data/:

| File | Rows | Description |
|---|---|---|
| `space_launches.csv` | 25 | Missions with dates, vehicles, outcomes, orbits, payload masses |
| `launch_vehicles.csv` | 11 | Rockets with specs (height, thrust, stages, manufacturer) |

This data is small enough to run instantly, yet rich enough to demonstrate joins, incremental loading, quality validation, and aggregation patterns.


Ready? Let’s build something.