Core Concepts
This page explains the foundational ideas behind RAT. Understanding these concepts will help you design better pipelines and make the most of the platform.
How Everything Connects
Before diving into each concept, here is a visual overview of how the main concepts relate to each other:
Namespaces
A namespace is a logical grouping for your pipelines and tables — similar to a database schema in traditional databases. Namespaces help you organize your data platform when you have multiple projects, teams, or domains.
| Example | Use case |
|---|---|
| `default` | General-purpose, starter namespace |
| `marketing` | Marketing team’s pipelines and tables |
| `finance` | Financial data transformations |
| `ecommerce` | E-commerce product and order data |
Every pipeline and table lives inside a namespace. The default namespace is created automatically when you first start RAT.
```sql
SELECT * FROM "marketing"."silver"."campaign_metrics"
```

In the Community Edition, namespaces are purely organizational. In Pro Edition, namespaces can have different access controls and ownership policies.
Layers
RAT uses the medallion architecture to organize data into three quality tiers. Every pipeline belongs to exactly one layer.
Bronze — Raw Data
The first landing point for data. Bronze pipelines ingest data from external sources (file uploads, APIs, databases) with minimal transformation. The goal is to capture a faithful copy of the source data.
Typical Bronze pipelines:
- Ingest CSV/Parquet files from a landing zone
- Read from an external API and store the raw JSON
- Mirror a source database table
```sql
SELECT *
FROM {{ landing_zone('order_uploads') }}
```

Silver — Cleaned & Conformed
Silver pipelines read from Bronze tables (using ref()) and apply cleaning, deduplication,
type casting, and standardization. This is where you enforce data quality and make data
consistent.
Typical Silver pipelines:
- Deduplicate records by primary key
- Cast string dates to proper timestamps
- Join reference data (countries, currencies)
- Filter out invalid or test records
```sql
-- @merge_strategy: incremental
-- @unique_key: order_id
-- @watermark_column: updated_at
SELECT
    order_id,
    TRIM(customer_name) AS customer_name,
    CAST(amount AS DECIMAL(10, 2)) AS amount,
    CAST(created_at AS TIMESTAMP) AS created_at,
    current_timestamp AS updated_at
FROM {{ ref('bronze.raw_orders') }}
{% if is_incremental() %}
WHERE created_at > '{{ watermark_value }}'
{% endif %}
```

Gold — Business-Ready
Gold pipelines produce the final tables that business users, dashboards, and reports consume. They aggregate, summarize, and combine Silver tables into purpose-built datasets.
Typical Gold pipelines:
- Daily revenue summaries
- Customer lifetime value calculations
- KPI aggregations for dashboards
```sql
-- @merge_strategy: delete_insert
-- @unique_key: date
SELECT
    DATE_TRUNC('day', created_at) AS date,
    COUNT(*) AS order_count,
    SUM(amount) AS total_revenue,
    AVG(amount) AS avg_order_value
FROM {{ ref('silver.clean_orders') }}
GROUP BY 1
```

The medallion architecture is a convention, not a hard constraint. RAT enforces the three layer names (bronze, silver, gold) but does not prevent you from referencing across layers in any direction. That said, the Bronze → Silver → Gold flow is strongly recommended as a best practice.
Pipelines
A pipeline is a SQL or Python program that transforms data. It is the core building block of RAT.
Addressing
Every pipeline is uniquely identified by three parts:
```
namespace.layer.name
```

For example: `default.silver.clean_orders`
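The three-part address splits cleanly on dots. As a rough illustration (a hypothetical helper, not part of RAT), an address could be parsed and validated like this:

```python
# Hypothetical helper (not part of RAT): parse a three-part pipeline address.
def parse_address(address: str) -> tuple[str, str, str]:
    parts = address.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected namespace.layer.name, got {address!r}")
    namespace, layer, name = parts
    # RAT enforces exactly these three layer names.
    if layer not in ("bronze", "silver", "gold"):
        raise ValueError(f"unknown layer: {layer!r}")
    return namespace, layer, name

ns, layer, name = parse_address("default.silver.clean_orders")
```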
Types
| Type | Language | Use case |
|---|---|---|
| SQL | DuckDB SQL + Jinja | Most data transformations — SELECT statements |
| Python | Python 3.12 | Complex logic, API calls, ML models, custom transforms |
Jinja Templating
SQL pipelines support Jinja templating with several built-in functions:
| Function | Description | Example |
|---|---|---|
| `ref('layer.name')` | Reference another table | `{{ ref('bronze.raw_orders') }}` |
| `ref('ns.layer.name')` | Cross-namespace reference | `{{ ref('finance.silver.invoices') }}` |
| `landing_zone('name')` | Read from a landing zone | `{{ landing_zone('csv_uploads') }}` |
| `this` | Current pipeline’s output table | `{{ this }}` (used in quality tests) |
| `is_incremental()` | True if the table already exists | `{% if is_incremental() %}` |
| `watermark_value` | Last processed watermark | `'{{ watermark_value }}'` |
Versioning
Pipelines have a draft and published state:
- Draft — Your current working copy. Saved with `Ctrl+S` but not yet active.
- Published — A numbered snapshot that runs when triggered. Each publish increments the version number.
You can roll back to any previous published version from the Overview tab.
Runs
A run is a single execution of a pipeline. When you click “Run” or a trigger fires, RAT creates a new run.
Run lifecycle
| Status | Meaning |
|---|---|
| pending | Queued, waiting for a runner slot |
| running | Pipeline SQL/Python is executing |
| success | Completed, quality tests passed, data merged to main catalog |
| failed | An error occurred or a quality test with error severity failed |
| cancelled | Manually cancelled by the user |
Branch isolation
Every run gets its own Nessie branch (git-like isolation for Iceberg tables). The pipeline writes data to this branch. Only after all quality tests pass does the branch merge into the main catalog. This means:
- Failed runs never corrupt production data — the branch is simply discarded
- Concurrent runs are safe — each run writes to its own isolated branch
- You can inspect failed data — the branch exists until the reaper cleans it up
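The branch-then-merge flow can be sketched in a few lines of Python (a toy model of the isolation guarantee, not RAT's actual Nessie integration):

```python
# Toy model of per-run branch isolation (not RAT's actual Nessie API).
# A run forks the catalog, writes only to its fork, and merges back only
# if every quality test passed.

def execute_run(catalog, table, rows, tests_pass):
    branch = {t: list(r) for t, r in catalog.items()}  # fork: isolated branch
    branch[table] = rows                               # write output to the branch only
    if not tests_pass:
        return False                                   # branch discarded; main untouched
    catalog.clear()                                    # merge branch into main
    catalog.update(branch)
    return True

main_catalog = {"silver.clean_orders": [{"order_id": 1}]}
# A failed run leaves production data exactly as it was:
execute_run(main_catalog, "silver.clean_orders", [{"order_id": None}], tests_pass=False)
```

Because each run forks its own copy, a failing run simply abandons its branch and the main catalog never sees the bad rows.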
Tables
Tables in RAT are Apache Iceberg tables stored in MinIO (S3-compatible object storage).
What is Iceberg?
Apache Iceberg is an open table format for large-scale data. It stores data as Parquet files organized with a metadata catalog. Key benefits:
- Time travel — Query previous versions of a table
- Schema evolution — Add/rename/drop columns without rewriting data
- Partition evolution — Change partitioning without rewriting data
- Open format — Data is not locked into any vendor
Table addressing
Tables use the same three-part naming as pipelines:
```
"namespace"."layer"."table_name"
```

RAT automatically creates and manages Iceberg tables. When a pipeline runs successfully, its output becomes (or updates) the corresponding table.
Merge Strategies
The merge strategy controls how a pipeline’s output is combined with the existing data in the target table. You set the strategy in pipeline settings or as a SQL comment directive.
| Strategy | Behavior | When to use |
|---|---|---|
| full_refresh | Drop and recreate the table every run | Small tables, complete recalculations |
| incremental | Upsert new/changed rows based on a unique key and watermark | Large tables with ongoing updates |
| append_only | Insert all output rows without deduplication | Event logs, immutable data streams |
| delete_insert | Delete matching rows by key, then insert new ones | Periodic full-slice replacements |
| scd2 | Slowly Changing Dimension Type 2 — track historical changes with valid_from/valid_to timestamps | Dimension tables where you need history |
| snapshot | Full table snapshot with a snapshot timestamp column | Periodic full snapshots for auditing |
Setting a merge strategy in SQL:
```sql
-- @merge_strategy: incremental
-- @unique_key: order_id
-- @watermark_column: updated_at
SELECT * FROM {{ ref('bronze.raw_orders') }}
{% if is_incremental() %}
WHERE updated_at > '{{ watermark_value }}'
{% endif %}
```

The `incremental` and `delete_insert` strategies require a `unique_key`. The `incremental` strategy also requires a `watermark_column` to know which rows are new. Missing these will cause the pipeline to fail.
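As a rough Python model of what the incremental strategy does (illustrative only; the real merge happens inside Iceberg, and the names below simply mirror the directives above):

```python
# Illustrative model of incremental merge semantics (not RAT's implementation):
# upsert rows by unique_key, considering only rows past the last watermark.

def incremental_merge(target, new_rows, unique_key, watermark_column, watermark):
    # Only rows newer than the last watermark count as fresh input.
    fresh = [r for r in new_rows if r[watermark_column] > watermark]
    merged = {r[unique_key]: r for r in target}
    for row in fresh:
        merged[row[unique_key]] = row  # upsert: insert new keys, overwrite existing
    new_watermark = max([watermark] + [r[watermark_column] for r in fresh])
    return list(merged.values()), new_watermark

target = [{"order_id": 1, "updated_at": "2024-01-01"}]
new_rows = [
    {"order_id": 1, "updated_at": "2024-01-02"},  # updated row: overwrites by key
    {"order_id": 2, "updated_at": "2023-12-31"},  # older than watermark: skipped
]
rows, wm = incremental_merge(target, new_rows, "order_id", "updated_at", "2024-01-01")
```

The watermark is what makes repeat runs cheap: rows at or before the last watermark are never re-read, which is why omitting `watermark_column` makes incremental runs impossible.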
Quality Tests
Quality tests are SQL queries that validate pipeline output before it merges into production. A quality test passes when it returns zero rows — any rows returned indicate a problem.
Severity levels
| Severity | On failure |
|---|---|
| error | Run fails, data is NOT merged. The Nessie branch is discarded. |
| warn | Run succeeds, data IS merged. A warning is logged. |
Examples
```sql
-- Ensure every order has an ID
SELECT * FROM {{ this }}
WHERE order_id IS NULL
```

```sql
-- Flag suspiciously large orders (but don't block)
SELECT * FROM {{ this }}
WHERE amount > 1000000
```

Quality tests run inside the isolated Nessie branch, on the data that was just written. The {{ this }} template variable refers to the pipeline’s output table on that branch. If an error-severity test fails, the branch is not merged — production data stays clean.
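The zero-rows-means-pass rule is simple enough to model directly (a sketch, not RAT's test runner):

```python
# Sketch of the pass rule: a test query returns offending rows, and an
# empty result means the test passed (mimics "WHERE order_id IS NULL").

def run_quality_test(rows, predicate):
    """Return rows that violate the check; empty list == test passed."""
    return [r for r in rows if predicate(r)]

orders = [{"order_id": 1, "amount": 50}, {"order_id": None, "amount": 20}]
violations = run_quality_test(orders, lambda r: r["order_id"] is None)
passed = len(violations) == 0  # False here: one order is missing its ID
```

Note that the returned rows double as diagnostics: whatever the test selects is exactly what you inspect when it fails.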
Triggers
Triggers define what causes a pipeline to run. A pipeline can have multiple triggers.
| Trigger type | Description | Example |
|---|---|---|
| `cron` | Run on a time-based schedule | `0 */6 * * *` (every 6 hours) |
| `pipeline_success` | Run when another pipeline succeeds | Run Silver after Bronze completes |
| `landing_zone_upload` | Run when files are uploaded to a landing zone | Ingest new CSV files automatically |
| `webhook` | Run when an external HTTP webhook is received | Triggered by an external system |
| `file_pattern` | Run when files matching a pattern appear | `orders_*.csv` |
| `cron_dependency` | Cron schedule that also waits for upstream pipelines | Scheduled but dependency-aware |
Cron expressions
RAT uses standard 5-field cron expressions:
```
┌─── minute (0-59)
│ ┌─── hour (0-23)
│ │ ┌─── day of month (1-31)
│ │ │ ┌─── month (1-12)
│ │ │ │ ┌─── day of week (0-6, Sunday=0)
│ │ │ │ │
* * * * *
```

Common examples:
| Expression | Meaning |
|---|---|
| `0 * * * *` | Every hour, on the hour |
| `0 6 * * *` | Daily at 6:00 AM |
| `0 0 * * 1` | Every Monday at midnight |
| `*/15 * * * *` | Every 15 minutes |
| `0 6,18 * * *` | Twice daily at 6 AM and 6 PM |
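If you want to sanity-check an expression, a minimal matcher for the subset of syntax used above fits in a few lines (a sketch; production schedulers also support ranges, month/day names, and more):

```python
# Minimal 5-field cron matcher covering only "*", "*/n", plain numbers,
# and comma lists (a sketch, not a full cron implementation).
from datetime import datetime

def field_matches(spec, value):
    for part in spec.split(","):
        if part == "*":
            return True
        if part.startswith("*/") and value % int(part[2:]) == 0:
            return True
        if part.isdigit() and int(part) == value:
            return True
    return False

def cron_matches(expr, dt):
    minute, hour, dom, month, dow = expr.split()
    return (field_matches(minute, dt.minute)
            and field_matches(hour, dt.hour)
            and field_matches(dom, dt.day)
            and field_matches(month, dt.month)
            and field_matches(dow, dt.isoweekday() % 7))  # Sunday=0, as above

cron_matches("0 6,18 * * *", datetime(2024, 1, 5, 18, 0))  # True: the 6 PM slot
```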
Versioning
Pipeline versioning in RAT works like a simplified version control system:
- Edit — Make changes to your pipeline code (autosaved as a draft)
- Publish — Create a versioned snapshot (v1, v2, v3, …)
- Run — Runs always execute the latest published version
- Rollback — Restore any previous published version as the active one
Only published versions can be executed. If you save a draft but do not publish, runs will continue using the last published version.
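The draft/publish/rollback lifecycle above can be modeled as a small state machine (a toy sketch; RAT's actual storage details differ):

```python
# Toy model of draft/publish/rollback (illustrative only).
class Pipeline:
    def __init__(self):
        self.draft = ""
        self.versions = []   # published snapshots: v1, v2, ...
        self.active = None   # 1-based version number that runs execute

    def save_draft(self, code):
        self.draft = code                  # Ctrl+S: saved, but not active

    def publish(self):
        self.versions.append(self.draft)   # snapshot the draft
        self.active = len(self.versions)   # the new version becomes active

    def rollback(self, version):
        self.active = version              # restore a previous published version

    def code_for_run(self):
        # Runs always use the active published version, never the draft.
        return self.versions[self.active - 1] if self.active else None
```

The key invariant: `code_for_run()` ignores `draft` entirely, which is exactly why an unpublished save never changes what executes.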
Landing Zones
A landing zone is a file upload area that serves as the entry point for external data into RAT.
How it works
- Create a landing zone in the Portal (give it a name)
- Upload files (CSV, Parquet, JSON, etc.) via drag-and-drop
- Files are stored in MinIO (S3)
- A Bronze pipeline reads the files using `{{ landing_zone('zone_name') }}`
- Optionally, set a `landing_zone_upload` trigger to run the pipeline automatically when new files arrive

```sql
-- Reads all files from the 'order_uploads' landing zone
SELECT *
FROM {{ landing_zone('order_uploads') }}
```

Landing zones support file previews — after uploading, you can inspect the data before running a pipeline. This is useful for verifying file format and column names.
Lineage
Lineage is the dependency graph that shows how data flows through your platform.
RAT builds it automatically by parsing ref() and landing_zone() calls in your
pipeline code.
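A simplified version of that parsing step might look like this (the regex below is an illustration, not RAT's actual parser):

```python
# Sketch of lineage extraction: scan pipeline SQL for ref() and
# landing_zone() calls to find upstream dependencies.
import re

CALL = re.compile(r"\{\{\s*(ref|landing_zone)\(\s*'([^']+)'\s*\)\s*\}\}")

def upstream_dependencies(sql):
    """Return the set of tables/zones a pipeline reads from."""
    return {
        (f"landing_zone:{name}" if kind == "landing_zone" else name)
        for kind, name in CALL.findall(sql)
    }

sql = """
SELECT o.*, p.name
FROM {{ ref('silver.clean_orders') }} o
JOIN {{ ref('silver.clean_products') }} p USING (product_id)
"""
```

Running the extractor over every pipeline and connecting outputs to inputs yields the DAG — which is why the lineage view can never drift from the code.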
What lineage shows
- Which tables a pipeline reads from (upstream dependencies)
- Which table a pipeline writes to (downstream output)
- The full chain from raw ingestion (Bronze) through to business-ready (Gold)
Why it matters
- Impact analysis — Before changing a Silver pipeline, see which Gold tables depend on it
- Root cause analysis — When a Gold table has bad data, trace it back to the source
- Documentation — The DAG is always up-to-date because it is generated from code
- Scheduling — The `pipeline_success` trigger uses lineage to chain dependent pipelines
In this example, gold.daily_revenue depends on both silver.clean_orders and
silver.clean_products, which in turn depend on their respective Bronze tables and
landing zones. Changing the schema of bronze.raw_orders would visibly impact all
downstream consumers.
Concept Summary
| Concept | What it is | Key thing to remember |
|---|---|---|
| Namespace | Logical grouping | Like a database schema |
| Layer | Quality tier (Bronze/Silver/Gold) | Data gets cleaner as it flows up |
| Pipeline | SQL or Python transformation | Addressed as namespace.layer.name |
| Run | Single pipeline execution | Isolated on its own Nessie branch |
| Table | Apache Iceberg table in MinIO | Open format, time-travel capable |
| Merge Strategy | How output merges into the table | 6 strategies for different patterns |
| Quality Test | SQL validation on pipeline output | Zero rows returned = pass |
| Trigger | What starts a pipeline run | 6 types (cron, event, upload, etc.) |
| Versioning | Published snapshots of pipeline code | Rollback to any previous version |
| Landing Zone | File upload area for ingestion | Entry point for external data |
| Lineage | Dependency DAG between pipelines | Built automatically from ref() calls |