Annotations

Annotations are metadata directives embedded in SQL or Python comments at the top of pipeline and quality test files. They configure how RAT processes the pipeline output — merge strategy, deduplication keys, watermark tracking, retry behavior, and more.


Syntax

Annotations use the @key: value format inside comments:

SQL pipeline (-- prefix):

-- @merge_strategy: incremental
-- @unique_key: order_id
-- @watermark_column: updated_at
-- @description: Deduplicated orders from the raw feed

Python pipeline (# prefix):

# @merge_strategy: full_refresh
# @description: Fetch and materialize exchange rates from API
# @max_retries: 3
# @retry_delay_seconds: 30

Parsing Rules

  1. Comment prefix: SQL files use --, Python files use #.
  2. Top of file: Annotations must appear at the very top of the file, before any code.
  3. First non-annotation line stops parsing: The parser stops at the first line that is neither an annotation nor an empty comment. Code, a blank line after the annotations, and a regular comment without an @key pattern all end the annotation block.
  4. One per line: Each annotation must be on its own line.
  5. Whitespace is trimmed: Leading/trailing spaces around values are stripped.
  6. Keys are case-insensitive: @Merge_Strategy and @merge_strategy are equivalent (normalized to lowercase internally).
  7. Order does not matter: Annotations can appear in any order, though by convention they follow the order listed in this reference.

What Stops Parsing

pipeline.sql
-- @merge_strategy: incremental    ← PARSED
-- @unique_key: order_id           ← PARSED
--                                 ← skipped (blank comment)
-- @watermark_column: updated_at   ← PARSED (blank comments don't stop parsing)
 
SELECT * FROM {{ ref('bronze.data') }}   ← parsing stops here (non-comment line)
 
-- @description: too late          ← NOT PARSED (after SQL)

⚠️ A common mistake is placing a regular SQL comment inside the annotation block. Regular comments (without @) stop the annotation parser, so any annotation after them is ignored:

-- @merge_strategy: incremental
-- @unique_key: order_id
-- This pipeline cleans raw data     ← stops parsing (no @key pattern)
-- @watermark_column: updated_at     ← NOT PARSED
 
SELECT ...

To avoid this, place all annotations together in an uninterrupted block, and put regular comments after them.
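The parsing rules above can be sketched as a small Python function. This is an illustrative reading of the rules, not RAT's actual parser: the function name `parse_annotations`, the regular expression, and the choice to stop at any blank line are assumptions.

```python
import re

# Hypothetical pattern for "@key: value" once the comment prefix is removed.
ANNOTATION_RE = re.compile(r"^@(\w+)\s*:\s*(.*)$")


def parse_annotations(source: str, prefix: str = "--") -> dict[str, str]:
    """Collect annotations from the leading comment block of a file."""
    annotations: dict[str, str] = {}
    for raw_line in source.splitlines():
        line = raw_line.strip()
        if not line.startswith(prefix):
            # Code or a blank line: the annotation block is over.
            break
        body = line[len(prefix):].strip()
        if body == "":
            # Empty comment ("--" alone) is skipped; parsing continues.
            continue
        match = ANNOTATION_RE.match(body)
        if match is None:
            # Regular comment without an @key pattern stops parsing.
            break
        key, value = match.groups()
        # Keys are normalized to lowercase; values are trimmed.
        annotations[key.lower()] = value.strip()
    return annotations
```

Pass `prefix="#"` for Python pipelines. Values are returned as strings; converting them to integers or booleans would be a separate step.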


Pipeline Annotations

These annotations configure how a pipeline processes and writes data.

@description

Human-readable description of what the pipeline does. Displayed in the portal’s pipeline list and detail views.

| Property | Value |
|----------|-------|
| Type | string |
| Default | "" (empty) |
| Required | No |

-- @description: Daily revenue aggregation by currency and region

@materialized

Controls how the pipeline output is materialized. Currently only table is supported.

| Property | Value |
|----------|-------|
| Type | enum: table, view |
| Default | table |
| Required | No |

-- @materialized: table

View materialization (@materialized: view) is planned for a future release. Currently all pipelines produce Iceberg tables.

@merge_strategy

Defines how new data is merged with existing data in the target table.

| Property | Value |
|----------|-------|
| Type | enum: full_refresh, incremental, append_only, delete_insert, scd2, snapshot |
| Default | full_refresh |
| Required | No |

-- @merge_strategy: incremental

Strategy behaviors:

| Strategy | Behavior | Requires |
|----------|----------|----------|
| full_refresh | Drop and recreate the table on every run | Nothing |
| incremental | Merge new rows by unique key, using watermark to select only new data | @unique_key, @watermark_column |
| append_only | Append all rows without deduplication | Nothing |
| delete_insert | Delete matching rows by unique key, then insert new rows | @unique_key |
| scd2 | Slowly Changing Dimension Type 2: track historical changes | @unique_key |
| snapshot | Partition-based snapshots: each run writes to a new partition | @partition_column |
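
To make the difference between the simpler strategies concrete, here is a toy in-memory model of full_refresh, append_only, and incremental. It is a sketch only: real runs operate on Iceberg tables, the function name `merge_rows` is hypothetical, and delete_insert, scd2, and snapshot are deliberately left out.

```python
from typing import Any

Row = dict[str, Any]


def merge_rows(strategy: str, existing: list[Row], incoming: list[Row],
               unique_key: tuple[str, ...] = ()) -> list[Row]:
    """Toy in-memory model of three merge strategies."""
    if strategy == "full_refresh":
        # Previous contents are discarded entirely.
        return list(incoming)
    if strategy == "append_only":
        # No deduplication: duplicates are kept.
        return existing + incoming
    if strategy == "incremental":
        if not unique_key:
            raise ValueError("incremental requires a unique key")
        # Upsert: a later row with the same key replaces the earlier one.
        merged: dict[tuple, Row] = {}
        for row in existing + incoming:
            merged[tuple(row[col] for col in unique_key)] = row
        return list(merged.values())
    raise ValueError(f"strategy not modelled in this sketch: {strategy}")
```

With the same inputs, full_refresh keeps only the incoming rows, append_only keeps every row including duplicates, and incremental keeps one row per key.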

@unique_key

The column(s) used as the primary key for deduplication. Required for incremental, delete_insert, and scd2 strategies.

| Property | Value |
|----------|-------|
| Type | string (comma-separated for composite keys) |
| Default | (none) |
| Required | For incremental, delete_insert, scd2 |

-- Single key
-- @unique_key: order_id
 
-- Composite key
-- @unique_key: customer_id, product_id, order_date

For composite keys, separate column names with commas. Whitespace around column names is trimmed.
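
The comma-splitting and trimming described above could look like the following sketch. The helper names `parse_unique_key` and `composite_key` are hypothetical, and silently dropping empty entries (for example from a trailing comma) is an assumption rather than documented behavior.

```python
from typing import Any


def parse_unique_key(value: str) -> list[str]:
    """Split a @unique_key value into trimmed column names."""
    # Dropping empty entries (e.g. from a trailing comma) is an assumption.
    return [col.strip() for col in value.split(",") if col.strip()]


def composite_key(row: dict[str, Any], columns: list[str]) -> tuple:
    """Build the tuple that identifies a row for deduplication."""
    return tuple(row[col] for col in columns)
```

Two rows are duplicates under a composite key only when every listed column matches.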

@watermark_column

The column used to track incremental progress. RAT computes MAX(watermark_column) from the existing table to determine which rows are new.

| Property | Value |
|----------|-------|
| Type | string (column name) |
| Default | (none) |
| Required | For incremental |

-- @watermark_column: updated_at

The watermark column should be a monotonically increasing value — typically a timestamp (updated_at, created_at) or a sequential integer (id, version).
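
The watermark logic can be illustrated with plain Python lists standing in for tables. This is a sketch under assumptions: the function name `select_new_rows` is hypothetical, and the strict `>` comparison (rows equal to the current maximum are treated as already loaded) is a guess about the boundary behavior.

```python
from typing import Any


def select_new_rows(
    existing: list[dict[str, Any]],
    incoming: list[dict[str, Any]],
    watermark_column: str,
) -> list[dict[str, Any]]:
    """Return incoming rows whose watermark is above the current maximum."""
    if not existing:
        # No watermark yet: every incoming row counts as new.
        return list(incoming)
    # Equivalent of MAX(watermark_column) over the existing table.
    high_water_mark = max(row[watermark_column] for row in existing)
    return [row for row in incoming if row[watermark_column] > high_water_mark]
```

This also shows why the column has to increase monotonically: a late-arriving row with a value below the current maximum would never be selected.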

@partition_column

The column used for partitioning in the snapshot strategy. Each run writes a new partition value.

| Property | Value |
|----------|-------|
| Type | string (column name) |
| Default | (none) |
| Required | For snapshot |

-- @merge_strategy: snapshot
-- @partition_column: snapshot_date

@scd_valid_from

The column name for the SCD2 record validity start date.

| Property | Value |
|----------|-------|
| Type | string (column name) |
| Default | valid_from |
| Required | No (only relevant for scd2) |

-- @scd_valid_from: effective_date

@scd_valid_to

The column name for the SCD2 record validity end date.

| Property | Value |
|----------|-------|
| Type | string (column name) |
| Default | valid_to |
| Required | No (only relevant for scd2) |

-- @scd_valid_to: expiry_date

@archive_landing_zones

When set to true, RAT moves landing zone files to a _processed/ subdirectory after a successful run. This prevents reprocessing the same files on the next run.

| Property | Value |
|----------|-------|
| Type | boolean (true or false) |
| Default | false |
| Required | No |

-- @archive_landing_zones: true

⚠️ Archiving is a move operation, not a copy. The original files are removed from the landing zone after the run succeeds. If the run fails, files are left in place.
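
The move-on-success behavior can be sketched against a local directory. This is only an analogy: the function name `archive_landing_files` is hypothetical, and a real landing zone is more likely to live in object storage than on a local filesystem.

```python
import shutil
from pathlib import Path


def archive_landing_files(landing_dir: Path, run_succeeded: bool) -> list[Path]:
    """Move files into landing_dir/_processed after a successful run."""
    if not run_succeeded:
        # Failed run: leave everything in place so the next run sees it again.
        return []
    processed_dir = landing_dir / "_processed"
    processed_dir.mkdir(exist_ok=True)
    moved = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(landing_dir.iterdir()):
        if path.is_file():
            target = processed_dir / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```

Because the files are moved rather than copied, keep a separate backup if the raw inputs need to stay available in their original location.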

@max_retries

Maximum number of retry attempts if the pipeline run fails. Applies to transient errors (network timeouts, temporary S3 unavailability).

| Property | Value |
|----------|-------|
| Type | integer |
| Default | 0 (no retries) |
| Required | No |

-- @max_retries: 3

@retry_delay_seconds

Delay in seconds between retry attempts. Uses a fixed delay (not exponential backoff).

| Property | Value |
|----------|-------|
| Type | integer |
| Default | 0 |
| Required | No |

-- @retry_delay_seconds: 30
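
How @max_retries and @retry_delay_seconds interact can be sketched as a fixed-delay retry loop. The names `run_with_retries` and `TransientError` are hypothetical, and how RAT actually classifies an error as transient is not specified here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, S3 hiccups)."""


def run_with_retries(run: Callable[[], T], max_retries: int = 0,
                     retry_delay_seconds: float = 0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call run(); retry up to max_retries times with a fixed delay."""
    attempt = 0
    while True:
        try:
            return run()
        except TransientError:
            if attempt >= max_retries:
                # Retries exhausted: surface the last error.
                raise
            attempt += 1
            # Fixed delay: the same wait before every retry, no backoff.
            sleep(retry_delay_seconds)
```

With @max_retries: 3 the pipeline is attempted at most four times in total: the initial run plus three retries.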

Pipeline Annotations Summary

| Annotation | Type | Default | Required For | Description |
|------------|------|---------|--------------|-------------|
| @description | string | "" | | Human-readable description |
| @materialized | enum | table | | Materialization type |
| @merge_strategy | enum | full_refresh | | Data merge strategy |
| @unique_key | string | | incremental, delete_insert, scd2 | Dedup column(s), comma-separated |
| @watermark_column | string | | incremental | Incremental progress tracking column |
| @partition_column | string | | snapshot | Snapshot partition column |
| @scd_valid_from | string | valid_from | | SCD2 validity start column |
| @scd_valid_to | string | valid_to | | SCD2 validity end column |
| @archive_landing_zones | boolean | false | | Move landing files after success |
| @max_retries | integer | 0 | | Retry attempts on failure |
| @retry_delay_seconds | integer | 0 | | Delay between retries |

Quality Test Annotations

Quality tests are SQL files in the pipeline’s tests/quality/ directory. They use the same annotation syntax but with a different set of keys.

@severity

Determines what happens when the quality test fails.

| Property | Value |
|----------|-------|
| Type | enum: error, warn |
| Default | error |
| Required | No |

-- @severity: error

  • error: The run fails and the ephemeral Nessie branch is deleted. Bad data never reaches production.
  • warn: The run succeeds with a warning. The branch is merged despite the quality issue. The warning is visible in the portal and run logs.

@description

Human-readable description of what the quality test checks. Displayed in the portal’s quality test results.

| Property | Value |
|----------|-------|
| Type | string |
| Default | "" (empty) |
| Required | No |

-- @description: Ensure no orders have negative total amounts

@tags

Comma-separated tags for organizing and filtering quality tests.

| Property | Value |
|----------|-------|
| Type | string (comma-separated) |
| Default | "" (empty) |
| Required | No |

-- @tags: finance, critical, orders

@remediation

Instructions for how to fix the issue when this quality test fails. Shown in the portal alongside the failure.

| Property | Value |
|----------|-------|
| Type | string |
| Default | "" (empty) |
| Required | No |

-- @remediation: Check the source system for orders with negative amounts. These are usually refunds that should have status='refunded'.

Quality Test Example

ecommerce/pipelines/silver/clean_orders/tests/quality/no_negative_amounts.sql
-- @severity: error
-- @description: Orders must not have negative total amounts
-- @tags: finance, data-integrity
-- @remediation: Negative amounts indicate refunds — filter them out or set status to 'refunded'
 
SELECT *
FROM {{ this }}
WHERE total_amount < 0

A quality test passes when it returns zero rows. If the query returns any rows, those rows represent violations and the test fails.
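
The pass/fail rule combined with @severity can be summarized in a few lines. This is a sketch: the function name `evaluate_quality_test` and the returned outcome labels are hypothetical, chosen only to mirror the behaviors described for error and warn.

```python
def evaluate_quality_test(violations: list, severity: str = "error") -> str:
    """Map a quality test result to a run outcome."""
    if not violations:
        # Zero rows returned: the test passes regardless of severity.
        return "pass"
    if severity == "warn":
        # Run succeeds; the branch is merged and a warning is recorded.
        return "merge_with_warning"
    # severity == "error": run fails and the ephemeral branch is deleted.
    return "fail_and_delete_branch"
```

Here `violations` stands for the rows returned by the test query, so an empty list means the check found nothing wrong.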

Quality Test Annotations Summary

| Annotation | Type | Default | Description |
|------------|------|---------|-------------|
| @severity | enum | error | error = block merge, warn = allow with warning |
| @description | string | "" | What the test checks |
| @tags | string | "" | Comma-separated categorization tags |
| @remediation | string | "" | How to fix failures |

Annotations vs config.yaml

Pipeline configuration can be defined in two places:

  1. Annotations in the pipeline file (pipeline.sql or pipeline.py)
  2. config.yaml in the pipeline directory

When both define the same key, the annotation takes precedence over config.yaml. A key set in neither place falls back to its documented default.

config.yaml Format

ecommerce/pipelines/silver/clean_orders/config.yaml
merge_strategy: incremental
unique_key: order_id
watermark_column: updated_at
description: Deduplicated orders with latest status
max_retries: 2
retry_delay_seconds: 15
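
The precedence of annotations over config.yaml can be modelled as layered dictionaries. This is a sketch, not RAT's resolver: `resolve_config` is a hypothetical name, `config_yaml` is assumed to be already parsed into a dict, and placing the documented defaults at the bottom of the stack is an assumption.

```python
# Defaults taken from the annotation reference; values kept as strings.
DEFAULTS = {
    "description": "",
    "materialized": "table",
    "merge_strategy": "full_refresh",
    "scd_valid_from": "valid_from",
    "scd_valid_to": "valid_to",
    "archive_landing_zones": "false",
    "max_retries": "0",
    "retry_delay_seconds": "0",
}


def resolve_config(annotations: dict[str, str],
                   config_yaml: dict[str, str]) -> dict[str, str]:
    """Layer the sources: defaults, then config.yaml, then annotations."""
    # Later mappings win, so annotations override config.yaml.
    return {**DEFAULTS, **config_yaml, **annotations}
```

Keys that appear in only one source pass through unchanged, so a pipeline can take max_retries from config.yaml while merge_strategy comes from an annotation.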

When to Use Which

Use Annotations When

The configuration is tightly coupled to the SQL logic (merge strategy, watermark, unique key). Keeping it in the same file makes the pipeline self-documenting.

Use config.yaml When

The configuration is operational (retries, descriptions) and you want to change it without modifying the SQL file. Useful for separating concerns.

⚠️ Pick one approach per pipeline and stick with it. Mixing annotations and config.yaml for the same key is confusing: if config.yaml says full_refresh but the annotation says incremental, the annotation wins silently.


Annotation Validation

RAT validates annotations at pipeline registration time and again at the start of each run:

| Validation | Result |
|------------|--------|
| Unknown annotation key | Warning (logged, not blocking) |
| Missing @unique_key for incremental strategy | Error (run fails) |
| Missing @watermark_column for incremental strategy | Error (run fails) |
| Missing @partition_column for snapshot strategy | Error (run fails) |
| Invalid @merge_strategy value | Error (run fails) |
| Invalid @severity value in quality test | Error (defaults to error) |
| @max_retries is negative or not an integer | Error (run fails) |
| @retry_delay_seconds is negative or not an integer | Error (run fails) |
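
The pipeline checks in this table could be approximated as follows. This is a sketch with hypothetical names (`validate_pipeline_annotations`, `REQUIRED_BY_STRATEGY`); the required keys follow the strategy table in the @merge_strategy section, and 0 is treated as a valid retry value because it is the documented default.

```python
KNOWN_KEYS = {
    "description", "materialized", "merge_strategy", "unique_key",
    "watermark_column", "partition_column", "scd_valid_from", "scd_valid_to",
    "archive_landing_zones", "max_retries", "retry_delay_seconds",
}
REQUIRED_BY_STRATEGY = {
    "full_refresh": (),
    "incremental": ("unique_key", "watermark_column"),
    "append_only": (),
    "delete_insert": ("unique_key",),
    "scd2": ("unique_key",),
    "snapshot": ("partition_column",),
}


def validate_pipeline_annotations(
    annotations: dict[str, str],
) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a parsed pipeline annotation dict."""
    errors: list[str] = []
    warnings: list[str] = []
    for key in annotations:
        if key not in KNOWN_KEYS:
            # Unknown keys are logged but do not block the run.
            warnings.append(f"unknown annotation: @{key}")
    strategy = annotations.get("merge_strategy", "full_refresh")
    if strategy not in REQUIRED_BY_STRATEGY:
        errors.append(f"invalid @merge_strategy: {strategy}")
    else:
        for required in REQUIRED_BY_STRATEGY[strategy]:
            if not annotations.get(required):
                errors.append(f"@{required} is required for {strategy}")
    for key in ("max_retries", "retry_delay_seconds"):
        value = annotations.get(key, "0")
        if not (value.isascii() and value.isdigit()):
            errors.append(f"@{key} must be a non-negative integer, got {value!r}")
    return errors, warnings
```

A run would proceed only when the returned error list is empty; warnings are informational.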