
Architecture Overview

RAT is a self-hostable data platform that runs as 8 containers orchestrated by Docker Compose. It follows a strict separation of concerns: a Go API server handles orchestration and state, Python services handle data processing, and a Next.js frontend provides the user interface.


Design Philosophy

RAT’s architecture is guided by five principles:

  1. A single docker compose up --- The entire platform starts with one command. No external dependencies, no cloud accounts, no license keys for the Community Edition.
  2. Separation of compute and state --- Data lives in S3 (MinIO) in open Apache Iceberg format. Metadata lives in Postgres. Compute (DuckDB) is ephemeral and stateless.
  3. Git-like isolation --- Every pipeline run creates an isolated Nessie branch. Bad data never reaches the production catalog. Quality tests gate merges.
  4. Plugin extensibility --- The Community Edition ships with no-op implementations for auth, sharing, and enforcement. Pro plugins slot in without changing the core platform.
  5. No vendor lock-in --- All storage is open format (Iceberg + Parquet). The catalog is Nessie (open source). You can query your data with any tool that speaks Iceberg.

System Block Diagram

The following diagram shows all 8 containers (7 services + 1 init job), their communication protocols, network zones, and exposed ports.

The portal and ratd containers live on both the frontend and backend networks. The portal needs frontend access (user-facing port 3000) and backend access (to reach ratd internally for server-side rendering). All other services live exclusively on the backend network and are not reachable from outside Docker.
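In Compose terms, this split can be sketched as follows (network and service names are the ones used on this page; every other key is elided):

```yaml
# Sketch only --- the real compose file carries many more keys per service.
networks:
  rat_frontend: {}
  infra_default: {}

services:
  portal:
    networks: [rat_frontend, infra_default]
    ports: ["3000:3000"]        # user-facing web IDE
  ratd:
    networks: [rat_frontend, infra_default]
    ports: ["8080:8080"]        # REST API
  runner:
    networks: [infra_default]   # backend only: unreachable from the host
```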


Containers at a Glance

| Container | Language | Role | Exposed Ports | Network |
|---|---|---|---|---|
| portal | Next.js (TypeScript) | Web IDE --- the only user interface | 3000 (HTTP) | frontend + backend |
| ratd | Go | API server, scheduler, plugin host, catalog ops | 8080 (REST), 8081 (gRPC, internal) | frontend + backend |
| ratq | Python | Read-only DuckDB query service | 50051 (gRPC, internal) | backend |
| runner | Python | Pipeline execution engine | 50052 (gRPC, internal) | backend |
| postgres | PostgreSQL 16.4 | Platform state (16 tables) | 5432 (localhost only) | backend |
| minio | MinIO | S3-compatible object storage | 9000 (S3 API), 9001 (Console) | backend |
| nessie | Java (Quarkus) | Iceberg REST catalog with git-like branching | 19120 (REST, localhost only) | backend |
| minio-init | MinIO Client (mc) | One-shot: creates bucket, enables versioning | --- (exits after setup) | backend |

Communication Protocols

Protocol Choices

| Path | Protocol | Why |
|---|---|---|
| Portal to ratd | REST (HTTP) | Browser-native. SWR data fetching. No gRPC-Web complexity. |
| ratd to runner/ratq | ConnectRPC (gRPC) | Type-safe, streaming support (SSE logs), HTTP/1.1 compatible for easier debugging. |
| ratd to Postgres | SQL via pgx | Pure Go driver. Connection pooling. Type-safe queries via sqlc. |
| ratd to MinIO | S3 API | Standard S3 protocol via the MinIO Go SDK. Swappable with any S3-compatible store. |
| ratd to Nessie | Iceberg REST API | Standard catalog protocol. Not Nessie-specific --- works with any Iceberg REST catalog. |
| runner to ratd | HTTP callback | Push-based status reporting. Runner POSTs terminal status on completion, with a 60s poll fallback. |
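The runner-to-ratd row combines a push with a polling safety net. A minimal sketch of that callback logic in Python --- the endpoint path, payload shape, and retry policy here are illustrative, not the actual API:

```python
import time

POLL_FALLBACK_SECONDS = 60  # ratd polls runs that never called back (per the table)

def report_terminal_status(post, run_id, status, retries=3, backoff=1.0):
    """Push the terminal status to ratd; on repeated failure, give up and
    let ratd's 60s poll fallback pick the status up instead."""
    for attempt in range(retries):
        try:
            # `post` stands in for an HTTP client; path and payload are illustrative.
            post(f"/api/runs/{run_id}/status", {"status": status})
            return True       # ratd heard about the run immediately
        except ConnectionError:
            time.sleep(backoff * attempt)  # simple linear backoff between tries
    return False              # ratd's poller covers this case within ~60s
```

The design in the table implies the callback is an optimization: even if the push never arrives (runner crash, network partition), the poll fallback still resolves the run within about a minute.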

Network Zones

RAT uses two Docker networks to enforce network segmentation:

Frontend Network (rat_frontend)

The user-facing network. Only two containers are attached:

  • portal --- serves the web IDE on port 3000
  • ratd --- serves the REST API on port 8080

This is the only network accessible from the host machine (via published ports).

Backend Network (infra_default)

The internal network where all inter-service communication happens. All 8 containers are attached to this network, but only portal and ratd are also on the frontend network.

Services like runner, ratq, postgres, minio, and nessie are not directly reachable from the host, apart from the localhost-bound debug ports for postgres, minio, and nessie.

⚠️

In production, the localhost-bound debug ports for Postgres (:5432), MinIO (:9000, :9001), and Nessie (:19120) should be removed or firewalled. They exist for development convenience only.
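In Compose terms, the difference between the two modes is only the port mapping (a sketch, using postgres as the example):

```yaml
services:
  postgres:
    # Development: bound to loopback, reachable only from the host machine itself
    ports:
      - "127.0.0.1:5432:5432"
    # Production: delete the ports: mapping entirely --- postgres remains
    # reachable on the backend network but is never published to the host
```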


Startup Sequence

Services start in dependency order enforced by Docker Compose health checks:

Infrastructure layer starts first

Postgres and MinIO start simultaneously. Both have health checks (pg_isready and mc ready). Nothing else starts until both are healthy.

minio-init runs (one-shot)

Once MinIO is healthy, the minio-init container runs. It creates the rat bucket, enables S3 versioning, and configures a 7-day lifecycle policy for non-current object versions. It exits after completion.

Nessie starts

Nessie depends on Postgres (it persists catalog metadata via JDBC). It starts once Postgres is healthy and exposes the Iceberg REST catalog on port 19120.

Python services start

runner and ratq start once MinIO and Nessie are healthy. They initialize their DuckDB engines with S3 and Iceberg extensions.

ratd starts

ratd depends on Postgres, MinIO, and Nessie. On startup it runs database migrations, initializes the scheduler, starts the reaper daemon, connects to runner and ratq via gRPC, and loads any configured plugins.

Portal starts last

portal depends on ratd being healthy. It needs the API to be available for both server-side rendering and client-side data fetching.
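The ordering above is what Compose's service_healthy conditions express. A minimal sketch --- the healthcheck commands follow the ones named on this page, while the intervals and the pg_isready arguments are illustrative:

```yaml
services:
  postgres:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U rat"]   # user name illustrative
      interval: 5s
  minio:
    healthcheck:
      test: ["CMD", "mc", "ready", "local"]
      interval: 5s
  ratd:
    depends_on:
      postgres: { condition: service_healthy }
      minio: { condition: service_healthy }
      nessie: { condition: service_healthy }
  portal:
    depends_on:
      ratd: { condition: service_healthy }
```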


Resource Allocation

Every container has explicit memory and CPU limits to prevent runaway consumption:

| Container | Memory Limit | CPU Limit | PIDs Limit | Notes |
|---|---|---|---|---|
| ratd | 512 MB | 1.0 | 100 | Lightweight Go binary |
| ratq | 1 GB | 1.0 | 100 | Single persistent DuckDB, in-memory catalog cache |
| runner | 2 GB | 2.0 | 100 | One DuckDB per concurrent run, up to 10 concurrent |
| portal | 512 MB | 1.0 | 100 | Standalone Next.js, mostly static |
| postgres | 1 GB | 1.0 | 100 | Metadata only, low volume |
| minio | 1 GB | 1.0 | 100 | Data file storage |
| nessie | 512 MB | 1.0 | 100 | Catalog metadata, Quarkus runtime |
| minio-init | 256 MB | 0.5 | 100 | One-shot, exits immediately |

The limits above sum to 6.75 GB RAM (6.5 GB at steady state, since minio-init exits after setup). Recommended: 8 GB+ available for Docker.
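In Compose, limits like the ones in the table map onto three keys per service; a sketch using ratd's values:

```yaml
services:
  ratd:
    mem_limit: 512m   # hard memory cap
    cpus: 1.0         # at most one CPU's worth of time
    pids_limit: 100   # caps processes/threads (fork-bomb guard)
```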


Security Posture

Every container is hardened with defense-in-depth:

  • Read-only filesystem (read_only: true) on ratd, ratq, runner, portal --- with /tmp as tmpfs
  • Drop all Linux capabilities (cap_drop: [ALL])
  • No privilege escalation (no-new-privileges:true)
  • PID limits (100 per container) to prevent fork bombs
  • JSON body size limit (1 MB) on ratd to prevent request flooding
  • Rate limiting (50 req/s per IP, burst 100) on all API endpoints
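As Compose keys, the container-level hardening items above look roughly like this per service (ratd shown; the JSON body limit and rate limiting live in ratd's application code, not in Compose):

```yaml
services:
  ratd:
    read_only: true              # immutable root filesystem
    tmpfs:
      - /tmp                     # the only writable path, memory-backed
    cap_drop:
      - ALL                      # no Linux capabilities
    security_opt:
      - no-new-privileges:true   # block setuid-style escalation
    pids_limit: 100              # fork-bomb guard
```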

See the Security page for the complete security model.


Where Data Lives

| Data Type | Storage | Format |
|---|---|---|
| Pipeline source code (SQL/Python) | MinIO (S3) | Plain text, S3-versioned |
| Pipeline config | MinIO (S3) | YAML |
| Quality test SQL | MinIO (S3) | Plain text |
| Raw uploaded files | MinIO (S3) | CSV, Parquet, JSON |
| Transformed data (tables) | MinIO (S3) | Apache Iceberg (Parquet + metadata) |
| Table catalog | Nessie | Git-like refs pointing to Iceberg metadata |
| Platform state | Postgres | 16 relational tables |
| Run logs | Postgres (JSONB column) | Structured log entries |

See Storage Layout for the full S3 directory structure and Database Schema for the Postgres tables.


Next Steps