
Query

The query endpoints provide interactive SQL querying against your data lake. Queries are read-only and run in ratq, a Python DuckDB sidecar that connects to your Iceberg tables. All query requests are proxied through ratd, which converts ratq's Arrow IPC responses into JSON for REST clients.


Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/v1/query | Execute an interactive SQL query |
| GET | /api/v1/schema | Get all tables with column schemas (bulk) |

Execute Query

POST /api/v1/query

Executes a read-only SQL query against the data lake using DuckDB. Queries have a 30-second timeout and a maximum request size of 100 KB.

Request Body

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| sql | string | Yes | SQL query to execute (max 100 KB) |
| namespace | string | No | Namespace context for table resolution |
| limit | integer | No | Maximum rows to return (default: 1000) |

Request

curl -X POST http://localhost:8080/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "SELECT * FROM silver.orders WHERE amount > 100 LIMIT 10",
    "namespace": "default",
    "limit": 1000
  }'
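
The same request can be built from Python. The following is a minimal sketch using only the standard library, not an official client. The helper names `build_query_request` and `run_query`, the reading of "100 KB" as 100 × 1024 bytes of UTF-8, and the 35-second client timeout (the documented 30-second query timeout plus a margin) are all assumptions made for illustration.

```python
import json
import urllib.request

# Assumption: the documented "100 KB" limit is treated as 100 * 1024 bytes.
MAX_SQL_BYTES = 100 * 1024


def build_query_request(sql, namespace=None, limit=None,
                        base_url="http://localhost:8080"):
    """Build (but do not send) a POST /api/v1/query request."""
    if not sql:
        raise ValueError("sql is required")
    if len(sql.encode("utf-8")) > MAX_SQL_BYTES:
        raise ValueError("sql exceeds the 100 KB limit")

    # Only include the optional fields when the caller supplied them.
    body = {"sql": sql}
    if namespace is not None:
        body["namespace"] = namespace
    if limit is not None:
        body["limit"] = limit

    return urllib.request.Request(
        url=f"{base_url}/api/v1/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, namespace=None, limit=None, timeout=35):
    """Send the request and return the decoded JSON response."""
    req = build_query_request(sql, namespace, limit)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Checking the size before sending avoids a round trip that would otherwise end in a 400 response. Separating request construction from sending also makes the payload easy to inspect in tests.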

Response — 200 OK

{
  "columns": [
    { "name": "id", "type": "VARCHAR" },
    { "name": "amount", "type": "DECIMAL(14,2)" },
    { "name": "status", "type": "VARCHAR" },
    { "name": "created_at", "type": "TIMESTAMP" }
  ],
  "rows": [
    ["ORD-001", 129.99, "completed", "2026-02-12T10:00:00Z"],
    ["ORD-002", 245.50, "completed", "2026-02-12T11:30:00Z"]
  ],
  "total_rows": 2,
  "duration_ms": 45
}

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| columns | array | Column definitions |
| columns[].name | string | Column name |
| columns[].type | string | DuckDB type (e.g., VARCHAR, INTEGER, DECIMAL(14,2), TIMESTAMP) |
| rows | array | Result rows as arrays of values |
| total_rows | integer | Number of rows returned |
| duration_ms | integer | Query execution time in milliseconds |
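
Because rows arrive as positional arrays, clients often want to pair each value with its column name. The sketch below shows one way to do that. `rows_to_records` is a hypothetical helper name, and the sample payload is abridged from the response shown above.

```python
def rows_to_records(response):
    """Turn the positional `rows` arrays into dicts keyed by column name."""
    names = [col["name"] for col in response["columns"]]
    return [dict(zip(names, row)) for row in response["rows"]]


# Abridged sample in the documented response shape.
sample = {
    "columns": [
        {"name": "id", "type": "VARCHAR"},
        {"name": "amount", "type": "DECIMAL(14,2)"},
    ],
    "rows": [["ORD-001", 129.99], ["ORD-002", 245.50]],
}

records = rows_to_records(sample)
# records[0] == {"id": "ORD-001", "amount": 129.99}
```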

Error Responses

| Status | Code | Description |
|--------|------|-------------|
| 400 | INVALID_ARGUMENT | Missing sql field, or query exceeds 100 KB size limit |
| 400 | INVALID_ARGUMENT | SQL syntax error or invalid table reference |
| 500 | INTERNAL | ratq sidecar not configured (RATQ_ADDR not set) |
| 503 | UNAVAILABLE | ratq sidecar is unreachable |
Warning: Queries are read-only. Any attempt to execute DDL (CREATE, DROP, ALTER) or DML (INSERT, UPDATE, DELETE) statements will be rejected by the query service. Write operations are only performed by the runner during pipeline execution.
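
The statuses in the Error Responses table suggest a simple client policy: a 503 (sidecar unreachable) may be transient and worth retrying, while a 400 (bad request) or 500 (sidecar not configured) will not change on retry. The sketch below illustrates that policy. It is an assumption about sensible client behavior, not something the API prescribes. It relies only on the HTTP status, and `call_with_retry` and its parameters are hypothetical names.

```python
import time

# Assumption: only 503 UNAVAILABLE is treated as transient.
RETRYABLE_STATUSES = {503}


def call_with_retry(send, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status.

    `send` is any zero-argument callable returning (status, body).
    The delay doubles after each failed attempt (exponential backoff).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        is_last = attempt == attempts - 1
        if status not in RETRYABLE_STATUSES or is_last:
            return status, body
        sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper testable without real delays.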

The query service uses DuckDB’s Iceberg extension to scan Iceberg tables directly from S3. Table names use the format {layer}.{table_name} within a namespace context — for example, silver.orders resolves to the Iceberg table at {namespace}/silver/orders/.
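
The naming rule above can be expressed as a small function. This is only an illustration of the documented mapping from {layer}.{table_name} to {namespace}/{layer}/{table_name}/, not the service's actual resolver. `resolve_table_path` is a hypothetical name, and the default namespace of "default" is taken from the request example rather than from any stated server default.

```python
def resolve_table_path(table_ref, namespace="default"):
    """Map 'layer.table' plus a namespace to '{namespace}/{layer}/{table}/'."""
    layer, sep, table = table_ref.partition(".")
    # Require exactly one dot with non-empty parts on both sides.
    if not sep or not layer or not table or "." in table:
        raise ValueError(f"expected 'layer.table', got {table_ref!r}")
    return f"{namespace}/{layer}/{table}/"


path = resolve_table_path("silver.orders")
# path == "default/silver/orders/"
```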


Get Schema (Bulk)

GET /api/v1/schema

Returns all tables with their column schemas in a single call. This endpoint uses batch fetching to avoid N+1 gRPC calls to the query service.

Request

curl http://localhost:8080/api/v1/schema

Response — 200 OK

{
  "tables": [
    {
      "namespace": "default",
      "layer": "silver",
      "name": "orders",
      "columns": [
        { "name": "id", "type": "VARCHAR" },
        { "name": "amount", "type": "DECIMAL(14,2)" },
        { "name": "status", "type": "VARCHAR" },
        { "name": "created_at", "type": "TIMESTAMP" }
      ]
    },
    {
      "namespace": "default",
      "layer": "bronze",
      "name": "raw_orders",
      "columns": [
        { "name": "id", "type": "VARCHAR" },
        { "name": "amount", "type": "VARCHAR" },
        { "name": "status", "type": "VARCHAR" },
        { "name": "source_file", "type": "VARCHAR" }
      ]
    }
  ]
}

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| tables | array | List of table schema objects |
| tables[].namespace | string | Namespace the table belongs to |
| tables[].layer | string | Data layer: bronze, silver, or gold |
| tables[].name | string | Table name |
| tables[].columns | array | Column definitions for the table |
| tables[].columns[].name | string | Column name |
| tables[].columns[].type | string | DuckDB column type |

Error Responses

| Status | Code | Description |
|--------|------|-------------|
| 500 | INTERNAL | ratq sidecar not configured |
| 503 | UNAVAILABLE | ratq sidecar is unreachable |

The schema endpoint is used by the portal’s editor for autocompletion and by the lineage graph for displaying column-level metadata. It returns all tables across all namespaces in a single response.
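
As an illustration of the autocompletion use case, a client could index the bulk schema response by namespace and qualified table name. The sketch below is one possible shape for such an index and is not how the portal is known to implement it. `build_completion_index` is a hypothetical name, and the sample is abridged from the response shown above.

```python
from collections import defaultdict


def build_completion_index(schema):
    """Index a /api/v1/schema response as {namespace: {"layer.table": [columns]}}."""
    index = defaultdict(dict)
    for table in schema["tables"]:
        qualified = f'{table["layer"]}.{table["name"]}'
        index[table["namespace"]][qualified] = [
            col["name"] for col in table["columns"]
        ]
    return dict(index)


# Abridged sample in the documented response shape.
schema = {
    "tables": [
        {
            "namespace": "default",
            "layer": "silver",
            "name": "orders",
            "columns": [
                {"name": "id", "type": "VARCHAR"},
                {"name": "amount", "type": "DECIMAL(14,2)"},
            ],
        },
        {
            "namespace": "default",
            "layer": "bronze",
            "name": "raw_orders",
            "columns": [{"name": "source_file", "type": "VARCHAR"}],
        },
    ]
}

index = build_completion_index(schema)
# index["default"]["silver.orders"] == ["id", "amount"]
```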