Work in Progress: This page is under development. Use the feedback button on the bottom right to help us improve it.

Core Concepts

Understanding the core concepts of Laminar is essential for building effective streaming pipelines. This guide introduces the fundamental building blocks: Profiles, Tables, Pipelines, and Jobs.

Pipeline Overview

A Laminar pipeline connects data sources to sinks through SQL transformations. Here's the high-level flow:

  • Profiles store connection credentials (e.g., Kafka brokers, Iceberg catalog)
  • Tables define schemas and link to profiles
  • Pipelines run SQL that reads from source tables and writes to sink tables
  • Jobs are the runtime instances Laminar creates when a pipeline starts

Profiles

Profiles store reusable connection credentials and configuration. Instead of repeating connection details for every table, create a profile once and reference it across multiple tables.

Profile Config Example

{
  "name": "my_kafka_profile",
  "type": "kafka",
  "config": {
    "bootstrapServers": "broker1:9092,broker2:9092",
    "authentication": {
      "protocol": "SASL_SSL",
      "mechanism": "SCRAM-SHA-256",
      "username": "user",
      "password": "password"
    }
  }
}

Supported Profile Types

Tables

Tables represent external data sources and sinks. They define how Laminar connects to external systems and the schema of the data. A table consists of two parts: config and schema.

Table Config

The config section specifies connector-specific settings like topic name, offset handling, and commit mode.

{
  "name": "user_events",
  "profile": "my_kafka_profile",
  "config": {
    "topic": "user-events",
    "type": {
      "source": {
        "offset": "latest"
      }
    }
  }
}

Supported Connectors

Table Schema

The schema section defines the data format and field definitions. See the Schema documentation for full details.

{
  "schema": {
    "format": { "json": {} },
    "fields": [
      {
        "field_name": "event_id",
        "field_type": { "type": { "primitive": "Utf8" } },
        "nullable": false
      },
      {
        "field_name": "user_id",
        "field_type": { "type": { "primitive": "Int64" } },
        "nullable": false
      },
      {
        "field_name": "event_time",
        "field_type": { "type": { "primitive": "DateTime" } },
        "nullable": false
      }
    ]
  }
}
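
Putting the two parts together, a complete table definition pairs the config with the schema. The exact top-level layout may differ in your Laminar version, so treat this as a sketch assembled from the two snippets above:

{
  "name": "user_events",
  "profile": "my_kafka_profile",
  "config": {
    "topic": "user-events",
    "type": { "source": { "offset": "latest" } }
  },
  "schema": {
    "format": { "json": {} },
    "fields": [
      { "field_name": "event_id", "field_type": { "type": { "primitive": "Utf8" } }, "nullable": false },
      { "field_name": "user_id", "field_type": { "type": { "primitive": "Int64" } }, "nullable": false },
      { "field_name": "event_time", "field_type": { "type": { "primitive": "DateTime" } }, "nullable": false }
    ]
  }
}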

Pipelines

Pipelines are the heart of Laminar. A pipeline is SQL-based stream processing logic that reads from source tables, transforms data, and writes to sink tables.

What is a Pipeline?

Think of a pipeline as a continuously running query that processes data as it arrives. Unlike batch queries that run once and finish, streaming pipelines run indefinitely, processing events in real time.

Pipeline SQL

Pipelines are defined using standard SQL. They reference tables you've already created:

INSERT INTO events_iceberg
SELECT
  event_id,
  user_id,
  event_type,
  event_time,
  properties
FROM user_events
WHERE event_type != 'heartbeat'

Jobs

Jobs are the runtime execution units of pipelines. When you start a pipeline, Laminar creates a job that manages the actual data processing.

What is a Job?

A job represents a running instance of a pipeline. It manages:

  • Task parallelism
  • Resource allocation
  • State management
  • Checkpointing
  • Failure recovery

Job Metrics

Key metrics to monitor:

  • Records In: Events received from sources
  • Records Out: Events written to sinks
  • Throughput: Events per second
  • Latency: Processing delay
  • Backpressure: Buildup caused by a downstream operator or sink that can't keep up
  • Checkpoint Duration: Time to save state

Putting It All Together

Here's how all the concepts work together in a typical workflow:

1. Create a Kafka Profile

Configure the connection to your Kafka cluster, using a profile like the Kafka example shown earlier.

2. Create an Iceberg Profile

Configure the connection to your Iceberg catalog or lakehouse.
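
The Kafka profile example above shows the general shape. A minimal Iceberg profile might look like the sketch below; the config keys (catalogUrl, warehouse) are illustrative assumptions rather than the exact Laminar schema, so check the profile reference for your version:

{
  "name": "my_iceberg_profile",
  "type": "iceberg",
  "config": {
    "catalogUrl": "https://my-catalog.example.com",
    "warehouse": "s3://my-bucket/warehouse"
  }
}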

3. Create Source Table

Define a raw_events table pointing to a Kafka topic, using the Kafka profile.
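
Following the table config pattern shown earlier (the topic name here is illustrative):

{
  "name": "raw_events",
  "profile": "my_kafka_profile",
  "config": {
    "topic": "raw-events",
    "type": {
      "source": {
        "offset": "latest"
      }
    }
  }
}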

4. Create Sink Table

Define an events_iceberg table pointing to an Iceberg table, using the Iceberg profile.
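
A sketch of the sink table config, reusing the same shape with a sink type in place of a source type; the keys inside sink (namespace, table) are illustrative assumptions:

{
  "name": "events_iceberg",
  "profile": "my_iceberg_profile",
  "config": {
    "type": {
      "sink": {
        "namespace": "analytics",
        "table": "events"
      }
    }
  }
}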

5. Create Pipeline

Write SQL to transform and route data:

INSERT INTO events_iceberg
SELECT
  event_id,
  user_id,
  event_type,
  event_time,
  CASE
    WHEN event_type = 'purchase' THEN 'transaction'
    ELSE 'activity'
  END AS category
FROM raw_events
WHERE user_id IS NOT NULL

6. Start Pipeline

Start the pipeline via the UI or the lmnr CLI. A job is created and begins processing.

7. Monitor

Watch job metrics, check logs, and verify that data is landing in Iceberg.
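
To spot-check the sink, you can query the Iceberg table from any engine that can read it (Trino, Spark SQL, or similar; the choice of engine is up to you):

-- Quick sanity check: confirm rows are arriving in the sink table
SELECT COUNT(*) AS row_count
FROM events_iceberg;

-- Inspect a few recent events
SELECT event_id, user_id, event_type, event_time, category
FROM events_iceberg
ORDER BY event_time DESC
LIMIT 10;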

Data Flow

What's Next?