Work in Progress: This page is under development.

Pipeline Execution

This page explains how Laminar executes SQL pipelines on streaming data, from query parsing to record output.

Execution Overview


Query Compilation

When you create a pipeline, Laminar compiles your SQL query through several stages.

1. Parsing

The SQL parser converts your query text into an Abstract Syntax Tree (AST).

```sql
INSERT INTO output_topic
SELECT user_id, event_type, event_time
FROM events
WHERE event_type = 'click'
```

2. Logical Planning

Using DataFusion, the AST is transformed into a logical plan that represents the query's operations.

3. Optimization

The optimizer applies rules to improve query performance:

| Optimization | Description |
|---|---|
| Predicate Pushdown | Move filters closer to sources to reduce data volume |
| Projection Pruning | Remove unused columns early in the pipeline |
| Operator Fusion | Combine adjacent operators to reduce overhead |
| Partition Pruning | Skip irrelevant partitions based on filters |
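To see why predicate pushdown pays off, here is a minimal Python toy (not Laminar's optimizer): evaluating the filter before the projection means every later step touches fewer records.

```python
# Toy illustration of predicate pushdown (not Laminar's optimizer).
# Filtering before projecting means later operators touch fewer records.
records = [
    {"user_id": i, "event_type": "click" if i % 2 else "view"}
    for i in range(6)
]

def without_pushdown(rows):
    # Projects all 6 rows, then filters.
    projected = [{"user_id": r["user_id"], "event_type": r["event_type"]} for r in rows]
    return [r for r in projected if r["event_type"] == "click"]

def with_pushdown(rows):
    # Filters first: only the 3 "click" rows reach the projection.
    clicks = [r for r in rows if r["event_type"] == "click"]
    return [{"user_id": r["user_id"], "event_type": r["event_type"]} for r in clicks]

assert without_pushdown(records) == with_pushdown(records)
```

Both plans produce the same result; the optimized order simply does less work, which is exactly what the rule-based rewrites above aim for.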

4. Physical Planning

The logical plan is converted to a physical execution plan with concrete operators.
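As a rough mental model (a toy pull-based sketch, not Laminar's actual operator API), a physical plan can be pictured as a chain of concrete operators, each wrapping its upstream and producing records on demand:

```python
# Toy sketch of a physical plan (not Laminar's actual operator API):
# a chain of operators, each pulling records from its upstream.
class Filter:
    def __init__(self, upstream, predicate):
        self.upstream, self.predicate = upstream, predicate
    def __iter__(self):
        return (r for r in self.upstream if self.predicate(r))

class Project:
    def __init__(self, upstream, columns):
        self.upstream, self.columns = upstream, columns
    def __iter__(self):
        return ({c: r[c] for c in self.columns} for r in self.upstream)

events = [
    {"user_id": 1, "event_type": "click", "event_time": 100},
    {"user_id": 2, "event_type": "view", "event_time": 101},
]

# Physical plan for the example query: scan -> filter -> project.
plan = Project(Filter(events, lambda r: r["event_type"] == "click"),
               ["user_id", "event_type", "event_time"])
print(list(plan))  # [{'user_id': 1, 'event_type': 'click', 'event_time': 100}]
```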


Operator Types

Laminar supports various operator types for stream processing.

Source Operators

| Operator | Description |
|---|---|
| Kafka Source | Consumes from Kafka topics with offset tracking |
| Confluent Source | Consumes from Confluent Cloud |
| Kinesis Source | Reads from Kinesis streams with shard management |
| Filesystem Source | Reads from S3, GCS, or local files |
| Mock Source | Generates synthetic data for testing |

Transform Operators

| Operator | Description |
|---|---|
| Filter | Evaluates predicates, passes matching records |
| Project | Selects and transforms columns |
| Aggregate | Computes aggregations (COUNT, SUM, AVG, etc.) |
| Join | Combines records from multiple streams |

Sink Operators

| Operator | Description |
|---|---|
| Kafka Sink | Writes to Kafka topics |
| Confluent Sink | Writes to Confluent Cloud |
| Iceberg Sink | Writes to Iceberg tables |
| Delta Lake Sink | Writes to Delta Lake tables |
| Filesystem Sink | Writes to S3, GCS, or local files |
| StdOut Sink | Prints to console for debugging |

Parallelism Model

Laminar processes data in parallel using operator-level parallelism. Each operator in the pipeline can have multiple subtasks running concurrently.

Parallelism Concepts

| Concept | Description |
|---|---|
| Subtask | Instance of an operator processing a portion of the data |
| Parallelism | Number of subtasks per operator |
| Queue | Bounded channel connecting operators with backpressure |

Data Distribution

  1. Source subtasks read from assigned partitions
  2. Data is distributed to downstream subtasks
  3. Each subtask processes its portion independently
  4. Bounded queues between operators provide flow control
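One common way to distribute data in step 2 is by key hash, so that all records with the same key land on the same subtask. The sketch below is illustrative; Laminar's actual partitioning scheme may differ.

```python
# Sketch of key-based distribution to downstream subtasks (illustrative;
# Laminar's actual partitioning scheme may differ).
import hashlib

PARALLELISM = 4  # number of downstream subtasks

def subtask_for(key: str, parallelism: int = PARALLELISM) -> int:
    # A stable hash (unlike Python's built-in hash(), which is randomized
    # per process) so the same key always routes to the same subtask.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % parallelism

queues = [[] for _ in range(PARALLELISM)]
for record in [{"user_id": "u1"}, {"user_id": "u2"}, {"user_id": "u1"}]:
    queues[subtask_for(record["user_id"])].append(record)

# Both "u1" records sit in the same queue, so their relative order is preserved.
```

Keeping all records for a key on one subtask is what makes per-partition ordering (see Processing Guarantees below) possible under parallel execution.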

Record Flow

This diagram shows how a single record flows through the pipeline:

Processing Guarantees

| Guarantee | Description |
|---|---|
| Ordering | Records within a partition maintain order |
| Exactly-once | Each record is processed exactly once (with checkpointing) |
| At-least-once | Without checkpointing, records may be reprocessed |
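The at-least-once behavior follows from committing offsets only after processing. A toy model (not Laminar's code) shows why a crash between processing a record and committing its offset causes that record to be reprocessed on restart:

```python
# Toy model of at-least-once delivery (illustrative, not Laminar's code):
# offsets are committed only after processing, so a crash between
# processing and commit replays the uncommitted record on restart.
stream = ["a", "b", "c"]
processed = []
committed_offset = 0

def run(crash_after=None):
    global committed_offset
    offset = committed_offset  # restart from the last committed offset
    while offset < len(stream):
        processed.append(stream[offset])       # process the record
        if crash_after is not None and len(processed) == crash_after:
            return                             # crash before committing
        committed_offset = offset + 1          # commit after processing
        offset += 1

run(crash_after=2)   # crashes after processing "b" but before committing it
run()                # restart: "b" is processed a second time
print(processed)     # ['a', 'b', 'b', 'c']
```

Checkpointing closes this gap by atomically recording processing progress, which is what upgrades the guarantee to exactly-once.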

Backpressure

Laminar handles backpressure when downstream operators can't keep up.

Backpressure Mechanisms

  1. Bounded Queues: Row-count-aware buffers between operators
  2. Flow Control: Senders block when queues are full, automatically slowing upstream operators
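These two mechanisms can be sketched with a bounded queue between two threads: once the queue is full, the producer's put() blocks, so a slow consumer automatically throttles the producer.

```python
# Sketch of backpressure via a bounded queue (illustrative): the producer
# blocks on put() once the queue is full, so a slow consumer automatically
# slows the producer down. No records are dropped.
import queue
import threading
import time

q = queue.Queue(maxsize=2)   # bounded channel between two operators
produced, consumed = [], []

def producer():
    for i in range(6):
        q.put(i)             # blocks while the queue already holds 2 items
        produced.append(i)
    q.put(None)              # sentinel: end of stream

def consumer():
    while True:
        item = q.get()
        if item is None:
            break
        time.sleep(0.01)     # simulate a slow downstream operator
        consumed.append(item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
assert consumed == list(range(6))   # nothing lost despite the slow consumer
```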