Work in Progress: This page is under development.

Pipeline Execution

This page explains how Laminar executes SQL pipelines on streaming data, from query parsing to record output.

Execution Overview


Query Compilation

When you create a pipeline, Laminar compiles your SQL query through several stages.

1. Parsing

The SQL parser converts your query text into an Abstract Syntax Tree (AST).

```sql
INSERT INTO output_topic
SELECT user_id, event_type, event_time
FROM events
WHERE event_type = 'click'
```

2. Logical Planning

Using DataFusion, the AST is transformed into a logical plan that represents the query's operations.

3. Optimization

The optimizer applies rules to improve query performance:

| Optimization | Description |
|---|---|
| Predicate Pushdown | Move filters closer to sources to reduce data volume |
| Projection Pruning | Remove unused columns early in the pipeline |
| Operator Fusion | Combine adjacent operators to reduce overhead |
| Partition Pruning | Skip irrelevant partitions based on filters |
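To see why predicate pushdown pays off, here is a minimal Python toy (not Laminar's optimizer): evaluating the filter before the projection means every later step touches fewer records.

```python
# Toy illustration of predicate pushdown (not Laminar's optimizer).
# Filtering before projecting means later operators touch fewer records.
records = [
    {"user_id": i, "event_type": "click" if i % 2 else "view"}
    for i in range(6)
]

def without_pushdown(rows):
    # Projects all 6 rows, then filters.
    projected = [{"user_id": r["user_id"], "event_type": r["event_type"]} for r in rows]
    return [r for r in projected if r["event_type"] == "click"]

def with_pushdown(rows):
    # Filters first: only the 3 "click" rows reach the projection.
    clicks = [r for r in rows if r["event_type"] == "click"]
    return [{"user_id": r["user_id"], "event_type": r["event_type"]} for r in clicks]

assert without_pushdown(records) == with_pushdown(records)
```

Both plans produce the same result; the optimized order simply does less work, which is exactly what the rule-based rewrites above aim for.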

4. Physical Planning

The logical plan is converted to a physical execution plan with concrete operators.
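As a rough mental model (a toy pull-based sketch, not Laminar's actual operator API), a physical plan can be pictured as a chain of concrete operators, each wrapping its upstream and producing records on demand:

```python
# Toy sketch of a physical plan (not Laminar's actual operator API):
# a chain of operators, each pulling records from its upstream.
class Filter:
    def __init__(self, upstream, predicate):
        self.upstream, self.predicate = upstream, predicate
    def __iter__(self):
        return (r for r in self.upstream if self.predicate(r))

class Project:
    def __init__(self, upstream, columns):
        self.upstream, self.columns = upstream, columns
    def __iter__(self):
        return ({c: r[c] for c in self.columns} for r in self.upstream)

events = [
    {"user_id": 1, "event_type": "click", "event_time": 100},
    {"user_id": 2, "event_type": "view", "event_time": 101},
]

# Physical plan for the example query: scan -> filter -> project.
plan = Project(Filter(events, lambda r: r["event_type"] == "click"),
               ["user_id", "event_type", "event_time"])
print(list(plan))  # [{'user_id': 1, 'event_type': 'click', 'event_time': 100}]
```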


Operator Types

Laminar supports various operator types for stream processing.

Source Operators

| Operator | Description |
|---|---|
| Kafka Source | Consumes from Kafka topics with offset tracking |
| Confluent Source | Consumes from Confluent Cloud |
| Kinesis Source | Reads from Kinesis streams with shard management |
| Filesystem Source | Reads from S3, GCS, or local files |
| Mock Source | Generates synthetic data for testing |

Transform Operators

| Operator | Description |
|---|---|
| Filter | Evaluates predicates, passes matching records |
| Project | Selects and transforms columns |
| Aggregate | Computes aggregations (COUNT, SUM, AVG, etc.) |
| Join | Combines records from multiple streams |

Sink Operators

| Operator | Description |
|---|---|
| Kafka Sink | Writes to Kafka topics |
| Confluent Sink | Writes to Confluent Cloud |
| Iceberg Sink | Writes to Iceberg tables |
| Delta Lake Sink | Writes to Delta Lake tables |
| Filesystem Sink | Writes to S3, GCS, or local files |
| StdOut Sink | Prints to console for debugging |

Parallelism Model

Laminar processes data in parallel using operator-level parallelism. Each operator in the pipeline can have multiple subtasks running concurrently.

Parallelism Concepts

| Concept | Description |
|---|---|
| Subtask | Instance of an operator processing a portion of the data |
| Parallelism | Number of subtasks per operator |
| Queue | Bounded channel connecting operators with backpressure |

Data Distribution

  1. Source subtasks read from assigned partitions
  2. Data is distributed to downstream subtasks
  3. Each subtask processes its portion independently
  4. Bounded queues between operators provide flow control
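One common way to distribute data in step 2 is by key hash, so that all records with the same key land on the same subtask. The sketch below is illustrative; Laminar's actual partitioning scheme may differ.

```python
# Sketch of key-based distribution to downstream subtasks (illustrative;
# Laminar's actual partitioning scheme may differ).
import hashlib

PARALLELISM = 4  # number of downstream subtasks

def subtask_for(key: str, parallelism: int = PARALLELISM) -> int:
    # A stable hash (unlike Python's built-in hash(), which is randomized
    # per process) so the same key always routes to the same subtask.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % parallelism

queues = [[] for _ in range(PARALLELISM)]
for record in [{"user_id": "u1"}, {"user_id": "u2"}, {"user_id": "u1"}]:
    queues[subtask_for(record["user_id"])].append(record)

# Both "u1" records sit in the same queue, so their relative order is preserved.
```

Keeping all records for a key on one subtask is what makes per-partition ordering (see Processing Guarantees below) possible under parallel execution.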

Record Flow

This diagram shows how a single record flows through the pipeline:

Processing Guarantees

| Guarantee | Description |
|---|---|
| Ordering | Records within a partition maintain order |
| Exactly-once | Each record is processed exactly once (with checkpointing) |
| At-least-once | Without checkpointing, records may be reprocessed |
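The at-least-once behavior follows from committing offsets only after processing. A toy model (not Laminar's code) shows why a crash between processing a record and committing its offset causes that record to be reprocessed on restart:

```python
# Toy model of at-least-once delivery (illustrative, not Laminar's code):
# offsets are committed only after processing, so a crash between
# processing and commit replays the uncommitted record on restart.
stream = ["a", "b", "c"]
processed = []
committed_offset = 0

def run(crash_after=None):
    global committed_offset
    offset = committed_offset  # restart from the last committed offset
    while offset < len(stream):
        processed.append(stream[offset])       # process the record
        if crash_after is not None and len(processed) == crash_after:
            return                             # crash before committing
        committed_offset = offset + 1          # commit after processing
        offset += 1

run(crash_after=2)   # crashes after processing "b" but before committing it
run()                # restart: "b" is processed a second time
print(processed)     # ['a', 'b', 'b', 'c']
```

Checkpointing closes this gap by atomically recording processing progress, which is what upgrades the guarantee to exactly-once.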

Backpressure

Laminar handles backpressure when downstream operators can't keep up.

Backpressure Mechanisms

  1. Bounded Queues: Row-count-aware buffers between operators
  2. Flow Control: Senders block when queues are full, automatically slowing upstream operators
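These two mechanisms can be sketched with a bounded queue between two threads: once the queue is full, the producer's put() blocks, so a slow consumer automatically throttles the producer.

```python
# Sketch of backpressure via a bounded queue (illustrative): the producer
# blocks on put() once the queue is full, so a slow consumer automatically
# slows the producer down. No records are dropped.
import queue
import threading
import time

q = queue.Queue(maxsize=2)   # bounded channel between two operators
produced, consumed = [], []

def producer():
    for i in range(6):
        q.put(i)             # blocks while the queue already holds 2 items
        produced.append(i)
    q.put(None)              # sentinel: end of stream

def consumer():
    while True:
        item = q.get()
        if item is None:
            break
        time.sleep(0.01)     # simulate a slow downstream operator
        consumed.append(item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
assert consumed == list(range(6))   # nothing lost despite the slow consumer
```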