# Pipeline Execution
This page explains how Laminar executes SQL pipelines on streaming data, from query parsing to record output.
## Execution Overview

## Query Compilation
When you create a pipeline, Laminar compiles your SQL query through several stages.
### 1. Parsing
The SQL parser converts your query text into an Abstract Syntax Tree (AST).
```sql
INSERT INTO output_topic
SELECT user_id, event_type, event_time
FROM events
WHERE event_type = 'click'
```

### 2. Logical Planning
DataFusion transforms the AST into a logical plan that represents the query's operations.
### 3. Optimization
The optimizer applies rules to improve query performance:
| Optimization | Description |
|---|---|
| Predicate Pushdown | Move filters closer to sources to reduce data volume |
| Projection Pruning | Remove unused columns early in the pipeline |
| Operator Fusion | Combine adjacent operators to reduce overhead |
| Partition Pruning | Skip irrelevant partitions based on filters |
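To make the first rule concrete, predicate pushdown can be sketched as a rewrite over a toy logical plan. The node names here (`Scan`, `Project`, `Filter`) are illustrative, not Laminar's or DataFusion's internal types, and a real optimizer would first verify that the predicate only references columns available below the projection:

```python
from dataclasses import dataclass

# Toy logical-plan nodes (hypothetical names, for illustration only).
@dataclass
class Scan:
    table: str

@dataclass
class Project:
    columns: list
    input: object

@dataclass
class Filter:
    predicate: str
    input: object

def push_down_predicates(plan):
    """Move a Filter below a Project so rows are dropped before projection.

    Simplified: assumes the predicate's columns exist below the Project.
    """
    if isinstance(plan, Filter) and isinstance(plan.input, Project):
        proj = plan.input
        # Rewrite Filter(Project(x)) -> Project(Filter(x)).
        return Project(proj.columns,
                       Filter(plan.predicate, push_down_predicates(proj.input)))
    if isinstance(plan, (Filter, Project)):
        plan.input = push_down_predicates(plan.input)
    return plan

plan = Filter("event_type = 'click'",
              Project(["user_id", "event_type"], Scan("events")))
optimized = push_down_predicates(plan)
# The filter now runs directly on the scan, before the projection.
```

After the rewrite, fewer rows reach the projection, which is the data-volume reduction the table describes.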
### 4. Physical Planning
The logical plan is converted to a physical execution plan with concrete operators.
## Operator Types

Laminar provides three classes of operators for stream processing: sources, transforms, and sinks.

### Source Operators
| Operator | Description |
|---|---|
| Kafka Source | Consumes from Kafka topics with offset tracking |
| Confluent Source | Consumes from Confluent Cloud |
| Kinesis Source | Reads from Kinesis streams with shard management |
| Filesystem Source | Reads from S3, GCS, or local files |
| Mock Source | Generates synthetic data for testing |
### Transform Operators
| Operator | Description |
|---|---|
| Filter | Evaluates predicates, passes matching records |
| Project | Selects and transforms columns |
| Aggregate | Computes aggregations (COUNT, SUM, AVG, etc.) |
| Join | Combines records from multiple streams |
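As a sketch of how a streaming Aggregate operator might maintain running per-key state (illustrative only, not Laminar's implementation; the `user_id`/`amount` field names are assumed):

```python
from collections import defaultdict

class CountSumAggregate:
    """Incrementally maintains COUNT and SUM for each grouping key."""

    def __init__(self, key_field="user_id", value_field="amount"):
        self.key_field = key_field
        self.value_field = value_field
        self.state = defaultdict(lambda: {"count": 0, "sum": 0})

    def process(self, record):
        # Update running state and emit the current aggregate for this key.
        key = record[self.key_field]
        s = self.state[key]
        s["count"] += 1
        s["sum"] += record[self.value_field]
        return key, dict(s)

agg = CountSumAggregate()
agg.process({"user_id": "a", "amount": 10})
key, totals = agg.process({"user_id": "a", "amount": 5})
# key == "a", totals == {"count": 2, "sum": 15}
```

Unlike a batch aggregate, a streaming aggregate never sees "all" the input, so it emits an updated result per record rather than one final row.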
### Sink Operators
| Operator | Description |
|---|---|
| Kafka Sink | Writes to Kafka topics |
| Confluent Sink | Writes to Confluent Cloud |
| Iceberg Sink | Writes to Iceberg tables |
| Delta Lake Sink | Writes to Delta Lake tables |
| Filesystem Sink | Writes to S3, GCS, or local files |
| StdOut Sink | Prints to console for debugging |
## Parallelism Model
Laminar processes data in parallel using operator-level parallelism. Each operator in the pipeline can have multiple subtasks running concurrently.
### Parallelism Concepts
| Concept | Description |
|---|---|
| Subtask | Instance of an operator processing a portion of the data |
| Parallelism | Number of subtasks per operator |
| Queue | Bounded channel connecting operators with backpressure |
### Data Distribution
- Source subtasks read from assigned partitions
- Data is distributed to downstream subtasks
- Each subtask processes its portion independently
- Bounded queues between operators provide flow control
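The distribution step above is commonly implemented as hash partitioning, sketched below. This is a generic scheme, not necessarily the exact strategy Laminar uses for every operator; the `user_id` key field is an assumption:

```python
import hashlib

def route(record, num_subtasks, key_field="user_id"):
    """Pick a downstream subtask index by hashing the record's key.

    A stable hash (not Python's randomized built-in hash()) keeps the
    routing deterministic across worker processes.
    """
    key = str(record[key_field]).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return digest % num_subtasks

# All records sharing a key land on the same subtask, which is what
# lets a keyed Aggregate subtask own that key's state exclusively.
a1 = route({"user_id": "alice"}, 4)
a2 = route({"user_id": "alice"}, 4)
# a1 == a2
```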
## Record Flow
This diagram shows how a single record flows through the pipeline:
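In lieu of the diagram, the path of one record can be sketched as a chain of operator stages, using the click-filtering query from the compilation example (function names here are illustrative, not Laminar APIs):

```python
def kafka_source():
    # One record as it might arrive from a source subtask.
    yield {"user_id": 42, "event_type": "click", "event_time": 1700000000}

def filter_op(records, predicate):
    for r in records:
        if predicate(r):            # drop non-matching records
            yield r

def project_op(records, columns):
    for r in records:
        yield {c: r[c] for c in columns}

def sink_op(records, out):
    for r in records:
        out.append(r)               # e.g. produce to the output topic

out = []
stream = kafka_source()
stream = filter_op(stream, lambda r: r["event_type"] == "click")
stream = project_op(stream, ["user_id", "event_time"])
sink_op(stream, out)
# out == [{"user_id": 42, "event_time": 1700000000}]
```

Because each stage is a generator, the record is pulled through the whole chain one at a time rather than materialized between operators.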
## Processing Guarantees
| Guarantee | Description |
|---|---|
| Ordering | Records within a partition maintain order |
| Exactly-once | Each record is processed exactly once (with checkpointing) |
| At-least-once | Without checkpointing, records may be reprocessed |
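To make the at-least-once case concrete, here is a sketch of a consumer that checkpoints its offset periodically and, after a crash, resumes from the last checkpoint, reprocessing everything since then. This illustrates the general mechanism, not Laminar's actual checkpoint protocol:

```python
records = ["r0", "r1", "r2", "r3", "r4"]

def run(start_offset, crash_at=None, checkpoint_every=2):
    """Process records from start_offset, checkpointing the offset periodically."""
    processed, checkpoint = [], start_offset
    for offset in range(start_offset, len(records)):
        if offset == crash_at:
            return processed, checkpoint   # crash before processing this record
        processed.append(records[offset])
        if (offset + 1) % checkpoint_every == 0:
            checkpoint = offset + 1        # durably record progress
    return processed, checkpoint

first, ckpt = run(0, crash_at=3)   # crash after r0..r2; last checkpoint is offset 2
second, _ = run(ckpt)              # resume from offset 2: r2 is processed again
```

Because the checkpoint lags behind the last processed record, the resumed run replays `r2`: every record is seen at least once, but some more than once. Exactly-once additionally requires coordinating the checkpoint with the sink so replayed records are not emitted twice.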
## Backpressure
Laminar handles backpressure when downstream operators can't keep up.
### Backpressure Mechanisms
- Bounded Queues: Row-count-aware buffers between operators
- Flow Control: Senders block when queues are full, automatically slowing upstream operators
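The two mechanisms can be demonstrated with a small threaded sketch: the producer blocks on `put()` whenever the bounded queue is full, so a slow consumer automatically throttles it. This is a generic illustration of the pattern, not Laminar's queue implementation:

```python
import queue
import threading
import time

q = queue.Queue(maxsize=2)       # bounded channel between two operators
events = []

def upstream():
    for i in range(5):
        q.put(i)                 # blocks while the queue already holds 2 items
        events.append(("sent", i))
    q.put(None)                  # end-of-stream marker

def downstream():
    while True:
        item = q.get()
        if item is None:
            break
        time.sleep(0.01)         # slow consumer -> queue fills -> producer blocks
        events.append(("processed", item))

t1 = threading.Thread(target=upstream)
t2 = threading.Thread(target=downstream)
t1.start(); t2.start()
t1.join(); t2.join()
# The slow consumer throttles the producer: sends interleave with
# processing instead of all five records being sent up front.
```

No explicit signaling is needed; the blocking `put()` itself is the flow-control protocol, and the stall propagates upstream operator by operator.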