Schedulers
Laminar uses schedulers to manage how worker processes are created and distributed when pipelines run. The scheduler determines where and how your pipeline's compute resources are allocated.
Running Example
Throughout this page, we'll use a concrete example to illustrate how each scheduler works:
Scenario: You have a pipeline that reads from Kafka, transforms data, and writes to Iceberg. The pipeline is configured with parallelism = 12, meaning it needs to run 12 concurrent tasks to process data efficiently.
[pipeline]
name = "kafka-to-iceberg"
parallelism = 12

Let's see how each scheduler handles this pipeline.
Task Slots
Before diving into schedulers, it's important to understand task slots.
A task slot is a unit of compute capacity that can execute one pipeline task. When a pipeline runs, it's divided into parallel tasks. Each task needs a slot to run.
For our example pipeline with parallelism = 12, the scheduler must provision 12 task slots before the pipeline can fully run.
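The arithmetic behind slot provisioning is a ceiling division over the number of slots each worker provides. A minimal sketch of that bookkeeping (illustrative only, not Laminar's code; the 4-slots-per-worker value is the same assumption used throughout this page):

```rust
// Hypothetical slot math -- illustrative only, not Laminar's implementation.
fn main() {
    let parallelism: u32 = 12;      // concurrent tasks the pipeline needs
    let slots_per_worker: u32 = 4;  // assumed capacity of one worker

    // Every task needs exactly one slot, so the pipeline needs `parallelism` slots.
    let slots_needed = parallelism;

    // Workers come in whole units, hence the ceiling division.
    let workers_needed = slots_needed.div_ceil(slots_per_worker);

    println!("slots needed: {slots_needed}, workers needed: {workers_needed}");
    // Prints: slots needed: 12, workers needed: 3
}
```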
Example Slot Configuration
For our example pipeline, the parallelism of 12 sets the total number of task slots needed, and the number of slots each worker provides sets how many pods will be created: ceil(12 / 4) = 3. A Kubernetes configuration matching those numbers:
[controller]
scheduler = "kubernetes"
[kubernetes-scheduler]
namespace = "laminar"
resource-mode = "per-slot"
[kubernetes-scheduler.worker]
task-slots = 4
[kubernetes-scheduler.worker.resources.requests]
cpu = "900m"
memory = "512Mi"
# Parallelism: 12
# Pods needed: 3
# Resources per pod: 3.6 cores CPU, 2.0Gi memory
# Total cluster resources: 10.8 cores CPU, 6.0Gi memory

Scheduler Types
| Scheduler | Use Case | Description |
|---|---|---|
| Embedded | Testing, single-node | Runs workers as tasks within the controller process |
| Process | Local development, Desktop mode | Spawns worker processes on the same machine |
| Kubernetes | Production deployments | Creates worker pods in a Kubernetes cluster |
Embedded Scheduler
The Embedded Scheduler runs workers as async tasks within the controller process. This is primarily used for testing and simple single-node deployments.
How It Works
- Workers run as Tokio async tasks in the same process
- No separate processes or pods are created
- All workers share the controller's memory space
- Lightweight but limited scalability
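Conceptually, the embedded model is nothing more than spawning tasks onto one Tokio runtime. A toy sketch of 12 tasks sharing the controller process (not Laminar's actual worker code):

```rust
use tokio::task::JoinSet;

// Toy illustration of the embedded model: 12 tasks on one runtime,
// sharing the same process, memory space, and failure domain.
#[tokio::main]
async fn main() {
    let parallelism = 12;
    let mut tasks = JoinSet::new();

    for i in 0..parallelism {
        tasks.spawn(async move {
            // A real task would read from Kafka, transform, and write to Iceberg.
            println!("task {i} running inside the controller process");
        });
    }

    // A panic in any one task surfaces here, in the same process as everything else.
    while let Some(result) = tasks.join_next().await {
        result.expect("a task panicked");
    }
}
```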
Example Walkthrough
With our kafka-to-iceberg pipeline (parallelism = 12):
- Pipeline submitted → Controller receives the pipeline definition
- Task creation → Controller spawns 12 async tasks within its own process
- Execution → All 12 tasks share the controller's CPU and memory
- Resource usage → Single process, ~500MB-1GB RAM depending on data volume
[controller]
scheduler = "embedded"Tradeoff: Simple to run, but all 12 tasks compete for resources in one process. If one task crashes, it can affect others.
When to Use
- Unit testing
- Integration testing
- Single-pipeline debugging
- Resource-constrained environments
Process Scheduler
The Process Scheduler spawns worker processes on the same machine as the controller. This is the default scheduler for local development and Desktop mode.
How It Works
- When a pipeline starts, the scheduler calculates how many worker processes are needed
- Each worker process is spawned using the Laminar binary with the `worker` subcommand
- Workers communicate with the controller via gRPC
- When a pipeline stops, worker processes are terminated
Example Walkthrough
With our kafka-to-iceberg pipeline (parallelism = 12) and slots-per-process = 4:
- Pipeline submitted → Controller receives the pipeline definition
- Worker calculation → ceil(12 / 4) = 3 worker processes needed
- Process spawning → Controller spawns 3 worker processes on the local machine
- Slot registration → Each worker registers 4 slots with the controller (12 total)
- Task assignment → Controller assigns tasks to available slots across workers
Pipeline: parallelism = 12
Slots per process: 4
Workers needed: ceil(12/4) = 3
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│ Worker 1         │  │ Worker 2         │  │ Worker 3         │
│ Slots: 1,2,3,4   │  │ Slots: 5,6,7,8   │  │ Slots: 9,10,11,12│
└──────────────────┘  └──────────────────┘  └──────────────────┘
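The slot ranges in the diagram are just contiguous blocks carved out of the ceiling division. A small sketch that reproduces the assignment above (illustrative; not how Laminar tracks slots internally):

```rust
// Partition 12 slots into contiguous ranges of 4 -- illustrative only.
fn main() {
    let parallelism: u32 = 12;
    let slots_per_process: u32 = 4;
    let workers = parallelism.div_ceil(slots_per_process);

    for w in 0..workers {
        let first = w * slots_per_process + 1;
        let last = ((w + 1) * slots_per_process).min(parallelism);
        println!("Worker {}: slots {first}..={last}", w + 1);
    }
    // Worker 1: slots 1..=4
    // Worker 2: slots 5..=8
    // Worker 3: slots 9..=12
}
```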
Configuration
[controller]
scheduler = "process"
[process-scheduler]
slots-per-process = 4 # Each worker process handles 4 tasks

Tradeoff: Better isolation than embedded (each worker is a separate process), but limited to a single machine's resources.
When to Use
- Local development
- Desktop mode
- Testing pipelines before deploying to Kubernetes
- Single-machine deployments
Kubernetes Scheduler
The Kubernetes Scheduler creates worker pods in a Kubernetes cluster. This is the recommended scheduler for production deployments.
How It Works
- When a pipeline starts, the scheduler creates worker pods based on parallelism
- Pods are labeled with job and run identifiers for tracking
- Workers register with the controller and receive tasks
- When a pipeline stops, pods are deleted
Example Walkthrough
With our kafka-to-iceberg pipeline (parallelism = 12) and task-slots = 4:
- Pipeline submitted → Controller receives the pipeline definition
- Pod calculation → ceil(12 / 4) = 3 worker pods needed
- Pod creation → Controller creates 3 pods via the Kubernetes API
- Scheduling → Kubernetes schedules pods across available nodes
- Registration → Workers start, connect to the controller, and register their slots
- Execution → Tasks are distributed across 12 slots on 3 pods
Pipeline: parallelism = 12
Slots per pod: 4
Pods needed: ceil(12/4) = 3
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│ Pod 1 (Node A)   │  │ Pod 2 (Node B)   │  │ Pod 3 (Node A)   │
│ 4 slots          │  │ 4 slots          │  │ 4 slots          │
│ CPU: 3600m       │  │ CPU: 3600m       │  │ CPU: 3600m       │
│ Mem: 2Gi         │  │ Mem: 2Gi         │  │ Mem: 2Gi         │
└──────────────────┘  └──────────────────┘  └──────────────────┘
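For a sense of what the pod-creation step looks like against the Kubernetes API, here is a rough sketch using the kube and k8s-openapi crates. The pod name, label keys, and resource values are illustrative assumptions, not Laminar's actual pod spec:

```rust
use k8s_openapi::api::core::v1::Pod;
use kube::{api::PostParams, Api, Client};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    let pods: Api<Pod> = Api::namespaced(client, "laminar");

    // Hypothetical worker pod: one of the three needed for parallelism = 12.
    let pod: Pod = serde_json::from_value(serde_json::json!({
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": "laminar-worker-0",            // placeholder name
            "labels": { "app": "laminar-worker" }  // placeholder label key
        },
        "spec": {
            "containers": [{
                "name": "worker",
                "image": "ghcr.io/e6data/laminar:latest",
                "command": ["/app/laminar", "worker"],
                // Per-slot requests scaled by 4 task slots (see resource modes below).
                "resources": { "requests": { "cpu": "3600m", "memory": "2Gi" } }
            }]
        }
    }))?;

    pods.create(&PostParams::default(), &pod).await?;
    Ok(())
}
```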
Configuration
[controller]
scheduler = "kubernetes"
[kubernetes-scheduler]
namespace = "laminar"
[kubernetes-scheduler.worker]
image = "ghcr.io/e6data/laminar:latest"
image-pull-policy = "IfNotPresent"
service-account-name = "laminar-worker"
task-slots = 4
command = "/app/laminar worker"
[kubernetes-scheduler.worker.resources.requests]
cpu = "900m"
memory = "500Mi"Resource Modes: Per-Slot vs Per-Pod
The Kubernetes scheduler supports two resource allocation modes. This is crucial for our example pipeline.
Per-Slot Mode (Default)
Resources are multiplied by the number of task slots in the pod.
For our example with cpu = "900m", memory = "500Mi", and task-slots = 4:
| Resource | Per Slot | Per Pod (4 slots) | Total (3 pods) |
|---|---|---|---|
| CPU | 900m | 3600m | 10800m |
| Memory | 500Mi | 2000Mi | 6000Mi |
[kubernetes-scheduler]
resource-mode = "per-slot"
[kubernetes-scheduler.worker.resources.requests]
cpu = "900m" # Per slot
memory = "500Mi" # Per slotUse per-slot when you want resources to scale with parallelism.
Per-Pod Mode
Resources are applied as-is to each pod, regardless of slot count.
For our example with cpu = "4000m", memory = "8Gi", and task-slots = 4:
| Resource | Per Pod | Total (3 pods) |
|---|---|---|
| CPU | 4000m | 12000m |
| Memory | 8Gi | 24Gi |
[kubernetes-scheduler]
resource-mode = "per-pod"
[kubernetes-scheduler.worker.resources.requests]
cpu = "4000m" # Fixed per pod
memory = "8Gi" # Fixed per podUse per-pod when you want fixed-size pods regardless of slot count.
Pod Customization
You can customize worker pods with labels, annotations, volumes, node selectors, and tolerations:
[kubernetes-scheduler.worker]
labels = { team = "data-platform" }
annotations = { "prometheus.io/scrape" = "true" }
node-selector = { "node-type" = "compute" }
[[kubernetes-scheduler.worker.tolerations]]
key = "dedicated"
operator = "Equal"
value = "laminar"
effect = "NoSchedule"
[[kubernetes-scheduler.worker.volumes]]
name = "checkpoints"
hostPath = { path = "/data/checkpoints", type = "DirectoryOrCreate" }
[[kubernetes-scheduler.worker.volume-mounts]]
name = "checkpoints"
mountPath = "/checkpoints"When to Use
- Production deployments
- Multi-node clusters
- When you need autoscaling
- When you need resource isolation
Comparison: Same Pipeline, Three Schedulers
Here's how our kafka-to-iceberg pipeline (parallelism = 12) runs on each scheduler:
| Aspect | Embedded | Process | Kubernetes |
|---|---|---|---|
| Workers | 0 (in-process) | 3 processes | 3 pods |
| Slots | 12 async tasks | 12 (4 per process) | 12 (4 per pod) |
| Isolation | None | Process-level | Pod-level |
| Node distribution | Single process | Single machine | Multi-node |
| Failure blast radius | All tasks | 4 tasks | 4 tasks |
| Resource control | Limited | OS-level | Kubernetes limits |
| Scaling | Vertical only | Vertical only | Horizontal |
Choosing a Scheduler
| Environment | Recommended Scheduler | Why |
|---|---|---|
| Unit tests | embedded | Fast startup, no process overhead |
| Local development | process | Good isolation, easy debugging |
| Desktop mode | process | Default for single-machine |
| CI/CD testing | embedded or process | Depends on test complexity |
| Kubernetes (dev) | kubernetes | Match production behavior |
| Kubernetes (prod) | kubernetes | Full isolation, horizontal scaling |
Monitoring
All schedulers expose metrics for monitoring:
- `laminar_controller_free_slots` - Available task slots
- `laminar_controller_registered_slots` - Total registered slots
- `laminar_controller_registered_nodes` - Registered worker nodes