Schedulers
Laminar uses schedulers to manage how worker processes are created and distributed when pipelines run. The scheduler determines where and how your pipeline's compute resources are allocated.
Running Example
Throughout this page, we'll use a concrete example to illustrate how each scheduler works:
Scenario: You have a pipeline that reads from Kafka, transforms data, and writes to Iceberg. The pipeline is configured with parallelism = 12, meaning it needs to run 12 concurrent tasks to process data efficiently.
[pipeline]
name = "kafka-to-iceberg"
parallelism = 12

Let's see how each scheduler handles this pipeline.
Task Slots
Before diving into schedulers, it's important to understand task slots.
A task slot is a unit of compute capacity that can execute one pipeline task. When a pipeline runs, it's divided into parallel tasks. Each task needs a slot to run.
For our example pipeline with parallelism = 12, the scheduler must provision 12 task slots before the pipeline can fully run.
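The arithmetic behind slot provisioning is a ceiling division over the number of slots each worker provides. A minimal sketch of that bookkeeping (illustrative only, not Laminar's code; the 4-slots-per-worker value is the same assumption used throughout this page):

```rust
// Hypothetical slot math -- illustrative only, not Laminar's implementation.
fn main() {
    let parallelism: u32 = 12;      // concurrent tasks the pipeline needs
    let slots_per_worker: u32 = 4;  // assumed capacity of one worker

    // Every task needs exactly one slot, so the pipeline needs `parallelism` slots.
    let slots_needed = parallelism;

    // Workers come in whole units, hence the ceiling division.
    let workers_needed = slots_needed.div_ceil(slots_per_worker);

    println!("slots needed: {slots_needed}, workers needed: {workers_needed}");
    // Prints: slots needed: 12, workers needed: 3
}
```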
Example Slot Configuration
For our example pipeline, the parallelism of 12 sets the total number of task slots needed, and the number of slots each worker provides sets how many pods will be created: ceil(12 / 4) = 3. A Kubernetes configuration matching those numbers:
[controller]
scheduler = "kubernetes"
[kubernetes-scheduler]
namespace = "laminar"
resource-mode = "per-slot"
[kubernetes-scheduler.worker]
task-slots = 4
[kubernetes-scheduler.worker.resources.requests]
cpu = "900m"
memory = "512Mi"
# Parallelism: 12
# Pods needed: 3
# Resources per pod: 3.6 cores CPU, 2.0Gi memory
# Total cluster resources: 10.8 cores CPU, 6.0Gi memory

Scheduler Types
| Scheduler | Use Case | Description |
|---|---|---|
| Embedded | Testing, single-node | Runs workers as tasks within the controller process |
| Process | Local development, Desktop mode | Spawns worker processes on the same machine |
| Kubernetes | Production deployments | Creates worker pods in a Kubernetes cluster |
Embedded Scheduler
The Embedded Scheduler runs workers as async tasks within the controller process. This is primarily used for testing and simple single-node deployments.
How It Works
- Workers run as Tokio async tasks in the same process
- No separate processes or pods are created
- All workers share the controller's memory space
- Lightweight but limited scalability
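Conceptually, the embedded model is nothing more than spawning tasks onto one Tokio runtime. A toy sketch of 12 tasks sharing the controller process (not Laminar's actual worker code):

```rust
use tokio::task::JoinSet;

// Toy illustration of the embedded model: 12 tasks on one runtime,
// sharing the same process, memory space, and failure domain.
#[tokio::main]
async fn main() {
    let parallelism = 12;
    let mut tasks = JoinSet::new();

    for i in 0..parallelism {
        tasks.spawn(async move {
            // A real task would read from Kafka, transform, and write to Iceberg.
            println!("task {i} running inside the controller process");
        });
    }

    // A panic in any one task surfaces here, in the same process as everything else.
    while let Some(result) = tasks.join_next().await {
        result.expect("a task panicked");
    }
}
```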
Example Walkthrough
With our kafka-to-iceberg pipeline (parallelism = 12):
- Pipeline submitted → Controller receives the pipeline definition
- Task creation → Controller spawns 12 async tasks within its own process
- Execution → All 12 tasks share the controller's CPU and memory
- Resource usage → Single process, ~500MB-1GB RAM depending on data volume
[controller]
scheduler = "embedded"Tradeoff: Simple to run, but all 12 tasks compete for resources in one process. If one task crashes, it can affect others.
When to Use
- Unit testing
- Integration testing
- Single-pipeline debugging
- Resource-constrained environments
Process Scheduler
The Process Scheduler spawns worker processes on the same machine as the controller. This is the default scheduler for local development and Desktop mode.
How It Works
- When a pipeline starts, the scheduler calculates how many worker processes are needed
- Each worker process is spawned using the Laminar binary with the `worker` subcommand
- Workers communicate with the controller via gRPC
- When a pipeline stops, worker processes are terminated
Example Walkthrough
With our kafka-to-iceberg pipeline (parallelism = 12) and slots-per-process = 4:
- Pipeline submitted → Controller receives the pipeline definition
- Worker calculation → ceil(12 / 4) = 3 worker processes needed
- Process spawning → Controller spawns 3 worker processes on the local machine
- Slot registration → Each worker registers 4 slots with the controller (12 total)
- Task assignment → Controller assigns tasks to available slots across workers
Pipeline: parallelism = 12
Slots per process: 4
Workers needed: ceil(12/4) = 3
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│ Worker 1         │  │ Worker 2         │  │ Worker 3         │
│ Slots: 1,2,3,4   │  │ Slots: 5,6,7,8   │  │ Slots: 9,10,11,12│
└──────────────────┘  └──────────────────┘  └──────────────────┘
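The slot ranges in the diagram are just contiguous blocks carved out of the ceiling division. A small sketch that reproduces the assignment above (illustrative; not how Laminar tracks slots internally):

```rust
// Partition 12 slots into contiguous ranges of 4 -- illustrative only.
fn main() {
    let parallelism: u32 = 12;
    let slots_per_process: u32 = 4;
    let workers = parallelism.div_ceil(slots_per_process);

    for w in 0..workers {
        let first = w * slots_per_process + 1;
        let last = ((w + 1) * slots_per_process).min(parallelism);
        println!("Worker {}: slots {first}..={last}", w + 1);
    }
    // Worker 1: slots 1..=4
    // Worker 2: slots 5..=8
    // Worker 3: slots 9..=12
}
```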
Configuration
[controller]
scheduler = "process"
[process-scheduler]
slots-per-process = 4 # Each worker process handles 4 tasks

Tradeoff: Better isolation than embedded (each worker is a separate process), but limited to a single machine's resources.
When to Use
- Local development
- Desktop mode
- Testing pipelines before deploying to Kubernetes
- Single-machine deployments
Kubernetes Scheduler
The Kubernetes Scheduler creates worker pods in a Kubernetes cluster. This is the recommended scheduler for production deployments.
How It Works
- When a pipeline starts, the scheduler creates worker pods based on parallelism
- Pods are labeled with job and run identifiers for tracking
- Workers register with the controller and receive tasks
- When a pipeline stops, pods are deleted
Example Walkthrough
With our kafka-to-iceberg pipeline (parallelism = 12) and task-slots = 4:
- Pipeline submitted → Controller receives the pipeline definition
- Pod calculation → ceil(12 / 4) = 3 worker pods needed
- Pod creation → Controller creates 3 pods via the Kubernetes API
- Scheduling → Kubernetes schedules pods across available nodes
- Registration → Workers start, connect to the controller, and register their slots
- Execution → Tasks are distributed across 12 slots on 3 pods
Pipeline: parallelism = 12
Slots per pod: 4
Pods needed: ceil(12/4) = 3
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│ Pod 1 (Node A)   │  │ Pod 2 (Node B)   │  │ Pod 3 (Node A)   │
│ 4 slots          │  │ 4 slots          │  │ 4 slots          │
│ CPU: 3600m       │  │ CPU: 3600m       │  │ CPU: 3600m       │
│ Mem: 2Gi         │  │ Mem: 2Gi         │  │ Mem: 2Gi         │
└──────────────────┘  └──────────────────┘  └──────────────────┘
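For a sense of what the pod-creation step looks like against the Kubernetes API, here is a rough sketch using the kube and k8s-openapi crates. The pod name, label keys, and resource values are illustrative assumptions, not Laminar's actual pod spec:

```rust
use k8s_openapi::api::core::v1::Pod;
use kube::{api::PostParams, Api, Client};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    let pods: Api<Pod> = Api::namespaced(client, "laminar");

    // Hypothetical worker pod: one of the three needed for parallelism = 12.
    let pod: Pod = serde_json::from_value(serde_json::json!({
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": "laminar-worker-0",            // placeholder name
            "labels": { "app": "laminar-worker" }  // placeholder label key
        },
        "spec": {
            "containers": [{
                "name": "worker",
                "image": "ghcr.io/e6data/laminar:latest",
                "command": ["/app/laminar", "worker"],
                // Per-slot requests scaled by 4 task slots (see resource modes below).
                "resources": { "requests": { "cpu": "3600m", "memory": "2Gi" } }
            }]
        }
    }))?;

    pods.create(&PostParams::default(), &pod).await?;
    Ok(())
}
```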
Configuration
[controller]
scheduler = "kubernetes"
[kubernetes-scheduler]
namespace = "laminar"
[kubernetes-scheduler.worker]
image = "ghcr.io/e6data/laminar:latest"
image-pull-policy = "IfNotPresent"
service-account-name = "laminar-worker"
task-slots = 4
command = "/app/laminar worker"
[kubernetes-scheduler.worker.resources.requests]
cpu = "900m"
memory = "500Mi"Resource Modes: Per-Slot vs Per-Pod
The Kubernetes scheduler supports two resource allocation modes. This is crucial for our example pipeline.
Per-Slot Mode (Default)
Resources are multiplied by the number of task slots in the pod.
For our example with cpu = "900m", memory = "500Mi", and task-slots = 4:
| Resource | Per Slot | Per Pod (4 slots) | Total (3 pods) |
|---|---|---|---|
| CPU | 900m | 3600m | 10800m |
| Memory | 500Mi | 2000Mi | 6000Mi |
[kubernetes-scheduler]
resource-mode = "per-slot"
[kubernetes-scheduler.worker.resources.requests]
cpu = "900m" # Per slot
memory = "500Mi" # Per slotUse per-slot when you want resources to scale with parallelism.
Per-Pod Mode
Resources are applied as-is to each pod, regardless of slot count.
For our example with cpu = "4000m", memory = "8Gi", and task-slots = 4:
| Resource | Per Pod | Total (3 pods) |
|---|---|---|
| CPU | 4000m | 12000m |
| Memory | 8Gi | 24Gi |
[kubernetes-scheduler]
resource-mode = "per-pod"
[kubernetes-scheduler.worker.resources.requests]
cpu = "4000m" # Fixed per pod
memory = "8Gi" # Fixed per podUse per-pod when you want fixed-size pods regardless of slot count.
Pod Customization
You can customize worker pods with labels, annotations, volumes, node selectors, and tolerations:
[kubernetes-scheduler.worker]
labels = { team = "data-platform" }
annotations = { "prometheus.io/scrape" = "true" }
node-selector = { "node-type" = "compute" }
[[kubernetes-scheduler.worker.tolerations]]
key = "dedicated"
operator = "Equal"
value = "laminar"
effect = "NoSchedule"
[[kubernetes-scheduler.worker.volumes]]
name = "checkpoints"
hostPath = { path = "/data/checkpoints", type = "DirectoryOrCreate" }
[[kubernetes-scheduler.worker.volume-mounts]]
name = "checkpoints"
mountPath = "/checkpoints"When to Use
- Production deployments
- Multi-node clusters
- When you need autoscaling
- When you need resource isolation
Comparison: Same Pipeline, Three Schedulers
Here's how our kafka-to-iceberg pipeline (parallelism = 12) runs on each scheduler:
| Aspect | Embedded | Process | Kubernetes |
|---|---|---|---|
| Workers | 0 (in-process) | 3 processes | 3 pods |
| Slots | 12 async tasks | 12 (4 per process) | 12 (4 per pod) |
| Isolation | None | Process-level | Pod-level |
| Node distribution | Single process | Single machine | Multi-node |
| Failure blast radius | All tasks | 4 tasks | 4 tasks |
| Resource control | Limited | OS-level | Kubernetes limits |
| Scaling | Vertical only | Vertical only | Horizontal |
Choosing a Scheduler
| Environment | Recommended Scheduler | Why |
|---|---|---|
| Unit tests | embedded | Fast startup, no process overhead |
| Local development | process | Good isolation, easy debugging |
| Desktop mode | process | Default for single-machine |
| CI/CD testing | embedded or process | Depends on test complexity |
| Kubernetes (dev) | kubernetes | Match production behavior |
| Kubernetes (prod) | kubernetes | Full isolation, horizontal scaling |
Monitoring
All schedulers expose metrics for monitoring:
- `laminar_controller_free_slots` - Available task slots
- `laminar_controller_registered_slots` - Total registered slots
- `laminar_controller_registered_nodes` - Registered worker nodes