Schedulers

Laminar uses schedulers to manage how worker processes are created and distributed when pipelines run. The scheduler determines where and how your pipeline's compute resources are allocated.

Running Example

Throughout this page, we'll use a concrete example to illustrate how each scheduler works:

Scenario: You have a pipeline that reads from Kafka, transforms data, and writes to Iceberg. The pipeline is configured with parallelism = 12, meaning it needs to run 12 concurrent tasks to process data efficiently.

[pipeline]
name = "kafka-to-iceberg"
parallelism = 12

Let's see how each scheduler handles this pipeline.

Task Slots

Before diving into schedulers, it's important to understand task slots.

A task slot is a unit of compute capacity that can execute one pipeline task. When a pipeline runs, it's divided into parallel tasks. Each task needs a slot to run.

For our example pipeline with parallelism = 12, the scheduler must provision 12 task slots before the pipeline can fully run.

Slot Configuration Example

The right slot configuration follows directly from the pipeline's parallelism, the number of task slots per worker, and the per-slot resource requests. For our example pipeline (parallelism = 12) with 4 task slots per worker and per-slot requests of 900m CPU and 512Mi memory:

Metric         Value
Total slots    12
Pods           ceil(12 / 4) = 3
CPU / pod      3.6 cores (900m Ɨ 4)
Memory / pod   2.0Gi (512Mi Ɨ 4)
Total CPU      10.8 cores
Total memory   6.0Gi

The matching configuration:

[controller]
scheduler = "kubernetes"

[kubernetes-scheduler]
namespace = "laminar"
resource-mode = "per-slot"

[kubernetes-scheduler.worker]
task-slots = 4

[kubernetes-scheduler.worker.resources.requests]
cpu = "900m"
memory = "512Mi"

# Parallelism: 12
# Pods needed: 3
# Resources per pod: 3.6 cores CPU, 2.0Gi memory
# Total cluster resources: 10.8 cores CPU, 6.0Gi memory
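
If you prefer to script this sizing instead of working it out by hand, the arithmetic is easy to reproduce. The sketch below is a hypothetical helper, not part of Laminar; it simply recomputes the numbers above from the parallelism, task-slots, and per-slot request values.

// Hypothetical sizing helper (not part of Laminar): recomputes the pod count
// and resource totals shown above from parallelism, task slots, and per-slot requests.
fn main() {
    let parallelism: u64 = 12;       // [pipeline] parallelism
    let task_slots: u64 = 4;         // [kubernetes-scheduler.worker] task-slots
    let cpu_per_slot_m: u64 = 900;   // "900m" CPU request per slot
    let mem_per_slot_mi: u64 = 512;  // "512Mi" memory request per slot

    // Workers/pods needed: ceil(parallelism / task_slots)
    let pods = (parallelism + task_slots - 1) / task_slots;

    // Per-slot mode multiplies the per-slot request by the slots in each pod.
    let cpu_per_pod_m = cpu_per_slot_m * task_slots;    // 3600m  = 3.6 cores
    let mem_per_pod_mi = mem_per_slot_mi * task_slots;  // 2048Mi ā‰ˆ 2.0Gi

    println!("pods needed: {pods}");                                      // 3
    println!("per pod: {cpu_per_pod_m}m CPU, {mem_per_pod_mi}Mi memory");
    println!(
        "total: {}m CPU, {}Mi memory",
        cpu_per_pod_m * pods,   // 10800m = 10.8 cores
        mem_per_pod_mi * pods   // 6144Mi ā‰ˆ 6.0Gi
    );
}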

Scheduler Types

Scheduler    Use Case                          Description
Embedded     Testing, single-node              Runs workers as tasks within the controller process
Process      Local development, Desktop mode   Spawns worker processes on the same machine
Kubernetes   Production deployments            Creates worker pods in a Kubernetes cluster

Embedded Scheduler

The Embedded Scheduler runs workers as async tasks within the controller process. This is primarily used for testing and simple single-node deployments.

How It Works

  • Workers run as Tokio async tasks in the same process
  • No separate processes or pods are created
  • All workers share the controller's memory space
  • Lightweight but limited scalability

Example Walkthrough

With our kafka-to-iceberg pipeline (parallelism = 12):

  1. Pipeline submitted → Controller receives the pipeline definition
  2. Task creation → Controller spawns 12 async tasks within its own process
  3. Execution → All 12 tasks share the controller's CPU and memory
  4. Resource usage → Single process, ~500MB-1GB RAM depending on data volume

[controller]
scheduler = "embedded"

Tradeoff: Simple to run, but all 12 tasks compete for resources in one process. If one task crashes, it can affect others.
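
To make the model concrete, here is a minimal, hypothetical sketch of the embedded idea: every "worker" is just an async task spawned inside one process, sharing its memory. It is not Laminar's actual worker code, and it assumes a Cargo dependency on the tokio crate with the "full" feature set.

// Minimal sketch of the embedded model (hypothetical, not Laminar's code):
// all "workers" are Tokio tasks inside a single process, sharing its memory.
#[tokio::main]
async fn main() {
    let parallelism = 12;

    let handles: Vec<_> = (0..parallelism)
        .map(|task_id| {
            tokio::spawn(async move {
                // A real task would read from Kafka, transform, and write to Iceberg.
                println!("task {task_id} running inside the controller process");
            })
        })
        .collect();

    // If any one task panics or hogs CPU, the others feel it,
    // because they all share the same process.
    for handle in handles {
        handle.await.expect("task panicked");
    }
}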

When to Use

  • Unit testing
  • Integration testing
  • Single-pipeline debugging
  • Resource-constrained environments

Process Scheduler

The Process Scheduler spawns worker processes on the same machine as the controller. This is the default scheduler for local development and Desktop mode.

How It Works

  1. When a pipeline starts, the scheduler calculates how many worker processes are needed
  2. Each worker process is spawned using the Laminar binary with the worker subcommand
  3. Workers communicate with the controller via gRPC
  4. When a pipeline stops, worker processes are terminated

Example Walkthrough

With our kafka-to-iceberg pipeline (parallelism = 12) and slots-per-process = 4:

  1. Pipeline submitted → Controller receives the pipeline definition
  2. Worker calculation → ceil(12 / 4) = 3 worker processes needed
  3. Process spawning → Controller spawns 3 worker processes on the local machine
  4. Slot registration → Each worker registers 4 slots with the controller (12 total)
  5. Task assignment → Controller assigns tasks to available slots across workers

Pipeline: parallelism = 12
Slots per process: 4
Workers needed: ceil(12/4) = 3

ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”  ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”  ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│ Worker 1         │  │ Worker 2         │  │ Worker 3         │
│ Slots: 1,2,3,4   │  │ Slots: 5,6,7,8   │  │ Slots: 9,10,11,12│
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜  ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜  ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
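
As an illustration of step 3, the sketch below shows the general shape of spawning local worker processes with the worker subcommand. It is a simplified stand-in, not Laminar's scheduler code, and it assumes a laminar binary is available on the PATH.

// Simplified illustration of the process scheduler's spawning step
// (hypothetical, not Laminar's code). Assumes a `laminar` binary on PATH.
use std::process::{Child, Command};

fn main() -> std::io::Result<()> {
    let parallelism: u32 = 12;
    let slots_per_process: u32 = 4;
    let workers_needed = (parallelism + slots_per_process - 1) / slots_per_process; // 3

    let mut children: Vec<Child> = Vec::new();
    for _ in 0..workers_needed {
        // Each worker is the same binary run with the `worker` subcommand;
        // in Laminar the workers then talk to the controller over gRPC.
        children.push(Command::new("laminar").arg("worker").spawn()?);
    }

    // When the pipeline stops, the scheduler terminates the workers.
    for mut child in children {
        child.kill()?;
        child.wait()?;
    }
    Ok(())
}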

Configuration

[controller]
scheduler = "process"
 
[process-scheduler]
slots-per-process = 4  # Each worker process handles 4 tasks

Tradeoff: Better isolation than embedded (each worker is a separate process), but limited to a single machine's resources.

When to Use

  • Local development
  • Desktop mode
  • Testing pipelines before deploying to Kubernetes
  • Single-machine deployments

Kubernetes Scheduler

The Kubernetes Scheduler creates worker pods in a Kubernetes cluster. This is the recommended scheduler for production deployments.

How It Works

  1. When a pipeline starts, the scheduler creates worker pods based on parallelism
  2. Pods are labeled with job and run identifiers for tracking
  3. Workers register with the controller and receive tasks
  4. When a pipeline stops, pods are deleted

Example Walkthrough

With our kafka-to-iceberg pipeline (parallelism = 12) and task-slots = 4:

  1. Pipeline submitted → Controller receives the pipeline definition
  2. Pod calculation → ceil(12 / 4) = 3 worker pods needed
  3. Pod creation → Controller creates 3 pods via Kubernetes API
  4. Scheduling → Kubernetes schedules pods across available nodes
  5. Registration → Workers start, connect to controller, register their slots
  6. Execution → Tasks distributed across 12 slots on 3 pods

Pipeline: parallelism = 12
Slots per pod: 4
Pods needed: ceil(12/4) = 3

ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”  ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”  ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│ Pod 1 (Node A)  │  │ Pod 2 (Node B)  │  │ Pod 3 (Node A)  │
│ 4 slots         │  │ 4 slots         │  │ 4 slots         │
│ CPU: 3600m      │  │ CPU: 3600m      │  │ CPU: 3600m      │
│ Mem: 2Gi        │  │ Mem: 2Gi        │  │ Mem: 2Gi        │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜  ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜  ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
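
Conceptually, the pod-creation step (step 3 above) looks something like the sketch below, written with the kube and k8s-openapi crates (plus tokio, serde_json, and anyhow). The pod names, labels, and resource values here are illustrative only; Laminar's real pod spec and label keys may differ.

// Conceptual sketch of creating worker pods via the Kubernetes API
// (hypothetical, not Laminar's code). Pod names and labels are made up.
use k8s_openapi::api::core::v1::Pod;
use kube::{api::{Api, PostParams}, Client};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    let pods: Api<Pod> = Api::namespaced(client, "laminar");

    let pods_needed = (12 + 4 - 1) / 4; // ceil(parallelism / task-slots) = 3
    for i in 0..pods_needed {
        let pod: Pod = serde_json::from_value(serde_json::json!({
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {
                "name": format!("kafka-to-iceberg-worker-{i}"),
                "labels": { "app": "laminar-worker" } // real pods carry job/run identifiers
            },
            "spec": {
                "serviceAccountName": "laminar-worker",
                "containers": [{
                    "name": "worker",
                    "image": "ghcr.io/e6data/laminar:latest",
                    "command": ["/app/laminar", "worker"],
                    // per-slot mode: 900m Ɨ 4 slots, 500Mi Ɨ 4 slots
                    "resources": { "requests": { "cpu": "3600m", "memory": "2000Mi" } }
                }]
            }
        }))?;
        pods.create(&PostParams::default(), &pod).await?;
    }
    Ok(())
}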

Configuration

[controller]
scheduler = "kubernetes"
 
[kubernetes-scheduler]
namespace = "laminar"
 
[kubernetes-scheduler.worker]
image = "ghcr.io/e6data/laminar:latest"
image-pull-policy = "IfNotPresent"
service-account-name = "laminar-worker"
task-slots = 4
command = "/app/laminar worker"
 
[kubernetes-scheduler.worker.resources.requests]
cpu = "900m"
memory = "500Mi"

Resource Modes: Per-Slot vs Per-Pod

The Kubernetes scheduler supports two resource allocation modes. The mode you choose determines how much CPU and memory our example pipeline requests from the cluster.

Per-Slot Mode (Default)

Resources are multiplied by the number of task slots in the pod.

For our example with cpu = "900m", memory = "500Mi", and task-slots = 4:

Resource   Per Slot   Per Pod (4 slots)   Total (3 pods)
CPU        900m       3600m               10800m
Memory     500Mi      2000Mi              6000Mi

[kubernetes-scheduler]
resource-mode = "per-slot"
 
[kubernetes-scheduler.worker.resources.requests]
cpu = "900m"      # Per slot
memory = "500Mi"  # Per slot

Use per-slot when you want resources to scale with parallelism.

Per-Pod Mode

Resources are applied as-is to each pod, regardless of slot count.

For our example with cpu = "4000m", memory = "8Gi", and task-slots = 4:

Resource   Per Pod   Total (3 pods)
CPU        4000m     12000m
Memory     8Gi       24Gi

[kubernetes-scheduler]
resource-mode = "per-pod"
 
[kubernetes-scheduler.worker.resources.requests]
cpu = "4000m"   # Fixed per pod
memory = "8Gi"  # Fixed per pod

Use per-pod when you want fixed-size pods regardless of slot count.
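
The difference between the two modes is easiest to see side by side. The sketch below is a hypothetical comparison (not Laminar code) of cluster-wide totals for the example pipeline under each mode, using the request values from the two snippets above.

// Hypothetical comparison of cluster-wide requests for parallelism = 12,
// task-slots = 4 (i.e. 3 pods), under the two resource modes shown above.
fn main() {
    let pods: u64 = 3; // ceil(12 / 4)
    let slots_per_pod: u64 = 4;

    // per-slot mode: requests are multiplied by the slots in each pod
    let per_slot_cpu_m = 900 * slots_per_pod * pods;   // 10800m
    let per_slot_mem_mi = 500 * slots_per_pod * pods;  // 6000Mi

    // per-pod mode: requests are applied to each pod as-is
    let per_pod_cpu_m = 4000 * pods;                   // 12000m
    let per_pod_mem_gi = 8 * pods;                     // 24Gi

    println!("per-slot total: {per_slot_cpu_m}m CPU, {per_slot_mem_mi}Mi memory");
    println!("per-pod  total: {per_pod_cpu_m}m CPU, {per_pod_mem_gi}Gi memory");
}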

Pod Customization

You can customize worker pods with labels, annotations, volumes, node selectors, and tolerations:

[kubernetes-scheduler.worker]
labels = { team = "data-platform" }
annotations = { "prometheus.io/scrape" = "true" }
node-selector = { "node-type" = "compute" }
 
[[kubernetes-scheduler.worker.tolerations]]
key = "dedicated"
operator = "Equal"
value = "laminar"
effect = "NoSchedule"
 
[[kubernetes-scheduler.worker.volumes]]
name = "checkpoints"
hostPath = { path = "/data/checkpoints", type = "DirectoryOrCreate" }
 
[[kubernetes-scheduler.worker.volume-mounts]]
name = "checkpoints"
mountPath = "/checkpoints"

When to Use

  • Production deployments
  • Multi-node clusters
  • When you need autoscaling
  • When you need resource isolation

Comparison: Same Pipeline, Three Schedulers

Here's how our kafka-to-iceberg pipeline (parallelism = 12) runs on each scheduler:

Aspect                 Embedded         Process              Kubernetes
Workers                0 (in-process)   3 processes          3 pods
Slots                  12 async tasks   12 (4 per process)   12 (4 per pod)
Isolation              None             Process-level        Pod-level
Node distribution      Single process   Single machine       Multi-node
Failure blast radius   All tasks        4 tasks              4 tasks
Resource control       Limited          OS-level             Kubernetes limits
Scaling                Vertical only    Vertical only        Horizontal

Choosing a Scheduler

Environment         Recommended Scheduler   Why
Unit tests          embedded                Fast startup, no process overhead
Local development   process                 Good isolation, easy debugging
Desktop mode        process                 Default for single-machine use
CI/CD testing       embedded or process     Depends on test complexity
Kubernetes (dev)    kubernetes              Matches production behavior
Kubernetes (prod)   kubernetes              Full isolation, horizontal scaling

Monitoring

All schedulers expose metrics for monitoring:

  • laminar_controller_free_slots - Available task slots
  • laminar_controller_registered_slots - Total registered slots
  • laminar_controller_registered_nodes - Registered worker nodes