Work in Progress: This page is under development.

Fault Tolerance

Laminar provides exactly-once processing semantics through checkpointing and automatic recovery mechanisms.

Overview

Laminar's fault tolerance rests on two mechanisms, described below: periodic checkpoints that capture a consistent snapshot of source offsets and operator state, and automatic recovery that restores the most recent completed checkpoint and resumes processing after a failure.

Checkpointing

Checkpoints are consistent snapshots of pipeline state that enable recovery.

Checkpoint Components

Component      | Description                                   | Storage
Source Offsets | Position in each source partition             | Object Storage
Operator State | Aggregation buffers, window state, join state | Object Storage
Metadata       | Checkpoint ID, timestamp, status              | PostgreSQL
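
As a rough mental model, a completed checkpoint is a metadata record plus a set of objects in the state store. The sketch below is illustrative only; the field names, partition labels, and object keys are assumptions, not Laminar's actual schema.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CheckpointMetadata:
    # Stored in PostgreSQL: identity, timing, and status of the checkpoint.
    checkpoint_id: int
    created_at: datetime
    status: str                                                    # e.g. "IN_PROGRESS", "COMPLETED"
    # Referenced from object storage: source offsets and operator state snapshots.
    source_offsets: dict[str, int] = field(default_factory=dict)   # partition -> offset
    state_objects: list[str] = field(default_factory=list)         # object-store keys

ckpt = CheckpointMetadata(
    checkpoint_id=42,
    created_at=datetime.now(timezone.utc),
    status="COMPLETED",
    source_offsets={"orders-0": 1200, "orders-1": 987},
    state_objects=["checkpoints/42/window_state.parquet"],
)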

Checkpoint Process

Checkpoint Lifecycle

  1. Trigger: Coordinator initiates checkpoint at configured interval
  2. Barrier: Checkpoint barriers flow through the pipeline
  3. Snapshot: Each operator snapshots its state when barrier arrives
  4. Upload: State snapshots are uploaded to object storage
  5. Commit: Checkpoint is marked complete when all operators finish
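
The barrier-alignment part of this lifecycle (steps 2-4) can be sketched as a simplified operator loop. This is a conceptual illustration only; the input/output channels and the snapshot_state / upload_snapshot callables are assumptions, not Laminar's API.

BARRIER = object()  # sentinel marking a checkpoint barrier in the record stream

def run_until_checkpoint(inputs, output, process, snapshot_state, upload_snapshot):
    """Barrier alignment for one operator and one checkpoint: process records
    until a barrier has arrived on every input queue, then snapshot local
    state, upload it, and forward the barrier downstream."""
    pending = set(range(len(inputs)))        # inputs that still owe a barrier
    while pending:
        # A real operator would poll inputs concurrently; sequential reads
        # keep the sketch short.
        for i in list(pending):
            item = inputs[i].get()
            if item is BARRIER:
                pending.discard(i)           # stop draining this input until aligned
            else:
                output.put(process(item))    # normal record processing
    upload_snapshot(snapshot_state())        # steps 3-4: snapshot and upload
    output.put(BARRIER)                      # step 2 continues: barrier flows downstream

In this model, a barrier is injected at the sources when the coordinator triggers the checkpoint (step 1), and the checkpoint is committed once every operator has uploaded its snapshot (step 5).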

Checkpoint Configuration

Configure checkpointing behavior in your pipeline settings.

Setting                    | Description                        | Default
checkpointIntervalMs       | Time between checkpoints           | 60000 (1 min)
checkpointTimeoutMs        | Max time for checkpoint completion | 600000 (10 min)
minPauseBetweenCheckpoints | Minimum gap between checkpoints    | 0
maxConcurrentCheckpoints   | Number of in-flight checkpoints    | 1

For example:

{
  "checkpointing": {
    "checkpointIntervalMs": 60000,
    "checkpointTimeoutMs": 600000
  }
}

Exactly-Once Semantics

Laminar achieves exactly-once processing through coordinated checkpointing.

How It Works

  1. Source Tracking: Record source offsets at checkpoint
  2. State Snapshot: Capture all operator state
  3. Sink Transactions: Coordinate with transactional sinks
  4. Atomic Commit: All components commit together
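
The coordination can be sketched as a two-phase commit driven by the checkpoint. Everything below (sources, operators, sinks, metadata_store) is a hypothetical interface used to illustrate the ordering, not Laminar's API.

def commit_checkpoint(checkpoint_id, sources, operators, sinks, metadata_store):
    """Illustrative exactly-once checkpoint commit."""
    # 1. Source tracking: record where each source partition is.
    offsets = {s.name: s.current_offset() for s in sources}

    # 2. State snapshot: capture and upload every operator's state.
    state_keys = [op.upload_snapshot(checkpoint_id) for op in operators]

    # 3. Sink transactions: ask each transactional sink to pre-commit
    #    (flush buffered output without making it visible yet).
    for sink in sinks:
        sink.pre_commit(checkpoint_id)

    # 4. Atomic commit: persist the checkpoint record, then finalize the sink
    #    transactions. If the pipeline crashes before the record is written,
    #    the pre-committed output is aborted on recovery, so nothing is lost
    #    or duplicated.
    metadata_store.write_completed(checkpoint_id, offsets, state_keys)
    for sink in sinks:
        sink.commit(checkpoint_id)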

Sink Semantics

Sink Type | Semantics     | Mechanism
Kafka     | Exactly-once  | Transactional producer
Iceberg   | Exactly-once  | Atomic table commits
StdOut    | At-least-once | No deduplication
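
To make the Kafka row concrete, the snippet below shows the transactional-producer pattern using the confluent_kafka client. It illustrates the general mechanism rather than Laminar's internal sink code; the broker address, topic, and transactional.id are placeholders.

from confluent_kafka import Producer, KafkaException

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "laminar-sink-example",   # placeholder id
})
producer.init_transactions()

def write_batch(records):
    """Write a batch atomically: either every record becomes visible to
    read_committed consumers, or none do."""
    producer.begin_transaction()
    try:
        for key, value in records:
            producer.produce("output-topic", key=key, value=value)
        producer.commit_transaction()
    except KafkaException:
        producer.abort_transaction()
        raise

write_batch([("order-1", b"processed"), ("order-2", b"processed")])

In a checkpointing pipeline, the commit is typically tied to checkpoint completion, which is what turns replay after a recovery into exactly-once output.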

Failure Recovery

When a failure occurs, Laminar automatically recovers from the last successful checkpoint.

Failure Detection

Recovery Process

Recovery Steps

  1. Identify Checkpoint: Find the most recent completed checkpoint
  2. Load Metadata: Retrieve checkpoint metadata from PostgreSQL
  3. Download State: Fetch state snapshots from object storage
  4. Restore Operators: Each operator restores its state
  5. Reset Sources: Seek sources to checkpointed offsets
  6. Resume Processing: Continue processing from checkpoint position
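
These steps map onto a short driver routine. The sketch below is hypothetical: metadata_store, object_store, operators, and sources stand in for Laminar's internals purely to show the order of operations.

def recover_pipeline(metadata_store, object_store, operators, sources):
    """Restore the most recent completed checkpoint and prepare to resume."""
    # Steps 1-2: identify the latest completed checkpoint and load its metadata.
    ckpt = metadata_store.latest_completed_checkpoint()
    if ckpt is None:
        raise RuntimeError("no completed checkpoint; start the pipeline fresh")

    # Steps 3-4: download each operator's snapshot and restore its state.
    for op in operators:
        snapshot = object_store.download(ckpt.state_key(op.name))
        op.restore(snapshot)

    # Step 5: reset every source partition to its checkpointed offset.
    for source in sources:
        source.seek(ckpt.offsets[source.name])

    # Step 6: resume processing from the checkpoint position; records replayed
    # after the checkpoint are kept exactly-once by the transactional sinks.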

State Backend

Laminar uses a Parquet-based state backend for checkpoint storage.

Parquet Backend

Property     | Description
Format       | Apache Parquet with Zstd compression
Storage      | S3, GCS, Azure Blob, or local filesystem
State Tables | GlobalKeyedTable, ExpiringTimeKeyTable
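
Because snapshot files are ordinary Parquet, they can be written and inspected with stock tooling. The snippet below shows the general pattern with pyarrow; the column layout is illustrative, not Laminar's on-disk schema.

import pyarrow as pa
import pyarrow.parquet as pq

# Example keyed state: one row per key, as a columnar Arrow table.
state = pa.table({
    "key": ["user-1", "user-2"],
    "count": [17, 3],
    "updated_at_ms": [1710000000000, 1710000050000],
})

# Write the snapshot with Zstd compression, then read it back.
pq.write_table(state, "state-snapshot.parquet", compression="zstd")
print(pq.read_table("state-snapshot.parquet").to_pydict())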

Supported Storage Providers

  • AWS S3 - Primary cloud storage
  • Google Cloud Storage - GCS buckets
  • Azure Blob Storage - ABFS/HTTPS
  • Cloudflare R2 - S3-compatible
  • Local Filesystem - For development/testing
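
With an Arrow-style filesystem abstraction, these providers differ mainly in the URI scheme of the checkpoint location. The schemes below follow pyarrow's conventions and are shown as examples; they are not a definitive list of what Laminar accepts.

from pyarrow import fs

# FileSystem.from_uri returns (filesystem, path-within-filesystem).
examples = [
    "s3://my-bucket/laminar/checkpoints",     # AWS S3 and S3-compatible stores such as R2
    "gs://my-bucket/laminar/checkpoints",     # Google Cloud Storage
    "file:///var/lib/laminar/checkpoints",    # local filesystem for development/testing
]
for uri in examples:
    filesystem, path = fs.FileSystem.from_uri(uri)
    print(type(filesystem).__name__, path)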

Monitoring Recovery

Track checkpoint and recovery metrics.

Metric                 | Description
checkpoint_duration_ms | Time to complete checkpoint
checkpoint_size_bytes  | Size of checkpoint data
checkpoint_failures    | Number of failed checkpoints
recovery_time_ms       | Time to recover from failure
records_reprocessed    | Records replayed after recovery
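
If these metrics are scraped into Prometheus (an assumption about your monitoring setup, not something this page specifies), they can be read back over Prometheus' standard HTTP query API. The metric names below come from the table above; the server address is a placeholder.

import requests

PROMETHEUS_URL = "http://localhost:9090"   # placeholder Prometheus address

def latest_value(metric):
    """Fetch the most recent sample for a metric via /api/v1/query."""
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": metric})
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    return float(results[0]["value"][1]) if results else None

failures = latest_value("checkpoint_failures")
if failures and failures > 0:
    print(f"warning: {failures:.0f} failed checkpoints")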