Configuration
Overview
Laminar's configuration system allows options to be set via files (in TOML or YAML format) and environment variables.
The system will look for configuration in the following places, from highest to lowest priority:
- LAMINAR__* environment variables
- Config file specified via the --config option
- Any *.toml or *.yaml files in the --config-dir directory
- laminar.toml in the current directory
- $(user conf dir)/laminar/config (toml or yaml); this is ~/.config/laminar on Linux and ~/Library/Application Support/laminar on macOS
- Default configuration
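The priority order above can be pictured as layering sources on top of each other, with higher-priority sources overriding lower ones. The following is a minimal Python sketch of that idea, not Laminar's actual implementation; the function names and the sample layers are hypothetical, and merging nested tables key by key is an assumption.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which values from `override` win over `base`.

    Nested tables (dicts) are merged key by key (an assumption for this sketch).
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(layers_high_to_low: list[dict[str, Any]]) -> dict[str, Any]:
    """Combine config layers given from highest to lowest priority."""
    result: dict[str, Any] = {}
    # Apply the lowest-priority layer first so later (higher) layers override it.
    for layer in reversed(layers_high_to_low):
        result = deep_merge(result, layer)
    return result


# Hypothetical layers, using option names from this page.
defaults = {"controller": {"scheduler": "process", "rpc-port": 5116}}
config_file = {"controller": {"scheduler": "node"}}
env_vars = {"controller": {"rpc-port": 6000}}

effective = resolve([env_vars, config_file, defaults])
```

In this sketch the scheduler comes from the config file and the RPC port from the environment, while any option set in neither place would fall back to its default.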
Config files
In TOML or YAML, nested configurations are specified as tables under the given key name, for example:
checkpoint-url = 's3://my-bucket/checkpoints'
[controller]
scheduler = 'node'
[database]
type = "postgres"
Environment variables
All configuration options can also be set as environment variables. To convert a config name into an environment variable, the following rules are applied:
- Start with LAMINAR__
- Replace all dots (i.e., layers of nesting) with __ (double underscore)
- Replace all - with _ (single underscore)
- Uppercase all letters
Some examples:
- checkpoint-url => LAMINAR__CHECKPOINT_URL
- pipeline.compaction.enabled => LAMINAR__PIPELINE__COMPACTION__ENABLED
- api.bind-address => LAMINAR__API__BIND_ADDRESS
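The naming rules can be expressed as a small helper. This is an illustrative Python sketch of the rules as stated on this page, not code from Laminar itself; the function name config_key_to_env_var is hypothetical.

```python
def config_key_to_env_var(key: str) -> str:
    """Convert a dotted config key into its environment variable name."""
    name = key.replace(".", "__")  # dots (nesting levels) become double underscores
    name = name.replace("-", "_")  # hyphens become single underscores
    return "LAMINAR__" + name.upper()  # add the prefix and uppercase all letters


for key in ("checkpoint-url", "pipeline.compaction.enabled", "api.bind-address"):
    print(f"{key} => {config_key_to_env_var(key)}")
```

Running the loop reproduces the three mappings listed in the examples.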
Reasonable type conversions will be applied to values specified as environment variables; for example, numbers and booleans will be parsed into the correct type.
Options
Here we list all of the available configuration options by the key they are nested under. For example,
the option in the Pipeline section listed as source-batch-size would be specified in the config file as
pipeline.source-batch-size or as a table:
[pipeline]
source-batch-size = 128
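To make the relationship between dotted option names and nested tables concrete, here is a short Python sketch. It only illustrates the naming convention; the helper names expand_dotted and lookup are hypothetical and are not part of Laminar.

```python
from typing import Any


def expand_dotted(key: str, value: Any) -> dict[str, Any]:
    """Turn a dotted key such as 'pipeline.source-batch-size' into nested tables."""
    nested: Any = value
    # Wrap the value from the innermost key outward.
    for part in reversed(key.split(".")):
        nested = {part: nested}
    return nested


def lookup(config: dict[str, Any], key: str) -> Any:
    """Read an option from nested tables using its dotted name."""
    node: Any = config
    for part in key.split("."):
        node = node[part]
    return node


as_table = {"pipeline": {"source-batch-size": 128}}
as_dotted = expand_dotted("pipeline.source-batch-size", 128)
```

Both spellings describe the same structure, so lookup(as_table, "pipeline.source-batch-size") and lookup(as_dotted, "pipeline.source-batch-size") return the same value.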
Top-level options:
| Name | Description | Default Value |
|---|---|---|
checkpoint-url | URL of an object store or filesystem for storing checkpoints; in a distributed cluster this must be a location available to all nodes | /tmp/laminar/checkpoints |
default-checkpoint-interval | Default checkpointing interval | 10s |
api-endpoint | Endpoint of the API, used by other services to connect to it | inferred |
controller-endpoint | Endpoint of the controller, used by other services to connect to it | inferred |
disable-telemetry | Disable open-source telemetry | false |
Pipeline
Configuration that applies to individual pipelines.
Key: pipeline
| Name | Description | Default Value |
|---|---|---|
source-batch-size | Max size of source batches | 512 |
source-batch-linger | Batch linger time (how long to wait before flushing) | 100ms |
update-aggregate-flush-interval | How often to flush aggregates | 1s |
allowed-restarts | How many restarts to allow before moving to failed (-1 for infinite) | 20 |
worker-heartbeat-timeout | Number of seconds to wait for a worker heartbeat before considering it dead | 30s |
healthy-duration | After this amount of time, we consider the job to be healthy and reset the restarts counter | 2m |
worker-startup-time | Amount of time to wait for workers to start up before considering them failed | 10m |
task-startup-time | Amount of time to wait for tasks to start up before considering them failed | 2m |
compaction.enabled | Whether to enable compaction for checkpoints | false |
compaction.checkpoints-to-compact | The number of outstanding checkpoints that will trigger compaction | 4 |
chaining.enabled | Whether to enable operator chaining, which reduces the number of operators in the pipeline | false |
Run (pipeline clusters)
Configuration for pipeline clusters
Key: run
| Name | Description | Default Value |
|---|---|---|
query | The query to run for this pipeline cluster (equivalent to the query command-line parameter) | none |
state-dir | Sets the directory that state will be written to and read from | none |
API
Configuration for the API service
Key: api
| Name | Description | Default Value |
|---|---|---|
bind-address | The host the API service should bind to | 0.0.0.0 |
http-port | The HTTP port for the API service | 5115 |
run-http-port | The HTTP port for the API service in run mode; defaults to a random port | 0 |
Controller
Configuration for the controller service
Key: controller
| Name | Description | Default Value |
|---|---|---|
bind-address | The host the controller should bind to | 0.0.0.0 |
rpc-port | The RPC port for the controller | 5116 |
scheduler | The scheduler to use; one of process, kubernetes, node, or embedded | process |
Admin
Configuration for the Admin service
Key: admin
| Name | Description | Default Value |
|---|---|---|
bind-address | Address to bind the Admin service | 0.0.0.0 |
http-port | Port for the Admin HTTP service | 5114 |
Node
Configuration for the Node service
Key: node
| Name | Description | Default Value |
|---|---|---|
bind-address | Address to bind the Node service | 0.0.0.0 |
rpc-port | Port for the Node RPC service | 5118 |
task-slots | Number of task slots for the Node | 16 |
Worker
Configuration for pipeline workers
Key: worker
| Name | Description | Default Value |
|---|---|---|
bind-address | Address to bind the Worker service | 0.0.0.0 |
rpc-port | RPC port for the worker to listen on; set to 0 to use a random available port | 0 |
data-port | Data port for the worker to listen on; set to 0 to use a random available port | 0 |
task-slots | Number of task slots for the Worker | 16 |
queue-size | Size of the queues between nodes in the dataflow graph | 8192 |
Schedulers
Configuration for the various schedulers
Process Scheduler
Key: process-scheduler
| Name | Description | Default Value |
|---|---|---|
slots-per-process | Number of slots per process in the scheduler | 16 |
Kubernetes Scheduler
Key: kubernetes-scheduler
Some values for the Kubernetes scheduler are complete Kubernetes objects; for
example, the worker.resources object can be specified as a
Kubernetes resource object.
When specifying these via environment variables, they should be encoded as YAML.
See the Kubernetes deployment docs for more details.
There are two modes for allocating resources for Kubernetes, specified via kubernetes-scheduler.resource-mode:
- In per-slot mode, tasks are packed onto workers up to the task-slots config, and for each slot the amount of resources specified in resources is provided. This can be much more efficient for diversely-sized pipelines.
- In per-pod mode, every pod has exactly task-slots slots, and exactly the resources in resources, even if it is scheduled for fewer slots. This is the behavior from before 0.11.
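The difference between the two modes can be illustrated with some simple arithmetic. The Python sketch below is one reading of the description above, not the scheduler's actual logic; the function name and the example workload of 20 slots are hypothetical, while 16 task slots and a 900m CPU request are the defaults listed on this page.

```python
import math


def total_cpu_millicores(slots_needed: int, task_slots: int,
                         cpu_millicores: int, mode: str) -> int:
    """Estimate the total CPU requested for a pipeline in each resource mode.

    Assumes `cpu_millicores` is the CPU value from the `resources` config:
    it applies per slot in per-slot mode and per pod in per-pod mode.
    """
    if mode == "per-slot":
        # Each scheduled slot is given the configured resources.
        return slots_needed * cpu_millicores
    if mode == "per-pod":
        # Each pod gets exactly the configured resources, however many slots it runs.
        pods = math.ceil(slots_needed / task_slots)
        return pods * cpu_millicores
    raise ValueError(f"unknown resource mode: {mode}")


# Hypothetical pipeline needing 20 slots, with 16 slots per worker and 900m CPU.
per_slot_total = total_cpu_millicores(20, 16, 900, "per-slot")
per_pod_total = total_cpu_millicores(20, 16, 900, "per-pod")
```

Because the same resources value means "per slot" in one mode and "per pod" in the other, the configured amounts generally need to be revisited when switching modes.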
| Name | Description | Default Value |
|---|---|---|
namespace | Kubernetes namespace for the scheduler | default |
resource-mode | Resource allocation mode; per-slot or per-pod | per-slot |
worker.name-prefix | Prefix for worker names | laminar |
worker.image | Docker image for workers | ghcr.io/laminarsystems/laminar:latest |
worker.image-pull-policy | Image pull policy for worker containers | IfNotPresent |
worker.service-account-name | Service account name for worker containers | default |
worker.resources.requests | Kubernetes resource object representing the requests for the worker pods | cpu: "900m", memory: "500Mi" |
worker.resources.limits | Kubernetes resource object representing the limits for the worker pods | none |
worker.task-slots | Number of task slots per worker | 16 |
worker.command | Command to start worker containers | /app/laminar worker |
worker.env | List of environment variables for worker containers, each a k8s-style map with name and value keys | none |
Database
Key: database
| Name | Description | Default Value |
|---|---|---|
type | Type of the database (either sqlite or postgres) | sqlite |
sqlite.path | Path to the database file | $(user config dir)/laminar/config.sqlite |
postgres.database-name | Name of the Postgres database | laminar |
postgres.host | Host of the Postgres database | localhost |
postgres.port | Port of the Postgres database | 5432 |
postgres.user | User for the Postgres database | laminar |
postgres.password | Password for the Postgres database | laminar |
Logging
Key: logging
| Name | Description | Default Value |
|---|---|---|
format | Set the log format (one of json, logfmt, or plaintext) | plaintext |
nonblocking | Whether to use nonblocking logging; this uses more memory but ensures processing is not blocked by a high rate of logging | false |
buffered-lines-limit | Number of lines to buffer before dropping logs or exerting backpressure on senders; only valid when nonblocking is set to true | 4096 |
enable-file-line | Whether to record the source file line in the log | false |
enable-file-name | Whether to record the source file name in the log | false |