Work in Progress: This page is under development.

Metrics

Laminar collects metrics from multiple sources and stores them in GrepTimeDB.

Collection Architecture

Desktop Mode

Vector scrapes Prometheus metrics directly from Laminar components:

Laminar Backend ──► Vector ──► GrepTimeDB
  - Controller (8004)
  - Workers (6901-6920)
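
To make the Desktop Mode port layout concrete, the following minimal sketch enumerates the endpoints Vector would scrape. The port numbers come from the diagram above; the `localhost` host, the `/metrics` path for every component, and the function name are assumptions made for illustration, not Laminar's actual Vector configuration.

```python
from typing import List

CONTROLLER_PORT = 8004            # controller port from the diagram above
WORKER_PORTS = range(6901, 6921)  # workers 6901-6920 inclusive


def desktop_scrape_endpoints(host: str = "localhost",
                             path: str = "/metrics") -> List[str]:
    """Return one scrape URL per Laminar component in Desktop Mode.

    `host` and `path` are illustrative assumptions.
    """
    ports = [CONTROLLER_PORT, *WORKER_PORTS]
    return [f"http://{host}:{port}{path}" for port in ports]


endpoints = desktop_scrape_endpoints()
print(len(endpoints))  # 21 (1 controller + 20 workers)
print(endpoints[0])
```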

Kubernetes Mode

Vector runs as a DaemonSet collecting metrics from multiple sources:

┌─────────────────────┐
│ Laminar Engine:8004 │──► Laminar-Metrics datasource
└─────────────────────┘

┌─────────────────────┐
│ Vector host_metrics │──┐
├─────────────────────┤  │
│ kubelet cAdvisor    │──┼──► GrepTimeDB-Metrics datasource
├─────────────────────┤  │
│ kube-state-metrics  │──┘
└─────────────────────┘

Metric Sources

| Source | Collector | Interval | Datasource |
| --- | --- | --- | --- |
| Laminar engine | Prometheus scrape | 15s | Laminar-Metrics |
| Node metrics | Vector host_metrics | 15s | GrepTimeDB-Metrics |
| Container metrics | kubelet cAdvisor | 15s | GrepTimeDB-Metrics |
| K8s objects | kube-state-metrics | 30s | GrepTimeDB-Metrics |

Laminar Engine Metrics

Scraped directly from laminar-engine:8004/metrics.
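
The endpoint serves the Prometheus text exposition format. As a rough illustration of what a scraper sees, the sketch below parses a small hand-written sample into (name, labels, value) tuples. The sample text, its label values, and the helper name `parse_exposition` are made up for this example; the parser is deliberately simplified (it ignores timestamps and does not unescape label values) and is not how Vector itself parses metrics.

```python
import re
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]

# metric_name{label="value",...} value
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> List[Sample]:
    """Parse sample lines, skipping blank lines and # HELP / # TYPE comments."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match is None:
            continue  # not a sample line this simplified parser understands
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hand-written sample; values and label contents are illustrative only.
SAMPLE = """
# TYPE laminar_controller_active_pipelines gauge
laminar_controller_active_pipelines 3
laminar_worker_messages_recv{node_id="n1",subtask_idx="0",operator_name="source"} 42
"""

for name, labels, value in parse_exposition(SAMPLE):
    print(name, labels, value)
```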

Controller Metrics

| Metric | Type | Description |
| --- | --- | --- |
| laminar_controller_active_pipelines | Gauge | Number of running pipelines |
| laminar_controller_registered_nodes | Gauge | Number of registered worker nodes |
| laminar_controller_registered_slots | Gauge | Total slot count across all workers |
| laminar_controller_free_slots | Gauge | Available (unused) slot count |
| laminar_controller_compaction_tuples_in | Counter | Tuples received for compaction |
| laminar_controller_compaction_tuples_out | Counter | Tuples emitted after compaction |
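
The two slot gauges are typically combined into a utilization figure. The sketch below mirrors the slot utilization expression given in the Querying Metrics section, with one addition that is an assumption of this example: when no slots are registered it returns 0 rather than dividing by zero. The function name is hypothetical.

```python
def slot_utilization_pct(registered_slots: float, free_slots: float) -> float:
    """Percentage of registered slots currently in use.

    Inputs correspond to laminar_controller_registered_slots and
    laminar_controller_free_slots. Returning 0.0 when nothing is
    registered is a choice made for this sketch.
    """
    if registered_slots <= 0:
        return 0.0
    return (registered_slots - free_slots) / registered_slots * 100.0


print(slot_utilization_pct(20, 5))  # 75.0
```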

Worker Metrics

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| laminar_worker_messages_recv | Counter | node_id, subtask_idx, operator_name | Messages received by operator |
| laminar_worker_messages_sent | Counter | node_id, subtask_idx, operator_name | Messages sent by operator |
| laminar_worker_bytes_recv | Counter | node_id, subtask_idx, operator_name | Bytes received by operator |
| laminar_worker_bytes_sent | Counter | node_id, subtask_idx, operator_name | Bytes sent by operator |
| laminar_worker_batches_recv | Counter | node_id, subtask_idx, operator_name | Batches received by operator |
| laminar_worker_batches_sent | Counter | node_id, subtask_idx, operator_name | Batches sent by operator |
| laminar_worker_deserialization_errors | Counter | node_id, subtask_idx, operator_name | Deserialization error count |
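
Because these metrics are counters, their raw values only ever grow (until a restart), so throughput is derived from the change between samples. The following sketch shows one simplified way to turn a series of counter samples into a per-second rate while tolerating counter resets. It is an illustration only: it does not reproduce the extrapolation that PromQL's `rate()` performs, and the function name and sample values are hypothetical.

```python
from typing import List, Tuple


def counter_rate(samples: List[Tuple[float, float]]) -> float:
    """Average per-second increase over (timestamp_seconds, value) samples.

    A drop in value is treated as a counter reset (for example, a worker
    restart), so the post-reset value counts as the increase for that step.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Three samples taken 15 seconds apart, matching the 15s scrape interval.
print(counter_rate([(0, 100), (15, 250), (30, 400)]))  # 10.0
```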

GrepTimeDB Metric Views

Predefined views in the laminar_metric_views database simplify common queries:

| View | Description |
| --- | --- |
| active_pipelines_metrics | Pipeline count over time |
| slot_utilization_metrics | Slot usage percentage (free/registered) |
| registered_nodes_metrics | Node count over time |
| registered_slots_metrics | Total slots over time |
| free_slots_metrics | Available slots over time |
| worker_messages_metrics | Messages recv/sent per operator |
| worker_bytes_metrics | Bytes recv/sent per operator |
| worker_batches_metrics | Batches recv/sent per operator |
| worker_errors_metrics | Deserialization errors per operator |
| compaction_tuples_metrics | Compaction in/out counts |

Node Metrics

Collected by Vector's host_metrics source.

CPU

| Metric | Description |
| --- | --- |
| node_cpu_seconds_total | CPU time in seconds |
| node_logical_cpus | Number of logical CPUs |
| node_physical_cpus | Number of physical CPUs |
| node_load1 | 1-minute load average |
| node_load5 | 5-minute load average |
| node_load15 | 15-minute load average |

Memory

| Metric | Description |
| --- | --- |
| node_memory_total_bytes | Total memory |
| node_memory_free_bytes | Free memory |
| node_memory_available_bytes | Available memory |
| node_memory_used_bytes | Used memory |
| node_memory_cached_bytes | Cached memory |
| node_memory_buffers_bytes | Buffer memory |
| node_memory_swap_total_bytes | Total swap |
| node_memory_swap_free_bytes | Free swap |

Disk

| Metric | Description |
| --- | --- |
| node_disk_read_bytes_total | Bytes read |
| node_disk_written_bytes_total | Bytes written |
| node_disk_reads_completed_total | Read operations |
| node_disk_writes_completed_total | Write operations |

Filesystem

| Metric | Description |
| --- | --- |
| node_filesystem_total_bytes | Filesystem size |
| node_filesystem_free_bytes | Free space |
| node_filesystem_used_bytes | Used space |
| node_filesystem_used_ratio | Used-to-total ratio (0 to 1) |

Network

| Metric | Description |
| --- | --- |
| node_network_receive_bytes_total | Bytes received |
| node_network_transmit_bytes_total | Bytes transmitted |
| node_network_receive_packets_total | Packets received |
| node_network_transmit_packets_total | Packets transmitted |
| node_network_receive_errs_total | Receive errors |
| node_network_transmit_errs_total | Transmit errors |

Container Metrics (cAdvisor)

Collected from kubelet's cAdvisor endpoint.

CPU

| Metric | Description |
| --- | --- |
| container_cpu_usage_seconds_total | CPU usage in seconds |
| container_cpu_cfs_throttled_seconds_total | Throttled time |
| container_cpu_cfs_periods_total | CFS periods |

Memory

| Metric | Description |
| --- | --- |
| container_memory_usage_bytes | Current memory usage |
| container_memory_working_set_bytes | Working set size |
| container_memory_rss | RSS memory |
| container_memory_cache | Cache memory |
| container_spec_memory_limit_bytes | Memory limit |
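
A common use of these metrics is to compare a container's working set against its memory limit. The sketch below shows that calculation under two assumptions that are not stated in this page: that the working set is the value to compare against the limit, and that a limit of 0 means no limit is configured (in which case the function returns `None`). The function name is hypothetical.

```python
from typing import Optional


def memory_limit_usage_pct(working_set_bytes: float,
                           limit_bytes: float) -> Optional[float]:
    """Working set as a percentage of the memory limit.

    Inputs correspond to container_memory_working_set_bytes and
    container_spec_memory_limit_bytes. Returns None when no limit
    is set (assumed here to be reported as 0).
    """
    if limit_bytes <= 0:
        return None
    return working_set_bytes / limit_bytes * 100.0


# 512 MiB working set against a 1 GiB limit
print(memory_limit_usage_pct(512 * 1024**2, 1024**3))  # 50.0
```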

Network

| Metric | Description |
| --- | --- |
| container_network_receive_bytes_total | Bytes received |
| container_network_transmit_bytes_total | Bytes transmitted |
| container_network_receive_errors_total | Receive errors |
| container_network_transmit_errors_total | Transmit errors |

Kubernetes Object Metrics

Collected from kube-state-metrics.

Pods

| Metric | Description |
| --- | --- |
| kube_pod_info | Pod information |
| kube_pod_status_phase | Pod phase (Running, Pending, etc.) |
| kube_pod_status_ready | Pod ready status |
| kube_pod_container_status_restarts_total | Container restart count |
| kube_pod_container_resource_requests | Resource requests |
| kube_pod_container_resource_limits | Resource limits |

Deployments

| Metric | Description |
| --- | --- |
| kube_deployment_status_replicas | Current replicas |
| kube_deployment_status_replicas_available | Available replicas |
| kube_deployment_status_replicas_ready | Ready replicas |
| kube_deployment_spec_replicas | Desired replicas |

Nodes

| Metric | Description |
| --- | --- |
| kube_node_info | Node information |
| kube_node_status_condition | Node conditions |
| kube_node_status_capacity | Node capacity |
| kube_node_status_allocatable | Allocatable resources |

Querying Metrics

Grafana (Prometheus datasource)

# Slot utilization percentage
(laminar_controller_registered_slots - laminar_controller_free_slots)
  / laminar_controller_registered_slots * 100
 
# Worker message throughput
rate(laminar_worker_messages_recv[5m])
 
# Container CPU usage
rate(container_cpu_usage_seconds_total{namespace="tenant-e6data"}[5m])
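
Outside Grafana, the same expressions can be sent over HTTP if the datasource exposes the standard Prometheus HTTP API. The sketch below only builds the instant-query URL for the `/api/v1/query` endpoint and makes no network call. The base URL is a placeholder, and whether your deployment serves this API at that address (or under a different path prefix) is an assumption to verify.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a GET URL for the Prometheus HTTP API instant-query endpoint."""
    query_string = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


# Placeholder base URL; replace with the address of your datasource.
url = instant_query_url(
    "http://prometheus.example:9090",
    "rate(laminar_worker_messages_recv[5m])",
)
print(url)
# Decoding the query string recovers the original expression.
print(parse_qs(urlsplit(url).query)["query"][0])
```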

GrepTimeDB SQL

-- Recent slot utilization
SELECT * FROM laminar_metric_views.slot_utilization_metrics
WHERE ts > NOW() - INTERVAL '1 hour'
ORDER BY ts DESC
LIMIT 100;
 
-- Worker throughput by operator
SELECT operator_name, SUM(messages_recv) as total_recv
FROM laminar_metric_views.worker_messages_metrics
WHERE ts > NOW() - INTERVAL '1 hour'
GROUP BY operator_name;