diff --git a/.superpowers/brainstorm/187997-1773926811/.server-stopped b/.superpowers/brainstorm/187997-1773926811/.server-stopped new file mode 100644 index 0000000000000..cbf5ed69c1462 --- /dev/null +++ b/.superpowers/brainstorm/187997-1773926811/.server-stopped @@ -0,0 +1 @@ +{"reason":"idle timeout","timestamp":1773995694913} diff --git a/.superpowers/brainstorm/187997-1773926811/.server.log b/.superpowers/brainstorm/187997-1773926811/.server.log new file mode 100644 index 0000000000000..5ef7180024405 --- /dev/null +++ b/.superpowers/brainstorm/187997-1773926811/.server.log @@ -0,0 +1,4 @@ +{"type":"server-started","port":53648,"host":"127.0.0.1","url_host":"localhost","url":"http://localhost:53648","screen_dir":"/home/ebenshet/go/src/github.com/stackrox/stackrox/.superpowers/brainstorm/187997-1773926811"} +{"type":"screen-added","file":"/home/ebenshet/go/src/github.com/stackrox/stackrox/.superpowers/brainstorm/187997-1773926811/code-patterns.html"} +{"type":"screen-added","file":"/home/ebenshet/go/src/github.com/stackrox/stackrox/.superpowers/brainstorm/187997-1773926811/architecture-map.html"} +{"type":"server-stopped","reason":"idle timeout"} diff --git a/.superpowers/brainstorm/187997-1773926811/.server.pid b/.superpowers/brainstorm/187997-1773926811/.server.pid new file mode 100644 index 0000000000000..35ab133c9805c --- /dev/null +++ b/.superpowers/brainstorm/187997-1773926811/.server.pid @@ -0,0 +1 @@ +188005 diff --git a/.superpowers/brainstorm/187997-1773926811/architecture-map.html b/.superpowers/brainstorm/187997-1773926811/architecture-map.html new file mode 100644 index 0000000000000..08a9e441da9a3 --- /dev/null +++ b/.superpowers/brainstorm/187997-1773926811/architecture-map.html @@ -0,0 +1,475 @@ + + + + +Central Data Processing Architecture + + + + +

Central Data Processing Architecture Map

Categories 1-3: Sensor Pipeline, Background Loops, and Enrichment/Detection paths


External (Sensor/Scanner)

Queue / Worker Pool

Pipeline Fragment

Enrichment

Detection / Alerts

Storage (Postgres)

Background Loop

has metrics

partial

no metrics


Core Data Flow: Sensor → Central → Storage


Sensor (per cluster)

Bidirectional gRPC stream
MsgFromSensor / MsgToSensor
mTLS authenticated

partial: connect counter

→

Deduper

Hash-based dedup
per connection
Prevents reprocessing identical msgs

partial: passed/deduped counter

→

Worker Queue

FNV hash → poolSize+1 shards
Each shard = DedupingQueue + goroutine
Retry: 5 attempts, 5min backoff

partial: add/remove/dedupe counters only

→

Pipeline Router

Match() selects 1 of 25 fragments
safe.Run() wraps execution
Panic recovery per message

partial: panic counter, event duration

→

Fragment.Run()

Type-specific processing
Validates, enriches context
May trigger detection/risk

partial: resource_processed_count

→

PostgreSQL

250+ datastores
KeyFence for dedup
Connection pool

has metrics: op duration, table stats


Pipeline Fragments (25 message type handlers)

Each fragment processes one specific MsgFromSensor type. Colored tags show which side effects it triggers.


deploymentevents

Deployment CRUD

→ Risk reprocess (async)

alerts

AlertResults

→ Detection lifecycle mgr

processindicators

ProcessIndicator

→ Detection lifecycle (queued)

networkflowupdate

NetworkFlowUpdate

→ Flow store + baselines (LOCAL)

nodes

Node CRUD

→ Vuln enrichment + Risk calc

nodeindex

IndexReport (v4)

→ Vuln enrichment + Risk calc

nodeinventory

NodeInventory (v2)

→ Vuln enrichment + Risk calc

podevents

Pod CRUD

Store only

namespaces

Namespace CRUD

Store + network graph epoch

networkpolicies

NetworkPolicy CRUD

Store + network graph epoch

secrets

Secret CRUD

Store only

roles

K8s Role CRUD

Store only

rolebindings

K8s RoleBinding CRUD

Store only

serviceaccounts

ServiceAccount CRUD

→ Risk reprocess (conditional)

imageintegrations

ImageIntegration

→ Enrichment + Detection loop

reprocessing

ReprocessDeployment

→ Risk manager

enhancements

DeploymentEnhancementResp

→ Enhancement broker

processlisteningonport

ProcessListeningOnPort

Store only

clustermetrics

ClusterMetrics

Prometheus gauges + telemetry

clusterstatusupdate

ClusterStatusUpdate

→ CVE fetcher trigger

clusterhealthupdate

ClusterHealthInfo

Health status computation

auditlogstateupdate

AuditLogStatusInfo

Store only

virtualmachines

VirtualMachine CRUD

Store only (feature-gated)

virtualmachineindex

VMIndexReport

→ Vuln enrichment (feature-gated)


Enrichment & Detection Paths

Two paths to enrichment: inline (real-time from pipeline) and background (reprocessor loop)


Inline Enrichment (Pipeline-triggered)

+ Image Enrichment:
+ Pipeline → EnrichDeployment() → imageEnricher.EnrichImage()
+ Semaphore: internalScanSemaphore (configurable)
+ External: Scanner + Registry calls
+ scanWaiterManager deduplicates concurrent scans

+ Node Enrichment:
+ nodeindex/nodeinventory fragments → nodeEnricher.EnrichNode()
+ Vulnerability data merged into node

+ Context Enrichment:
+ Most fragments add cluster name/ID to resources +

partial: scan semaphore gauges only


Background Enrichment (Reprocessor)

+ Image Reprocessing (every 4h):
+ Query all images → semaphore(5) parallel enrichment
+ ForceRefetchCachedValuesOnly on tick
+ Send updated images back to Sensor clusters

+ Node Reprocessing:
+ All nodes → semaphore(5) parallel enrichment
+ Skip host-scanned nodes

+ Watched Images:
+ Separately reprocessed (may delegate to cluster)

+ Signature Verification:
+ Triggered by signal, re-verifies all image sigs +

partial: total duration gauge only


Build-time Detection

API-triggered only
Evaluates policies against image
Returns alerts synchronously

no pipeline metrics

Deploy-time Detection

Sensor sends AlertResults
Lifecycle mgr merges alerts
Alert create/update/resolve

partial: resource count

Runtime Detection

Process indicators queued
Batched flush every 1 min
Baseline generation/locking

partial: queue length gauge


Alert Manager & Risk Calculator

+ Alert Manager: Merges new alerts with existing → classifies as new/updated/resolved → dispatches notifications → triggers risk reprocess
+ Risk Manager: Scores deployments (image risk + violations + port exposure), images (vulns + age + components), nodes (vulns). Updates namespace/cluster rankings. +

partial: risk_processing_duration histogram


Background Loops (19 loops, 30+ goroutines)

Only showing loops in Categories 1-3 scope. Loops marked with * are candidates for standard metrics.


* Reprocessor (enrichLoop)

Every 4h (ROX_REPROCESS_INTERVAL)

1 goroutine + semaphore(5) workers per batch

Reprocess all images, nodes, watched images. Resync deployments to sensors. Can be short-circuited.

partial: duration gauge

* Reprocessor (riskLoop)

Every ROX_RISK_REPROCESS_INTERVAL

1 goroutine

Drains deploymentRiskSet, sends risk reprocess messages to sensor connections.

no metrics

* Reprocessor (activeComponentLoop)

Every ROX_ACTIVE_VULN_REFRESH_INTERVAL

1 goroutine

Updates active (exploitable) component tracking.

no metrics

* CVE Suppress

Every 1h (hardcoded)

1 goroutine

Unsuppress CVEs with expired suppress state.

no metrics

* Pruning (GC)

Every ROX_PRUNE_INTERVAL

1 goroutine + semaphore(5) for network flows

Prune old alerts, orphaned pods/nodes, expired requests, stale baselines, log imbues, network flows.

partial: prune_duration histogram

* CVE Fetcher

ROX_ORCHESTRATOR_VULN_SCAN_INTERVAL

1 goroutine

Reconcile orchestrator + Istio CVEs. Throttled on cluster connection.

no metrics

* Lifecycle Mgr (indicator flush)

Every 1 min

1 goroutine

Flush queued process indicators to datastore. Batch indexing. Baseline evaluation.

partial: process_queue_length gauge

* Network Baseline Flush

Every 5s (hardcoded)

1 goroutine

Flush accumulated network baseline changes to storage.

no metrics

* Hash Manager Flush

ROX_HASH_FLUSH_INTERVAL

1 goroutine (optional)

Flush deduplication hashes to database.

partial: hash_size gauge

Conn Manager Health

Every 30s (hardcoded)

1 goroutine

Update inactive/active cluster health status.

no metrics

Vuln Request Manager

ROX_VULN_REQUEST_REVERT_TIMER_DURATION

3 goroutines

Expire timed deferrals, expire fixable CVE deferrals, unsuppress CVEs.

no metrics

Network Gatherer

ROX_EXT_NETWORK_SRCS_GATHER_INTERVAL

2 goroutines

Fetch default external networks, reconcile with storage.

no metrics


Key Data Flow Connections

+ Sensor → Deduper → WorkerQueue(sharded) → Pipeline Router → Fragment.Run() → Datastore
+
+ deploymentevents fragment → stores deployment → queues to riskLoop → sends ReprocessDeploymentRisk back through pipeline
+ alerts fragment → lifecycle manager → alert manager (merge/notify) → risk reprocess
+ processindicators fragment → lifecycle manager indicator queue → flush every 1min → baseline evaluation
+ imageintegrations fragment → reprocessor.ShortCircuit() → triggers immediate enrichment cycle
+ nodes/nodeindex/nodeinventory → nodeEnricher.EnrichNode() → risk calc → store
+
+ enrichLoop (4h) → query all images → imageEnricher.EnrichImage() per image → risk calc → send updated images to Sensor
+ pruning (periodic) → query stale data → batch delete from Postgres
+ CVE suppress (1h) → query expired suppressions → unsuppress CVEs in Postgres +


Observability Gaps Summary

+ Worker Queue: No depth gauge, no worker utilization, no retry metrics, no in-flight count, no end-to-end latency
+ Background Loops: 8 of the 12 loops listed above have ZERO metrics. No "is running" flag, no batch size, no items processed/failed per run, no time-since-last-run
+ Enrichment: No call count by trigger source, no success/failure/skip rates, no cache hit rates
+ Detection: No alerts-generated-per-evaluation, no violations-found vs clean ratio, no baseline generation timing
+ Cross-cutting: No end-to-end trace from Sensor message arrival to final storage commit +

+ + + diff --git a/.superpowers/brainstorm/187997-1773926811/code-patterns.html b/.superpowers/brainstorm/187997-1773926811/code-patterns.html new file mode 100644 index 0000000000000..0b572fbaf7410 --- /dev/null +++ b/.superpowers/brainstorm/187997-1773926811/code-patterns.html @@ -0,0 +1,189 @@ +

Central's Three Module Archetypes - Real Code Examples

Each archetype has different concurrency patterns, different metric needs, and different levels of existing instrumentation


Archetype 1: Sensor Message Pipeline (Worker Queue)

central/sensor/service/connection/worker_queue.go

+// Hash-sharded deduping queues (poolSize+1 workers)
+type workerQueue struct {
+    poolSize  int
+    queues    []*dedupingqueue.DedupingQueue[string]  // each queue = 1 goroutine
+}
+
+// Push: FNV hash routes msg to a specific queue shard
+func (w *workerQueue) push(msg *central.MsgFromSensor) {
+    idx := w.indexFromKey(msg.GetHashKey())
+    w.queues[idx].Push(msg)  // NO depth gauge, NO push timing
+}
+
+// Workers: pull blocking, handle, retry on transient errors
+func (w *workerQueue) runWorker(...) {
+    for msg := queue.PullBlocking(stopSig); msg != nil; ... {
+        err := handler(ctx, msgFromSensor)
+        if pgutils.IsTransientError(err) {
+            // Retry with backoff - NO retry count metric
+            concurrency.AfterFunc(reprocessingDuration, ...)
+        }
+    }
+}

Existing Metrics

sensor_event_queue - CounterVec (Add/Remove/Dedupe ops)
sensor_event_duration - Histogram (processing time per type)
sensor_event_deduper - Counter (passed/deduped)
pipeline_panics - Counter per resource type

Missing Metrics

Queue depth gauge (how deep is each shard?)
Worker utilization (busy vs idle time)
Retry count / permanent failure count
Per-shard processing rate
Items in-flight (being processed right now)
End-to-end latency (enqueue to completion)
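
The two gauges at the top of this list (per-shard depth and items in flight) could be tracked roughly as follows. This is a minimal sketch, not Central's code: `shardedQueue`, `shardStats`, and `handleOne` are hypothetical names, plain atomics stand in for Prometheus gauges so the example has no dependencies, buffered channels stand in for DedupingQueue, and FNV-1a (32-bit) is an assumption since the excerpt above only says "FNV hash".

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync/atomic"
)

// shardStats holds two of the gauges listed as missing above.
// Assumption: real code would export these as Prometheus gauges.
type shardStats struct {
	depth    atomic.Int64 // items waiting in the shard
	inFlight atomic.Int64 // items currently being handled
}

// shardedQueue is a hypothetical stand-in for workerQueue.
type shardedQueue struct {
	shards []chan string
	stats  []shardStats
}

// newShardedQueue creates poolSize+1 shards, mirroring the layout described above.
func newShardedQueue(poolSize, capacity int) *shardedQueue {
	n := poolSize + 1
	q := &shardedQueue{shards: make([]chan string, n), stats: make([]shardStats, n)}
	for i := range q.shards {
		q.shards[i] = make(chan string, capacity)
	}
	return q
}

// indexFromKey routes a key to a shard; the same key always maps to the same shard.
func (q *shardedQueue) indexFromKey(key string) int {
	h := fnv.New32a()
	_, _ = h.Write([]byte(key))
	return int(h.Sum32() % uint32(len(q.shards)))
}

// push enqueues the key and bumps the depth gauge of its shard.
// It blocks if the shard's channel is full.
func (q *shardedQueue) push(key string) int {
	idx := q.indexFromKey(key)
	q.stats[idx].depth.Add(1)
	q.shards[idx] <- key
	return idx
}

// handleOne pulls one item from shard idx and runs handler on it,
// moving the item from "queued" to "in flight" for the duration of the call.
func (q *shardedQueue) handleOne(idx int, handler func(string)) {
	key := <-q.shards[idx]
	q.stats[idx].depth.Add(-1)
	q.stats[idx].inFlight.Add(1)
	defer q.stats[idx].inFlight.Add(-1)
	handler(key)
}

func main() {
	q := newShardedQueue(3, 8)
	idx := q.push("deployment:abc")
	fmt.Println("depth after push:", q.stats[idx].depth.Load())
	q.handleOne(idx, func(string) {
		fmt.Println("in flight during handle:", q.stats[idx].inFlight.Load())
	})
	fmt.Println("depth after handle:", q.stats[idx].depth.Load())
}
```

Keeping the counters per shard rather than global is what would make a hot shard visible: a single key that hashes to one shard shows up as depth growing on that shard only.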


Archetype 2: Background Ticker Loop (Reprocessor)

central/reprocessor/reprocessor.go

+// Three independent ticker loops, each in its own goroutine
+func (l *loopImpl) Start() {
+    l.enrichAndDetectTicker = time.NewTicker(4 * time.Hour)
+    l.deploymentRiskTicker  = time.NewTicker(riskInterval)
+    l.activeComponentTicker = time.NewTicker(activeInterval)
+    go l.riskLoop()
+    go l.enrichLoop()
+    go l.activeComponentLoop()
+}
+
+// enrichLoop: select on stop, short-circuit, signature, ticker
+func (l *loopImpl) enrichLoop() {
+    for !l.stopSig.IsDone() {
+        select {
+        case <-l.enrichAndDetectTicker.C:
+            l.runReprocessing(ForceRefetchCachedValuesOnly)
+            // Only tracks total duration, not per-entity
+        }
+    }
+}
+
+// Semaphore-limited concurrent processing (5 goroutines max)
+func (l *loopImpl) runReprocessingForObjects(...) {
+    sema := semaphore.NewWeighted(5)
+    for _, id := range ids {
+        go func(id string) {
+            defer sema.Release(1)
+            individualReprocessFunc(id) // NO per-item timing
+        }(id)
+    }
+}

Existing Metrics

reprocessor_duration_seconds - Gauge (total loop time)
signature_verification_reprocessor_duration_seconds - Gauge
risk_processing_duration - Histogram

Missing Metrics

Is the loop currently running? (boolean gauge)
How many items in current batch?
Items processed / failed per run
Semaphore utilization (how many of 5 slots used?)
Time since last successful run
Individual entity processing duration histogram
Number of runs completed (counter)
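
Most of the items in this list can be captured by one small wrapper around a single run of a loop. The sketch below is illustrative only: `loopMetrics` and `runBatch` are hypothetical names, atomics stand in for Prometheus gauges and counters, and a buffered channel stands in for the weighted semaphore so the example needs only the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// errSkip is a placeholder error used by the demo in main.
var errSkip = errors.New("item failed")

// loopMetrics is a hypothetical per-loop metric set.
type loopMetrics struct {
	running        atomic.Bool  // is a run in progress right now?
	runsTotal      atomic.Int64 // number of completed runs
	itemsProcessed atomic.Int64 // items that finished without error
	itemsFailed    atomic.Int64 // items whose handler returned an error
	lastRunUnix    atomic.Int64 // completion time of the most recent run
}

// runBatch processes ids with at most maxParallel concurrent workers
// and updates the metrics around the run.
func (m *loopMetrics) runBatch(ids []string, maxParallel int, process func(string) error) {
	m.running.Store(true)
	defer func() {
		m.runsTotal.Add(1)
		m.lastRunUnix.Store(time.Now().Unix())
		m.running.Store(false)
	}()

	slots := make(chan struct{}, maxParallel) // counting semaphore
	var wg sync.WaitGroup
	for _, id := range ids {
		slots <- struct{}{} // acquire a slot before spawning
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			defer func() { <-slots }() // release the slot
			if err := process(id); err != nil {
				m.itemsFailed.Add(1)
				return
			}
			m.itemsProcessed.Add(1)
		}(id)
	}
	wg.Wait()
}

func main() {
	var m loopMetrics
	m.runBatch([]string{"img-1", "img-2", "img-3"}, 5, func(id string) error {
		if id == "img-2" {
			return errSkip
		}
		return nil
	})
	fmt.Println("runs:", m.runsTotal.Load(),
		"processed:", m.itemsProcessed.Load(),
		"failed:", m.itemsFailed.Load())
}
```

Because the wrapper takes the per-item function as a parameter, the same metric set could in principle be reused by every ticker loop, which is the point of treating them as one archetype.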


Archetype 3: Enrichment/Detection (Request-Driven Processing)

central/enrichment/enricher_impl.go + central/detection/lifecycle/manager_impl.go

+// Enrichment: called inline from pipeline AND from reprocessor
+// Two different call paths, same underlying function
+
+// Path 1: Pipeline fragment triggers enrichment synchronously
+func (p *pipelineImpl) Run(ctx, clusterID, msg) {
+    deployment := msg.GetEvent().GetDeployment()
+    images, _, _ := p.enricher.EnrichDeployment(ctx, deployment)
+    // NO metrics on enrichment call count or success rate
+    detector.ProcessDeployment(ctx, deployment, images)
+    // NO metrics on detection outcome counts
+}
+
+// Path 2: Reprocessor triggers enrichment in background
+func (l *loopImpl) reprocessImage(id string, ...) {
+    result, err := reprocessingFunc(ctx, enrichCtx, image)
+    if result.ImageUpdated {
+        l.risk.CalculateRiskAndUpsertImage(image)
+    }
+    // Only counts nReprocessed at batch level
+}
+
+// Image scan semaphore limits concurrent scanner calls
+internalScanSemaphore = semaphore.NewWeighted(maxParallelScans)
+// This one HAS metrics: rox_image_scan_semaphore_*

Existing Metrics

image_scan_semaphore_holding_size - active scans
image_scan_semaphore_queue_size - queued scans
deployment_enhancement_duration_ms - round trip time
resource_processed_count - per resource type

Missing Metrics

Enrichment call count (by trigger: pipeline vs reprocessor)
Enrichment success/failure/skip rate
Detection: alerts generated per policy evaluation
Detection: violations found vs clean deployments
Risk calculation duration per entity type
Cache hit rate for enrichment data
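
The first two items (call count by trigger, and success/failure/skip rate) amount to one counter with two labels. A minimal sketch under stated assumptions: `enrichCounters`, `observe`, and the label values are hypothetical, and a mutex-guarded map stands in for a Prometheus CounterVec. Treating "no error and nothing updated" as a skip is also an assumption about how outcomes would be classified.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errScanner is a placeholder error used by the demo in main.
var errScanner = errors.New("scanner unavailable")

// trigger identifies which call path asked for enrichment.
type trigger string

const (
	triggerPipeline    trigger = "pipeline"
	triggerReprocessor trigger = "reprocessor"
)

// outcomeKey is the label pair a CounterVec would carry.
type outcomeKey struct {
	trigger trigger
	outcome string
}

// enrichCounters is a hypothetical stand-in for a labeled counter.
type enrichCounters struct {
	mu     sync.Mutex
	counts map[outcomeKey]int
}

// observe records one enrichment call, classified by trigger and outcome.
func (c *enrichCounters) observe(t trigger, updated bool, err error) {
	outcome := "skipped"
	switch {
	case err != nil:
		outcome = "failure"
	case updated:
		outcome = "success"
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.counts == nil {
		c.counts = make(map[outcomeKey]int)
	}
	c.counts[outcomeKey{trigger: t, outcome: outcome}]++
}

// get returns the current count for one label pair.
func (c *enrichCounters) get(t trigger, outcome string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[outcomeKey{trigger: t, outcome: outcome}]
}

func main() {
	var c enrichCounters
	c.observe(triggerPipeline, true, nil)
	c.observe(triggerReprocessor, false, nil)
	c.observe(triggerReprocessor, false, errScanner)
	fmt.Println("pipeline success:", c.get(triggerPipeline, "success"))
	fmt.Println("reprocessor skipped:", c.get(triggerReprocessor, "skipped"))
	fmt.Println("reprocessor failure:", c.get(triggerReprocessor, "failure"))
}
```

Calling `observe` at the shared enrichment function, with the trigger passed in by each caller, would cover both call paths with a single instrumentation point.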


Key Insight: The DedupingQueue Already Has Metric Hooks

Both pkg/queue/Queue[T] and pkg/dedupingqueue/DedupingQueue[K] already accept optional Prometheus metrics via functional options:

+// Queue[T] accepts:
+WithCounterVec[T](vec *prometheus.CounterVec)  // tracks Add/Remove
+WithDroppedMetric[T](metric prometheus.Counter) // tracks drops when full
+
+// DedupingQueue[K] accepts:
+WithSizeMetrics[K](metric prometheus.Gauge)              // tracks queue depth
+WithOperationMetricsFunc[K](fn func(ops.Op, string))     // tracks Add/Remove/Dedupe
+
+// But NO built-in support for:
+// - Processing duration histograms
+// - In-flight item counts
+// - Error/retry counters
+// - Worker utilization gauges
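
For readers unfamiliar with the pattern, the sketch below shows how a functional option can carry a metric hook, and how a processing-duration hook (one of the unsupported items above) might be added the same way. Everything here is hypothetical: `Queue`, `WithSizeHook`, and `WithProcessedHook` are illustrative names, not the API of pkg/queue or pkg/dedupingqueue, and the queue is deliberately not safe for concurrent use.

```go
package main

import (
	"fmt"
	"time"
)

// Option configures a Queue at construction time.
type Option[T any] func(*Queue[T])

// Queue is a toy FIFO used only to illustrate metric hooks.
type Queue[T any] struct {
	items       []T
	onSize      func(size int)                   // called whenever the depth changes
	onProcessed func(seconds float64, err error) // called after each handled item
}

// WithSizeHook registers a callback for queue depth (e.g. a gauge's Set).
func WithSizeHook[T any](fn func(size int)) Option[T] {
	return func(q *Queue[T]) { q.onSize = fn }
}

// WithProcessedHook registers a callback for processing duration and error
// (e.g. a histogram's Observe plus an error counter).
func WithProcessedHook[T any](fn func(seconds float64, err error)) Option[T] {
	return func(q *Queue[T]) { q.onProcessed = fn }
}

// NewQueue applies the options; hooks left unset are simply skipped.
func NewQueue[T any](opts ...Option[T]) *Queue[T] {
	q := &Queue[T]{}
	for _, opt := range opts {
		opt(q)
	}
	return q
}

// Push appends an item and reports the new depth.
func (q *Queue[T]) Push(item T) {
	q.items = append(q.items, item)
	if q.onSize != nil {
		q.onSize(len(q.items))
	}
}

// Process pops the oldest item, runs handler, and reports duration and error.
// It returns false when the queue is empty.
func (q *Queue[T]) Process(handler func(T) error) bool {
	if len(q.items) == 0 {
		return false
	}
	item := q.items[0]
	q.items = q.items[1:]
	if q.onSize != nil {
		q.onSize(len(q.items))
	}
	start := time.Now()
	err := handler(item)
	if q.onProcessed != nil {
		q.onProcessed(time.Since(start).Seconds(), err)
	}
	return true
}

func main() {
	q := NewQueue[string](
		WithSizeHook[string](func(size int) { fmt.Println("depth:", size) }),
		WithProcessedHook[string](func(seconds float64, err error) {
			fmt.Println("handled, error:", err)
		}),
	)
	q.Push("msg-1")
	q.Process(func(string) error { return nil })
}
```

Passing plain callbacks rather than concrete metric types keeps the queue package free of a metrics dependency, at the cost of the caller wiring each hook to the right collector.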

+ \ No newline at end of file diff --git a/deploy/charts/monitoring/dashboards/central-api-ui.json b/deploy/charts/monitoring/dashboards/central-api-ui.json new file mode 100644 index 0000000000000..d2767b5e6aabd --- /dev/null +++ b/deploy/charts/monitoring/dashboards/central-api-ui.json @@ -0,0 +1,497 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": { + "type": "grafana", + "uid": "-- Grafana --" + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations \u0026 Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "links": [ + { + "asDropdown": false, + "icon": "external link", + "includeVars": false, + "keepTime": true, + "tags": [], + "targetBlank": false, + "title": "← Central Internals", + "tooltip": "", + "type": "link", + "url": "/d/central-internals" + } + ], + "panels": [ + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 0 + }, + "id": 1, + "panels": [], + "title": "GraphQL", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + 
"y": 1 + }, + "id": 2, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, rate(rox_central_graphql_query_duration_bucket[5m]))", + "instant": false, + "legendFormat": "{{Query}}", + "range": true, + "refId": "A" + } + ], + "title": "Query Duration p95", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 1 + }, + "id": 3, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, rate(rox_central_graphql_op_duration_bucket[5m]))", + "instant": false, + "legendFormat": "{{Resolver}}", + "range": true, + "refId": "A" + } + ], + "title": 
"Resolver Duration p95", + "type": "timeseries" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 9 + }, + "id": 4, + "panels": [], + "title": "gRPC", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 10 + }, + "id": 5, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_grpc_error[5m])", + "instant": false, + "legendFormat": "{{Code}}", + "range": true, + "refId": "A" + } + ], + "title": "gRPC Errors", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + 
"drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 10 + }, + "id": 6, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_grpc_message_size_sent_bytes", + "instant": false, + "legendFormat": "{{Type}}", + "range": true, + "refId": "A" + } + ], + "title": "Message Sizes", + "type": "timeseries" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 18 + }, + "id": 7, + "panels": [], + "title": "Gaps", + "type": "row" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 4, + "w": 12, + "x": 0, + "y": 19 + }, + "id": 8, + "options": { + "content": "⚠️ **Metric Needed**: No per-API-endpoint latency or error rate metrics. Need: `central_api_request_duration_seconds{method,endpoint}`, `central_api_requests_total{method,endpoint,status}`. 
Cannot answer \"which API endpoint is slow?\"", + "mode": "markdown" + }, + "title": "GAP: Per-Endpoint Metrics", + "type": "text" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 4, + "w": 12, + "x": 12, + "y": 19 + }, + "id": 9, + "options": { + "content": "⚠️ **Metric Needed**: No UI page load metrics. Need frontend instrumentation or backend per-page-load latency tracking.", + "mode": "markdown" + }, + "title": "GAP: UI Page Load", + "type": "text" + } + ], + "refresh": "30s", + "schemaVersion": 27, + "tags": [ + "stackrox", + "central", + "level-3", + "api-ui" + ], + "time": { + "from": "now-6h", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "Central: API \u0026 UI", + "uid": "central-api-ui", + "version": 0 +} \ No newline at end of file diff --git a/deploy/charts/monitoring/dashboards/central-background-reprocessing.json b/deploy/charts/monitoring/dashboards/central-background-reprocessing.json new file mode 100644 index 0000000000000..011b904d32831 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/central-background-reprocessing.json @@ -0,0 +1,283 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": { + "type": "grafana", + "uid": "-- Grafana --" + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations \u0026 Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "links": [ + { + "asDropdown": false, + "icon": "external link", + "includeVars": false, + "keepTime": true, + "tags": [], + "targetBlank": false, + "title": "← Central Internals", + "tooltip": "", + "type": "link", + "url": "/d/central-internals" + } + ], + "panels": [ + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 0 + }, + "id": 1, + "panels": [], + "title": "Reprocessor Loops", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + 
"uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 1 + }, + "id": 2, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_reprocessor_duration_seconds", + "instant": false, + "legendFormat": "duration", + "range": true, + "refId": "A" + } + ], + "title": "Reprocessor Duration", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + 
"stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 1 + }, + "id": 3, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_signature_verification_reprocessor_duration_seconds", + "instant": false, + "legendFormat": "duration", + "range": true, + "refId": "A" + } + ], + "title": "Sig Verification Duration", + "type": "timeseries" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 9 + }, + "id": 4, + "panels": [], + "title": "Gaps — Loop Instrumentation", + "type": "row" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 6, + "w": 24, + "x": 0, + "y": 10 + }, + "id": 5, + "options": { + "content": "⚠️ **Metric Needed**: 19+ background loops lack standard metrics. Need per-loop: `_running` (gauge), `_runs_total{result}` (counter), `_run_duration_seconds` (histogram), `_items_processed_total` (counter), `_last_run_timestamp_seconds` (gauge). 
Loops include: image-enrich, deployment-risk, active-components, pruning, CVE-suppress, CVE-fetch, indicator-flush, network-baseline-flush, hash-flush, conn-health, vuln-request, network-gatherer, and more.", + "mode": "markdown" + }, + "title": "GAP: Background Loop Metrics", + "type": "text" + } + ], + "refresh": "30s", + "schemaVersion": 27, + "tags": [ + "stackrox", + "central", + "level-3", + "background-reprocessing" + ], + "time": { + "from": "now-6h", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "Central: Background Reprocessing", + "uid": "central-background-reprocessing", + "version": 0 +} \ No newline at end of file diff --git a/deploy/charts/monitoring/dashboards/central-deployment-processing.json b/deploy/charts/monitoring/dashboards/central-deployment-processing.json new file mode 100644 index 0000000000000..96808d3e26c61 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/central-deployment-processing.json @@ -0,0 +1,372 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": { + "type": "grafana", + "uid": "-- Grafana --" + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations \u0026 Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "links": [ + { + "asDropdown": false, + "icon": "external link", + "includeVars": false, + "keepTime": true, + "tags": [], + "targetBlank": false, + "title": "← Central Internals", + "tooltip": "", + "type": "link", + "url": "/d/central-internals" + } + ], + "panels": [ + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 0 + }, + "id": 1, + "panels": [], + "title": "Resource Processing", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": 
"text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 1 + }, + "id": 2, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_resource_processed_count[5m])", + "instant": false, + "legendFormat": "{{Resource}} - {{Operation}}", + "range": true, + "refId": "A" + } + ], + "title": "Resources/sec", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", 
+ "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 1 + }, + "id": 3, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, rate(rox_central_k8s_event_processing_duration_bucket[5m]))", + "instant": false, + "legendFormat": "p95 {{Resource}} - {{Action}}", + "range": true, + "refId": "A" + } + ], + "title": "K8s Event Duration", + "type": "timeseries" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 9 + }, + "id": 4, + "panels": [], + "title": "Store Operations", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 10 + }, + "id": 5, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + 
"showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, rate(rox_central_postgres_op_duration_bucket{Type=~\"deployments|pods|namespaces\"}[5m]))", + "instant": false, + "legendFormat": "p95", + "range": true, + "refId": "A" + } + ], + "title": "Postgres Op Duration", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 4, + "w": 12, + "x": 12, + "y": 10 + }, + "id": 6, + "options": { + "content": "⚠️ **Metric Needed**: No per-fragment handler metrics. Cannot distinguish processing time for deployment vs pod vs namespace fragments.", + "mode": "markdown" + }, + "title": "GAP: Per-Fragment Handler Metrics", + "type": "text" + } + ], + "refresh": "30s", + "schemaVersion": 27, + "tags": [ + "stackrox", + "central", + "level-3", + "deployment-processing" + ], + "time": { + "from": "now-6h", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "Central: Deployment Processing", + "uid": "central-deployment-processing", + "version": 0 +} \ No newline at end of file diff --git a/deploy/charts/monitoring/dashboards/central-detection-alerts.json b/deploy/charts/monitoring/dashboards/central-detection-alerts.json new file mode 100644 index 0000000000000..76448b2070857 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/central-detection-alerts.json @@ -0,0 +1,213 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": { + "type": "grafana", + "uid": "-- Grafana --" + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations \u0026 Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "links": [ + { + "asDropdown": false, + "icon": "external link", + "includeVars": false, + "keepTime": true, + "tags": [], + "targetBlank": false, + "title": "← 
Central Internals", + "tooltip": "", + "type": "link", + "url": "/d/central-internals" + } + ], + "panels": [ + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 0 + }, + "id": 1, + "panels": [], + "title": "Detection", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 1 + }, + "id": 2, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_process_filter", + "instant": false, + "legendFormat": "{{Type}}", + "range": true, + "refId": "A" + } + ], + "title": "Process Filter", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 4, + "w": 12, + "x": 12, + "y": 1 + }, + "id": 3, + "options": { + "content": "⚠️ **Metric Needed**: `central_detection_alerts_generated_total` — No alert 
generation rate metric. Cannot answer \"how many alerts are being generated?\"", + "mode": "markdown" + }, + "title": "GAP: Alert Generation Rate", + "type": "text" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 9 + }, + "id": 4, + "panels": [], + "title": "Gaps", + "type": "row" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 4, + "w": 24, + "x": 0, + "y": 10 + }, + "id": 5, + "options": { + "content": "⚠️ **Metric Needed**: No lifecycle manager metrics. Need: `central_detection_lifecycle_duration_seconds`, `central_detection_baseline_evaluations_total`", + "mode": "markdown" + }, + "title": "GAP: Lifecycle Manager Metrics", + "type": "text" + } + ], + "refresh": "30s", + "schemaVersion": 27, + "tags": [ + "stackrox", + "central", + "level-3", + "detection-alerts" + ], + "time": { + "from": "now-6h", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "Central: Detection \u0026 Alerts", + "uid": "central-detection-alerts", + "version": 0 +} \ No newline at end of file diff --git a/deploy/charts/monitoring/dashboards/central-internals.json b/deploy/charts/monitoring/dashboards/central-internals.json new file mode 100644 index 0000000000000..bde5b62f56202 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/central-internals.json @@ -0,0 +1,2347 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": { + "type": "grafana", + "uid": "-- Grafana --" + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations \u0026 Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "links": [ + { + "asDropdown": false, + "icon": "external link", + "includeVars": false, + "keepTime": true, + "tags": [], + "targetBlank": false, + "title": "← StackRox Overview", + "tooltip": "", + "type": "link", + "url": "/d/stackrox-overview" + } + ], + "panels": 
[ + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 0 + }, + "id": 1, + "panels": [], + "title": "Sensor Ingestion", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 7, + "x": 0, + "y": 1 + }, + "id": 2, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_sensor_event_queue[5m])", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "events/sec", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + 
"gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 7, + "x": 7, + "y": 1 + }, + "id": 3, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_sensor_event_deduper[5m])", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "deduper", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "s" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 7, + "x": 14, + 
"y": 1 + }, + "id": 4, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, rate(rox_central_sensor_event_duration_bucket[5m]))", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "processing latency p95", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 8, + "w": 3, + "x": 21, + "y": 1 + }, + "id": 5, + "options": { + "content": "⚠️ ### [→ Details](/d/central-sensor-ingestion)\n\nDrill into Sensor Ingestion metrics", + "mode": "markdown" + }, + "title": "", + "type": "text" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 9 + }, + "id": 6, + "panels": [], + "title": "Deployment Processing", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 
8, + "w": 8, + "x": 0, + "y": 10 + }, + "id": 7, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_resource_processed_count[5m])", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "resources/sec", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "s" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 8, + "x": 8, + "y": 10 + }, + "id": 8, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, rate(rox_central_k8s_event_processing_duration_bucket[5m]))", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "K8s event 
latency", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 8, + "w": 8, + "x": 16, + "y": 10 + }, + "id": 9, + "options": { + "content": "⚠️ ### [→ Details](/d/central-deployment-processing)\n\nDrill into Deployment Processing metrics", + "mode": "markdown" + }, + "title": "", + "type": "text" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 18 + }, + "id": 10, + "panels": [], + "title": "Vulnerability Enrichment", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 6, + "x": 0, + "y": 19 + }, + "id": 11, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "values": false + }, + "textMode": "auto" + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_image_scan_semaphore_holding_size", + "instant": false, + "legendFormat": "", + "range": true, + "refId": 
"A" + } + ], + "title": "scans in-flight", + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "s" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 7, + "x": 6, + "y": 19 + }, + "id": 12, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, rate(rox_central_scan_duration_bucket[5m]))", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "scan duration p95", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + 
"lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 7, + "x": 13, + "y": 19 + }, + "id": 13, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_image_scan_semaphore_queue_size", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "queue waiting", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 8, + "w": 4, + "x": 20, + "y": 19 + }, + "id": 14, + "options": { + "content": "⚠️ ### [→ Details](/d/central-vuln-enrichment)\n\nDrill into Vulnerability Enrichment metrics", + "mode": "markdown" + }, + "title": "", + "type": "text" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 27 + }, + "id": 15, + "panels": [], + "title": "Detection \u0026 Alerts", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + 
"lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 10, + "x": 0, + "y": 28 + }, + "id": 16, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_process_filter", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "process filter", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 8, + "w": 10, + "x": 10, + "y": 28 + }, + "id": 17, + "options": { + "content": "⚠️ ⚠️ No alert generation rate metric available", + "mode": "markdown" + }, + "title": "Alert Generation Rate", + "type": "text" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 8, + "w": 4, + "x": 20, + "y": 28 + }, + "id": 18, + "options": { + "content": "⚠️ ### [→ Details](/d/central-detection-alerts)\n\nDrill into Detection \u0026 Alerts metrics", + "mode": "markdown" + }, + "title": "", + "type": "text" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 36 + }, + "id": 19, + "panels": [], + "title": "Risk Calculation", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, 
+ "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "s" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 10, + "x": 0, + "y": 37 + }, + "id": 20, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_risk_processing_duration", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "risk duration", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": 
[], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "s" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 10, + "x": 10, + "y": 37 + }, + "id": 21, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_reprocessor_duration_seconds", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "reprocessor", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 8, + "w": 4, + "x": 20, + "y": 37 + }, + "id": 22, + "options": { + "content": "⚠️ ### [→ Details](/d/central-risk-calculation)\n\nDrill into Risk Calculation metrics", + "mode": "markdown" + }, + "title": "", + "type": "text" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 45 + }, + "id": 23, + "panels": [], + "title": "Background Reprocessing", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + 
"mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "s" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 7, + "x": 0, + "y": 46 + }, + "id": 24, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_reprocessor_duration_seconds", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "reprocessor duration", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "s" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 7, + "x": 7, + "y": 46 + }, + "id": 25, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": 
"rox_central_signature_verification_reprocessor_duration_seconds", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "sig verification", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 8, + "w": 7, + "x": 14, + "y": 46 + }, + "id": 26, + "options": { + "content": "⚠️ ⚠️ No running/items-processed metrics available", + "mode": "markdown" + }, + "title": "Running/Items Processed", + "type": "text" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 8, + "w": 3, + "x": 21, + "y": 46 + }, + "id": 27, + "options": { + "content": "⚠️ ### [→ Details](/d/central-background-reprocessing)\n\nDrill into Background Reprocessing metrics", + "mode": "markdown" + }, + "title": "", + "type": "text" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 54 + }, + "id": 28, + "panels": [], + "title": "Pruning \u0026 GC", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "s" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 
7, + "x": 0, + "y": 55 + }, + "id": 29, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_prune_duration", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "prune duration", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 7, + "x": 7, + "y": 55 + }, + "id": 30, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_process_queue_length", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "process queue", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": 
"PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 7, + "x": 14, + "y": 55 + }, + "id": 31, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_pruned_process_indicators[5m])", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "pruned indicators", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 8, + "w": 3, + "x": 21, + "y": 55 + }, + "id": 32, + "options": { + "content": "⚠️ ### [→ Details](/d/central-pruning-gc)\n\nDrill into Pruning \u0026 GC metrics", + "mode": "markdown" + }, + "title": "", + "type": "text" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 63 + }, + "id": 33, + "panels": [], + "title": "Network Analysis", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": 
"PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 10, + "x": 0, + "y": 64 + }, + "id": 34, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_total_network_flows_central_received_counter[5m])", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "flows received", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": 
false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 10, + "x": 10, + "y": 64 + }, + "id": 35, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_total_network_endpoints_received_counter[5m])", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "endpoints received", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 8, + "w": 4, + "x": 20, + "y": 64 + }, + "id": 36, + "options": { + "content": "⚠️ ### [→ Details](/d/central-network-analysis)\n\nDrill into Network Analysis metrics", + "mode": "markdown" + }, + "title": "", + "type": "text" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 72 + }, + "id": 37, + "panels": [], + "title": "Report Generation", + "type": "row" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 8, + "w": 10, + "x": 0, + "y": 73 + }, + "id": 38, + "options": { + "content": "⚠️ No Central-side report generation metrics exist", + "mode": "markdown" + }, + "title": "Central-side Reports", + "type": "text" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", 
+ "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 10, + "x": 10, + "y": 73 + }, + "id": 39, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_complianceoperator_scan_watchers_current", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "compliance watchers", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 8, + "w": 4, + "x": 20, + "y": 73 + }, + "id": 40, + "options": { + "content": "⚠️ ### [→ Details](/d/central-report-generation)\n\nDrill into Report Generation metrics", + "mode": "markdown" + }, + "title": "", + "type": "text" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 81 + }, + "id": 41, + "panels": [], + "title": "API \u0026 UI", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", 
+ "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "s" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 10, + "x": 0, + "y": 82 + }, + "id": 42, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, rate(rox_central_graphql_query_duration_bucket[5m]))", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "GraphQL p95", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": 
"green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 10, + "x": 10, + "y": 82 + }, + "id": 43, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_grpc_error[5m])", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "gRPC errors", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 8, + "w": 4, + "x": 20, + "y": 82 + }, + "id": 44, + "options": { + "content": "⚠️ ### [→ Details](/d/central-api-ui)\n\nDrill into API \u0026 UI metrics", + "mode": "markdown" + }, + "title": "", + "type": "text" + } + ], + "refresh": "30s", + "schemaVersion": 27, + "tags": [ + "stackrox", + "central", + "level-2" + ], + "time": { + "from": "now-6h", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "Central Internals", + "uid": "central-internals", + "version": 0 +} \ No newline at end of file diff --git a/deploy/charts/monitoring/dashboards/central-network-analysis.json b/deploy/charts/monitoring/dashboards/central-network-analysis.json new file mode 100644 index 0000000000000..ec28f47531eb2 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/central-network-analysis.json @@ -0,0 +1,283 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": { + "type": "grafana", + "uid": "-- Grafana --" + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations \u0026 Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "links": [ + { + "asDropdown": false, + "icon": "external link", + "includeVars": false, + "keepTime": true, + "tags": [], + "targetBlank": false, + "title": "← 
Central Internals", + "tooltip": "", + "type": "link", + "url": "/d/central-internals" + } + ], + "panels": [ + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 0 + }, + "id": 1, + "panels": [], + "title": "Flows \u0026 Endpoints", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 1 + }, + "id": 2, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_total_network_flows_central_received_counter[5m])", + "instant": false, + "legendFormat": "{{ClusterID}}", + "range": true, + "refId": "A" + } + ], + "title": "Flows Received", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + 
"axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 1 + }, + "id": 3, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_total_network_endpoints_received_counter[5m])", + "instant": false, + "legendFormat": "{{ClusterID}}", + "range": true, + "refId": "A" + } + ], + "title": "Endpoints Received", + "type": "timeseries" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 9 + }, + "id": 4, + "panels": [], + "title": "Gaps", + "type": "row" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 4, + "w": 24, + "x": 0, + "y": 10 + }, + "id": 5, + "options": { + "content": "⚠️ **Metric Needed**: Network baseline flush, external network gatherer, and flow processing pipeline have no Central-side metrics. 
Need: `central_network_baseline_flush_duration_seconds`, `central_network_flows_processed_total{action}`", + "mode": "markdown" + }, + "title": "GAP: Network Processing Pipeline", + "type": "text" + } + ], + "refresh": "30s", + "schemaVersion": 27, + "tags": [ + "stackrox", + "central", + "level-3", + "network-analysis" + ], + "time": { + "from": "now-6h", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "Central: Network Analysis", + "uid": "central-network-analysis", + "version": 0 +} \ No newline at end of file diff --git a/deploy/charts/monitoring/dashboards/central-pruning-gc.json b/deploy/charts/monitoring/dashboards/central-pruning-gc.json new file mode 100644 index 0000000000000..325e3916d1bc4 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/central-pruning-gc.json @@ -0,0 +1,543 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": { + "type": "grafana", + "uid": "-- Grafana --" + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations \u0026 Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "links": [ + { + "asDropdown": false, + "icon": "external link", + "includeVars": false, + "keepTime": true, + "tags": [], + "targetBlank": false, + "title": "← Central Internals", + "tooltip": "", + "type": "link", + "url": "/d/central-internals" + } + ], + "panels": [ + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 0 + }, + "id": 1, + "panels": [], + "title": "Pruning", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + 
"legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 8, + "x": 0, + "y": 1 + }, + "id": 2, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_prune_duration", + "instant": false, + "legendFormat": "{{Type}}", + "range": true, + "refId": "A" + } + ], + "title": "Prune Duration", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 8, + "x": 8, + "y": 1 + }, + "id": 3, + "options": { + "legend": { + 
"calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_process_queue_length", + "instant": false, + "legendFormat": "queue length", + "range": true, + "refId": "A" + } + ], + "title": "Process Queue Length", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 8, + "x": 16, + "y": 1 + }, + "id": 4, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_pruned_process_indicators[5m])", + "instant": false, + "legendFormat": "pruned/sec", + "range": true, + "refId": "A" + } + ], + "title": "Pruned Indicators", + "type": "timeseries" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": 
"grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 9 + }, + "id": 5, + "panels": [], + "title": "Additional Metrics", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 10 + }, + "id": 6, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_orphaned_plop_total[5m])", + "instant": false, + "legendFormat": "{{ClusterID}}", + "range": true, + "refId": "A" + } + ], + "title": "Orphaned PLOPs", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + 
"tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 10 + }, + "id": 7, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_process_pruning_cache_hits[5m])", + "instant": false, + "legendFormat": "hits", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_process_pruning_cache_misses[5m])", + "instant": false, + "legendFormat": "misses", + "range": true, + "refId": "B" + } + ], + "title": "Cache Hits/Misses", + "type": "timeseries" + } + ], + "refresh": "30s", + "schemaVersion": 27, + "tags": [ + "stackrox", + "central", + "level-3", + "pruning-gc" + ], + "time": { + "from": "now-6h", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "Central: Pruning \u0026 GC", + "uid": "central-pruning-gc", + "version": 0 +} \ No newline at end of file diff --git a/deploy/charts/monitoring/dashboards/central-report-generation.json b/deploy/charts/monitoring/dashboards/central-report-generation.json new file mode 100644 index 0000000000000..10a2079366aa1 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/central-report-generation.json @@ -0,0 +1,478 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": { + 
"type": "grafana", + "uid": "-- Grafana --" + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations \u0026 Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "links": [ + { + "asDropdown": false, + "icon": "external link", + "includeVars": false, + "keepTime": true, + "tags": [], + "targetBlank": false, + "title": "← Central Internals", + "tooltip": "", + "type": "link", + "url": "/d/central-internals" + } + ], + "panels": [ + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 0 + }, + "id": 1, + "panels": [], + "title": "Compliance Operator Reports", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 8, + "x": 0, + "y": 1 + }, + "id": 2, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": 
"rox_central_complianceoperator_scan_watchers_current", + "instant": false, + "legendFormat": "watchers", + "range": true, + "refId": "A" + } + ], + "title": "Scan Watchers", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 8, + "x": 8, + "y": 1 + }, + "id": 3, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_complianceoperator_num_scans_running_in_parallel", + "instant": false, + "legendFormat": "parallel scans", + "range": true, + "refId": "A" + } + ], + "title": "Parallel Scans", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 
0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 8, + "x": 16, + "y": 1 + }, + "id": 4, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_complianceoperator_scan_watchers_active_time_minutes", + "instant": false, + "legendFormat": "active time", + "range": true, + "refId": "A" + } + ], + "title": "Watcher Active Time", + "type": "timeseries" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 9 + }, + "id": 5, + "panels": [], + "title": "Watcher Outcomes", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + 
"mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 24, + "x": 0, + "y": 10 + }, + "id": 6, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_complianceoperator_scan_watchers_finish_type_total[5m])", + "instant": false, + "legendFormat": "{{type}}", + "range": true, + "refId": "A" + } + ], + "title": "Finish Types", + "type": "timeseries" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 18 + }, + "id": 7, + "panels": [], + "title": "Gaps — Vulnerability Report Pipeline", + "type": "row" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 4, + "w": 24, + "x": 0, + "y": 19 + }, + "id": 8, + "options": { + "content": "⚠️ **Metric Needed**: No metrics for Central's vulnerability report generation pipeline (report scheduler, PDF/CSV generation, email delivery). 
Need: `central_report_generation_total{type,result}`, `central_report_generation_duration_seconds`, `central_report_delivery_total{method,result}`", + "mode": "markdown" + }, + "title": "GAP: Report Generation Pipeline", + "type": "text" + } + ], + "refresh": "30s", + "schemaVersion": 27, + "tags": [ + "stackrox", + "central", + "level-3", + "report-generation" + ], + "time": { + "from": "now-6h", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "Central: Report Generation", + "uid": "central-report-generation", + "version": 0 +} \ No newline at end of file diff --git a/deploy/charts/monitoring/dashboards/central-risk-calculation.json b/deploy/charts/monitoring/dashboards/central-risk-calculation.json new file mode 100644 index 0000000000000..3fdcd323f540d --- /dev/null +++ b/deploy/charts/monitoring/dashboards/central-risk-calculation.json @@ -0,0 +1,283 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": { + "type": "grafana", + "uid": "-- Grafana --" + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations \u0026 Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "links": [ + { + "asDropdown": false, + "icon": "external link", + "includeVars": false, + "keepTime": true, + "tags": [], + "targetBlank": false, + "title": "← Central Internals", + "tooltip": "", + "type": "link", + "url": "/d/central-internals" + } + ], + "panels": [ + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 0 + }, + "id": 1, + "panels": [], + "title": "Risk Processing", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": 
"line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 1 + }, + "id": 2, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_risk_processing_duration", + "instant": false, + "legendFormat": "{{Risk_Reprocessor}}", + "range": true, + "refId": "A" + } + ], + "title": "Risk Duration", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { 
+ "h": 8, + "w": 12, + "x": 12, + "y": 1 + }, + "id": 3, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_reprocessor_duration_seconds", + "instant": false, + "legendFormat": "duration", + "range": true, + "refId": "A" + } + ], + "title": "Reprocessor Duration", + "type": "timeseries" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 9 + }, + "id": 4, + "panels": [], + "title": "Gaps", + "type": "row" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 4, + "w": 24, + "x": 0, + "y": 10 + }, + "id": 5, + "options": { + "content": "⚠️ **Metric Needed**: `central_risk_items_processed_total` — No items-processed counter. 
Cannot answer \"how many deployments had risk recalculated?\"", + "mode": "markdown" + }, + "title": "GAP: Items Processed", + "type": "text" + } + ], + "refresh": "30s", + "schemaVersion": 27, + "tags": [ + "stackrox", + "central", + "level-3", + "risk-calculation" + ], + "time": { + "from": "now-6h", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "Central: Risk Calculation", + "uid": "central-risk-calculation", + "version": 0 +} \ No newline at end of file diff --git a/deploy/charts/monitoring/dashboards/central-sensor-ingestion.json b/deploy/charts/monitoring/dashboards/central-sensor-ingestion.json new file mode 100644 index 0000000000000..2b565d9163823 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/central-sensor-ingestion.json @@ -0,0 +1,1263 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": { + "type": "grafana", + "uid": "-- Grafana --" + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations \u0026 Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "links": [ + { + "asDropdown": false, + "icon": "external link", + "includeVars": false, + "keepTime": true, + "tags": [], + "targetBlank": false, + "title": "← Central Internals", + "tooltip": "", + "type": "link", + "url": "/d/central-internals" + } + ], + "panels": [ + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 0 + }, + "id": 1, + "panels": [], + "title": "Connection Status", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + 
"tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 8, + "x": 0, + "y": 1 + }, + "id": 2, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "values": false + }, + "textMode": "auto" + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "count(rox_central_sensor_connected{connection_state=\"connected\"})", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "Sensors Connected", + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 16, + "x": 8, + "y": 1 + }, + 
"id": 3, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_sensor_connected[5m])", + "instant": false, + "legendFormat": "{{connection_state}}", + "range": true, + "refId": "A" + } + ], + "title": "Connection Events", + "type": "timeseries" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 9 + }, + "id": 4, + "panels": [], + "title": "Deduper", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 10 + }, + "id": 5, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": 
"rate(rox_central_sensor_event_deduper[5m])", + "instant": false, + "legendFormat": "{{status}} - {{type}}", + "range": true, + "refId": "A" + } + ], + "title": "Deduper Throughput", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 10 + }, + "id": 6, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_sensor_event_deduper{status=\"deduplicated\"}[5m]) / rate(rox_central_sensor_event_deduper[5m])", + "instant": false, + "legendFormat": "dedup rate", + "range": true, + "refId": "A" + } + ], + "title": "Deduper Hit Rate", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + 
"barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 18 + }, + "id": 7, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_deduping_hash_size", + "instant": false, + "legendFormat": "{{cluster}}", + "range": true, + "refId": "A" + } + ], + "title": "Hash Store Size", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": 
[] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 18 + }, + "id": 8, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_deduping_hash_count[5m])", + "instant": false, + "legendFormat": "{{ResourceType}} - {{Operation}}", + "range": true, + "refId": "A" + } + ], + "title": "Hash Operations", + "type": "timeseries" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 26 + }, + "id": 9, + "panels": [], + "title": "Worker Queue", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 27 + }, + "id": 10, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": 
"prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_sensor_event_queue[5m])", + "instant": false, + "legendFormat": "{{Operation}} - {{Type}}", + "range": true, + "refId": "A" + } + ], + "title": "Events Processed", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 27 + }, + "id": 11, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, rate(rox_central_sensor_event_duration_bucket[5m]))", + "instant": false, + "legendFormat": "p95 - {{Type}}", + "range": true, + "refId": "A" + } + ], + "title": "Processing Duration", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 4, + "w": 12, + "x": 0, + "y": 31 + }, + "id": 12, + "options": { + "content": "⚠️ **Metric Needed**: 
`central_sensor_ingestion_queue_depth` — No gauge exists for worker queue shard depth. Cannot answer \"is the queue backing up?\"", + "mode": "markdown" + }, + "title": "GAP: Queue Depth", + "type": "text" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 4, + "w": 12, + "x": 12, + "y": 35 + }, + "id": 13, + "options": { + "content": "⚠️ **Metric Needed**: `central_sensor_ingestion_in_flight` — No gauge for items currently being processed per shard.", + "mode": "markdown" + }, + "title": "GAP: In-Flight", + "type": "text" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 39 + }, + "id": 14, + "panels": [], + "title": "Pipeline Processing", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 40 + }, + "id": 15, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": 
"PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_resource_processed_count[5m])", + "instant": false, + "legendFormat": "{{Resource}} - {{Operation}}", + "range": true, + "refId": "A" + } + ], + "title": "Resources Processed", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 40 + }, + "id": 16, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_pipeline_panics[5m])", + "instant": false, + "legendFormat": "{{resource}}", + "range": true, + "refId": "A" + } + ], + "title": "Pipeline Panics", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + 
"barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 48 + }, + "id": 17, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, rate(rox_central_k8s_event_processing_duration_bucket[5m]))", + "instant": false, + "legendFormat": "p95 - {{Resource}}", + "range": true, + "refId": "A" + } + ], + "title": "K8s Event Processing", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 4, + "w": 12, + "x": 12, + "y": 48 + }, + "id": 18, + "options": { + "content": "⚠️ **Metric Needed**: Per-fragment processing counts and durations. 
25 pipeline fragments exist but none have individual metrics.", + "mode": "markdown" + }, + "title": "GAP: Per-Fragment Metrics", + "type": "text" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 56 + }, + "id": 19, + "panels": [], + "title": "Messages Not Sent", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 24, + "x": 0, + "y": 57 + }, + "id": 20, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_msg_to_sensor_not_sent_count[5m])", + "instant": false, + "legendFormat": "{{type}} - {{reason}}", + "range": true, + "refId": "A" + } + ], + "title": "Failed Sends to Sensor", + "type": "timeseries" + } + ], + "refresh": "30s", + "schemaVersion": 27, + "tags": [ + "stackrox", + "central", + "level-3", + "sensor-ingestion" + ], + "time": { + "from": 
"now-6h", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "Central: Sensor Ingestion", + "uid": "central-sensor-ingestion", + "version": 0 +} \ No newline at end of file diff --git a/deploy/charts/monitoring/dashboards/central-vuln-enrichment.json b/deploy/charts/monitoring/dashboards/central-vuln-enrichment.json new file mode 100644 index 0000000000000..316b2a83c85c0 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/central-vuln-enrichment.json @@ -0,0 +1,1282 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": { + "type": "grafana", + "uid": "-- Grafana --" + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations \u0026 Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "links": [ + { + "asDropdown": false, + "icon": "external link", + "includeVars": false, + "keepTime": true, + "tags": [], + "targetBlank": false, + "title": "← Central Internals", + "tooltip": "", + "type": "link", + "url": "/d/central-internals" + } + ], + "panels": [ + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 0 + }, + "id": 1, + "panels": [], + "title": "Scan Semaphore", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + 
"mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 8, + "x": 0, + "y": 1 + }, + "id": 2, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "values": false + }, + "textMode": "auto" + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "sum(rox_image_scan_semaphore_holding_size)", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "Scans In-Flight", + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 8, + "x": 8, + "y": 1 + }, + "id": 3, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": 
"rox_image_scan_semaphore_holding_size", + "instant": false, + "legendFormat": "holding", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_image_scan_semaphore_limit", + "instant": false, + "legendFormat": "limit", + "range": true, + "refId": "B" + } + ], + "title": "Semaphore Utilization", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 8, + "x": 16, + "y": 1 + }, + "id": 4, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_image_scan_semaphore_queue_size", + "instant": false, + "legendFormat": "{{subsystem}} - {{entity}}", + "range": true, + "refId": "A" + } + ], + "title": "Queue Waiting", + "type": "timeseries" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, 
+ "x": 0, + "y": 9 + }, + "id": 5, + "panels": [], + "title": "Image Scanning", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 10 + }, + "id": 6, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.5, rate(rox_central_scan_duration_bucket[5m]))", + "instant": false, + "legendFormat": "p50", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, rate(rox_central_scan_duration_bucket[5m]))", + "instant": false, + "legendFormat": "p95", + "range": true, + "refId": "B" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.99, rate(rox_central_scan_duration_bucket[5m]))", + "instant": false, + "legendFormat": "p99", + 
"range": true, + "refId": "C" + } + ], + "title": "Scan Duration p50/p95/p99", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 10 + }, + "id": 7, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, rate(rox_central_image_vuln_retrieval_duration_bucket[5m]))", + "instant": false, + "legendFormat": "p95", + "range": true, + "refId": "A" + } + ], + "title": "Vuln Retrieval Duration", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + 
"tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "percentunit" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 18 + }, + "id": 8, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_metadata_cache_hits[5m]) / (rate(rox_central_metadata_cache_hits[5m]) + rate(rox_central_metadata_cache_misses[5m]))", + "instant": false, + "legendFormat": "hit rate", + "range": true, + "refId": "A" + } + ], + "title": "Metadata Cache Hit Rate", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 4, + "w": 12, + "x": 12, + "y": 18 + }, + "id": 9, + "options": { + "content": "⚠️ **Metric Needed**: `central_vuln_enrichment_requests_total{type,result}` — No counter for total enrichment requests (inline vs background). 
Cannot calculate enrichment failure rate.", + "mode": "markdown" + }, + "title": "GAP: Enrichment Calls", + "type": "text" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 26 + }, + "id": 10, + "panels": [], + "title": "Node Scanning", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 27 + }, + "id": 11, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, rate(rox_central_node_scan_duration_bucket[5m]))", + "instant": false, + "legendFormat": "p95", + "range": true, + "refId": "A" + } + ], + "title": "Node Scan Duration", + "type": "timeseries" + }, + { + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 4, + "w": 12, + "x": 12, + "y": 27 + }, + "id": 12, + "options": { + "content": "⚠️ 
**Metric Needed**: `central_vuln_enrichment_node_scans_total{result}` — No counter for total node scans.", + "mode": "markdown" + }, + "title": "GAP: Node Scan Count", + "type": "text" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 35 + }, + "id": 13, + "panels": [], + "title": "Image Deduplication", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 36 + }, + "id": 14, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_image_upsert_deduper[5m])", + "instant": false, + "legendFormat": "{{status}}", + "range": true, + "refId": "A" + } + ], + "title": "Image Upsert Deduper", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + 
"mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 36 + }, + "id": 15, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_deployment_enhancement_duration_ms", + "instant": false, + "legendFormat": "duration", + "range": true, + "refId": "A" + } + ], + "title": "Deployment Enhancement", + "type": "timeseries" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 44 + }, + "id": 16, + "panels": [], + "title": "Registry Client", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + 
"lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 8, + "x": 0, + "y": 45 + }, + "id": 17, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_registry_client_requests_total[5m])", + "instant": false, + "legendFormat": "{{code}} - {{type}}", + "range": true, + "refId": "A" + } + ], + "title": "Registry Requests", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "s" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 8, + "x": 8, + "y": 45 + }, + "id": 18, + "options": { + "legend": { + 
"calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, rate(rox_central_registry_client_request_duration_seconds_bucket[5m]))", + "instant": false, + "legendFormat": "p95", + "range": true, + "refId": "A" + } + ], + "title": "Registry Latency", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 8, + "x": 16, + "y": 45 + }, + "id": 19, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(rox_central_registry_client_error_timeouts_total[5m])", + "instant": false, + "legendFormat": "timeouts", + "range": true, + "refId": "A" + } + ], + "title": "Registry Timeouts", + "type": "timeseries" + } + ], + "refresh": 
"30s", + "schemaVersion": 27, + "tags": [ + "stackrox", + "central", + "level-3", + "vulnerability-enrichment" + ], + "time": { + "from": "now-6h", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "Central: Vulnerability Enrichment", + "uid": "central-vuln-enrichment", + "version": 0 +} \ No newline at end of file diff --git a/deploy/charts/monitoring/dashboards/generator/dashboard.go b/deploy/charts/monitoring/dashboards/generator/dashboard.go new file mode 100644 index 0000000000000..881539c6980e4 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/generator/dashboard.go @@ -0,0 +1,191 @@ +package generator + +// Dashboard represents a Grafana dashboard +type Dashboard struct { + UID string + Title string + Tags []string + Links []DashboardLink // Links to other dashboards + Rows []Row + Templating []Variable +} + +// DashboardLink represents a link to another dashboard +type DashboardLink struct { + Title string + TargetUID string + Type string // "link" or "dashboards" +} + +// Variable represents a dashboard template variable +type Variable struct { + Name string + Type string // "datasource", "query", "custom" + Query string + Label string +} + +// Row is a collapsible row of panels +type Row struct { + Title string + Panels []Panel +} + +// Generate produces a map[string]any that marshals to valid Grafana JSON +func (d *Dashboard) Generate() map[string]any { + result := map[string]any{ + "uid": d.UID, + "title": d.Title, + "tags": d.Tags, + "editable": true, + "schemaVersion": 27, + "version": 0, + "refresh": "30s", + "time": map[string]any{ + "from": "now-6h", + "to": "now", + }, + "timepicker": map[string]any{}, + "timezone": "", + "annotations": map[string]any{ + "list": []map[string]any{ + { + "builtIn": 1, + "datasource": map[string]any{ + "type": "grafana", + "uid": "-- Grafana --", + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations & Alerts", + "type": "dashboard", + }, + }, + }, 
+ "links": d.generateLinks(), + "panels": d.generatePanels(), + } + + // Add templating if present + if len(d.Templating) > 0 { + result["templating"] = d.generateTemplating() + } + + return result +} + +func (d *Dashboard) generateLinks() []map[string]any { + if len(d.Links) == 0 { + return []map[string]any{} + } + + links := make([]map[string]any, 0, len(d.Links)) + for _, link := range d.Links { + linkType := link.Type + if linkType == "" { + linkType = "link" + } + + l := map[string]any{ + "asDropdown": false, + "icon": "external link", + "includeVars": false, + "keepTime": true, + "tags": []string{}, + "targetBlank": false, + "title": link.Title, + "tooltip": "", + "type": linkType, + "url": "/d/" + link.TargetUID, + } + links = append(links, l) + } + + return links +} + +func (d *Dashboard) generateTemplating() map[string]any { + list := make([]map[string]any, 0, len(d.Templating)) + + for _, v := range d.Templating { + variable := map[string]any{ + "name": v.Name, + "type": v.Type, + "label": v.Label, + } + + if v.Query != "" { + variable["query"] = v.Query + } + + // Add common fields based on type + if v.Type == "datasource" { + variable["query"] = "prometheus" + variable["current"] = map[string]any{ + "selected": false, + "text": "Prometheus", + "value": "Prometheus", + } + } + + list = append(list, variable) + } + + return map[string]any{ + "list": list, + } +} + +func (d *Dashboard) generatePanels() []map[string]any { + panels := []map[string]any{} + panelID := 1 + yPos := 0 + + for _, row := range d.Rows { + // Add row header + rowPanel := map[string]any{ + "collapsed": false, + "datasource": map[string]any{ + "type": "datasource", + "uid": "grafana", + }, + "gridPos": map[string]int{ + "h": 1, + "w": 24, + "x": 0, + "y": yPos, + }, + "id": panelID, + "panels": []any{}, + "title": row.Title, + "type": "row", + } + panels = append(panels, rowPanel) + panelID++ + yPos++ + + // Add panels in this row + xPos := 0 + for _, panel := range row.Panels { + // 
Check if panel wraps to next line + if xPos+panel.Width > 24 { + xPos = 0 + yPos += panel.Height + } + + p := panel.generate(panelID, xPos, yPos) + panels = append(panels, p) + panelID++ + + xPos += panel.Width + } + + // Move yPos to next row after last panel + if len(row.Panels) > 0 { + yPos += row.Panels[0].Height // Assume uniform height in row + } + } + + return panels +} diff --git a/deploy/charts/monitoring/dashboards/generator/dashboard_test.go b/deploy/charts/monitoring/dashboards/generator/dashboard_test.go new file mode 100644 index 0000000000000..96455452685e9 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/generator/dashboard_test.go @@ -0,0 +1,99 @@ +package generator + +import ( + "encoding/json" + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestDashboard_Generate_BasicStructure(t *testing.T) { + d := Dashboard{ + UID: "test-uid", + Title: "Test Dashboard", + Tags: []string{"stackrox", "test"}, + } + + result := d.Generate() + + assert.Equal(t, "test-uid", result["uid"]) + assert.Equal(t, "Test Dashboard", result["title"]) + assert.Equal(t, []string{"stackrox", "test"}, result["tags"]) + assert.Equal(t, true, result["editable"]) + assert.NotNil(t, result["time"]) + + // Validate it produces valid JSON + _, err := json.Marshal(result) + require.NoError(t, err) +} + +func TestDashboard_Generate_WithLinks(t *testing.T) { + d := Dashboard{ + UID: "overview", + Title: "Overview", + Links: []DashboardLink{ + {Title: "Central Internals", TargetUID: "central-internals", Type: "link"}, + }, + } + + result := d.Generate() + + links, ok := result["links"].([]map[string]any) + require.True(t, ok) + require.Len(t, links, 1) + assert.Equal(t, "Central Internals", links[0]["title"]) + assert.Contains(t, links[0]["url"], "central-internals") +} + +func TestDashboard_Generate_WithVariables(t *testing.T) { + d := Dashboard{ + UID: "test", + Title: "Test", + Templating: []Variable{ + {Name: "datasource", 
Type: "datasource", Label: "Data Source"}, + {Name: "cluster", Type: "query", Query: "label_values(cluster)", Label: "Cluster"}, + }, + } + + result := d.Generate() + + templating, ok := result["templating"].(map[string]any) + require.True(t, ok) + + list, ok := templating["list"].([]map[string]any) + require.True(t, ok) + require.Len(t, list, 2) + + assert.Equal(t, "datasource", list[0]["name"]) + assert.Equal(t, "cluster", list[1]["name"]) +} + +func TestDashboard_Generate_WithRows(t *testing.T) { + d := Dashboard{ + UID: "test", + Title: "Test", + Rows: []Row{ + { + Title: "Section 1", + Panels: []Panel{ + {Title: "Panel 1", Width: 12, Height: 8, Type: "timeseries"}, + }, + }, + }, + } + + result := d.Generate() + + panels, ok := result["panels"].([]map[string]any) + require.True(t, ok) + // Should have row header + 1 panel = 2 items + require.Len(t, panels, 2) + + // First item should be row + assert.Equal(t, "row", panels[0]["type"]) + assert.Equal(t, "Section 1", panels[0]["title"]) + + // Second item should be panel + assert.Equal(t, "Panel 1", panels[1]["title"]) +} diff --git a/deploy/charts/monitoring/dashboards/generator/generate/main.go b/deploy/charts/monitoring/dashboards/generator/generate/main.go new file mode 100644 index 0000000000000..00c3262fb448f --- /dev/null +++ b/deploy/charts/monitoring/dashboards/generator/generate/main.go @@ -0,0 +1,65 @@ +// Package main provides a CLI tool to generate Grafana dashboard JSON files +// from Go dashboard definitions. 
+package main + +import ( + "encoding/json" + "fmt" + "log" + "os" + "path/filepath" + + "github.com/stackrox/rox/deploy/charts/monitoring/dashboards/generator" +) + +func writeJSON(dir string, filename string, data map[string]any) error { + b, err := json.MarshalIndent(data, "", " ") + if err != nil { + return fmt.Errorf("marshal %s: %w", filename, err) + } + outPath := filepath.Join(dir, filename) + if err := os.WriteFile(outPath, b, 0644); err != nil { + return fmt.Errorf("write %s: %w", outPath, err) + } + fmt.Printf("Generated: %s\n", outPath) + return nil +} + +func main() { + outDir := "deploy/charts/monitoring/dashboards" + if len(os.Args) > 1 { + outDir = os.Args[1] + } + fmt.Printf("Dashboard output directory: %s\n", outDir) + + // Level 1 + l1 := generator.L1Overview() + if err := writeJSON(outDir, "stackrox-overview.json", l1.Generate()); err != nil { + log.Fatal(err) + } + + // Level 2 + l2 := generator.L2CentralInternals() + if err := writeJSON(outDir, "central-internals.json", l2.Generate()); err != nil { + log.Fatal(err) + } + + // Level 3 + l3si := generator.L3SensorIngestion() + if err := writeJSON(outDir, "central-sensor-ingestion.json", l3si.Generate()); err != nil { + log.Fatal(err) + } + + l3ve := generator.L3VulnEnrichment() + if err := writeJSON(outDir, "central-vuln-enrichment.json", l3ve.Generate()); err != nil { + log.Fatal(err) + } + + // Level 3 Stubs (8 remaining dashboards) + for _, stub := range generator.L3Stubs() { + filename := stub.UID + ".json" + if err := writeJSON(outDir, filename, stub.Generate()); err != nil { + log.Fatal(err) + } + } +} diff --git a/deploy/charts/monitoring/dashboards/generator/l1_overview.go b/deploy/charts/monitoring/dashboards/generator/l1_overview.go new file mode 100644 index 0000000000000..f82b2605a28d7 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/generator/l1_overview.go @@ -0,0 +1,208 @@ +package generator + +// L1Overview creates the Level 1 "StackRox Overview" dashboard. 
+// This is the top-level service map dashboard that provides a high-level view +// of the entire StackRox deployment including Central health, connected sensors, +// and database status. +func L1Overview() Dashboard { + return Dashboard{ + UID: "stackrox-overview", + Title: "StackRox Overview", + Tags: []string{"stackrox", "overview", "level-1"}, + Links: []DashboardLink{ + { + Title: "Central Internals", + TargetUID: "central-internals", + Type: "link", + }, + }, + Rows: []Row{ + serviceHealthRow(), + connectedSensorsRow(), + databaseRow(), + }, + } +} + +func serviceHealthRow() Row { + return Row{ + Title: "Service Health", + Panels: []Panel{ + { + Title: "Central Up", + Type: "stat", + Width: 4, + Height: 8, + Queries: []Query{ + { + Expr: `up{job="central"}`, + RefID: "A", + }, + }, + }, + { + Title: "Central CPU", + Type: "timeseries", + Width: 5, + Height: 8, + Queries: []Query{ + { + Expr: `rate(process_cpu_seconds_total{job="central"}[5m])`, + RefID: "A", + }, + }, + }, + { + Title: "Central Memory", + Type: "timeseries", + Width: 5, + Height: 8, + Queries: []Query{ + { + Expr: `process_resident_memory_bytes{job="central"}`, + RefID: "A", + }, + }, + }, + { + Title: "Central Goroutines", + Type: "stat", + Width: 5, + Height: 8, + Queries: []Query{ + { + Expr: `go_goroutines{job="central"}`, + RefID: "A", + }, + }, + }, + { + Title: "Central Version", + Type: "stat", + Width: 5, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_info`, + LegendFormat: `{{central_version}}`, + RefID: "A", + }, + }, + }, + }, + } +} + +func connectedSensorsRow() Row { + return Row{ + Title: "Connected Sensors", + Panels: []Panel{ + { + Title: "Sensors Connected", + Type: "stat", + Width: 6, + Height: 8, + Queries: []Query{ + { + Expr: `count by (connection_state) (rox_central_sensor_connected)`, + RefID: "A", + }, + }, + }, + { + Title: "Secured Clusters", + Type: "stat", + Width: 6, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_secured_clusters`, + RefID: 
"A", + }, + }, + }, + { + Title: "Secured Nodes", + Type: "stat", + Width: 6, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_secured_nodes`, + RefID: "A", + }, + }, + }, + { + Title: "Secured vCPUs", + Type: "stat", + Width: 6, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_secured_vcpus`, + RefID: "A", + }, + }, + }, + }, + } +} + +func databaseRow() Row { + return Row{ + Title: "Database", + Panels: []Panel{ + { + Title: "Postgres Connected", + Type: "stat", + Width: 6, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_postgres_connected`, + RefID: "A", + }, + }, + }, + { + Title: "DB Size", + Type: "stat", + Width: 6, + Height: 8, + Unit: "bytes", + Queries: []Query{ + { + Expr: `rox_central_postgres_total_size_bytes`, + RefID: "A", + }, + }, + }, + { + Title: "Active Connections", + Type: "stat", + Width: 6, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_postgres_total_connections{state="active"}`, + RefID: "A", + }, + }, + }, + { + Title: "Available Space", + Type: "stat", + Width: 6, + Height: 8, + Unit: "bytes", + Queries: []Query{ + { + Expr: `rox_central_postgres_available_size_bytes`, + RefID: "A", + }, + }, + }, + }, + } +} diff --git a/deploy/charts/monitoring/dashboards/generator/l1_overview_test.go b/deploy/charts/monitoring/dashboards/generator/l1_overview_test.go new file mode 100644 index 0000000000000..499407d760441 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/generator/l1_overview_test.go @@ -0,0 +1,279 @@ +package generator + +import ( + "encoding/json" + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestL1Overview_BasicMetadata(t *testing.T) { + d := L1Overview() + + assert.Equal(t, "stackrox-overview", d.UID) + assert.Equal(t, "StackRox Overview", d.Title) + assert.Contains(t, d.Tags, "level-1") + assert.Contains(t, d.Tags, "stackrox") + assert.Contains(t, d.Tags, "overview") +} + +func TestL1Overview_HasLinkToCentralInternals(t 
*testing.T) { + d := L1Overview() + + require.Len(t, d.Links, 1) + assert.Equal(t, "Central Internals", d.Links[0].Title) + assert.Equal(t, "central-internals", d.Links[0].TargetUID) +} + +func TestL1Overview_HasRequiredRows(t *testing.T) { + d := L1Overview() + + require.GreaterOrEqual(t, len(d.Rows), 3, "Should have at least 3 rows") + + // Verify row titles + rowTitles := make([]string, len(d.Rows)) + for i, row := range d.Rows { + rowTitles[i] = row.Title + } + + assert.Contains(t, rowTitles, "Service Health") + assert.Contains(t, rowTitles, "Connected Sensors") + assert.Contains(t, rowTitles, "Database") +} + +func TestL1Overview_ServiceHealthRow(t *testing.T) { + d := L1Overview() + + var serviceHealthRow *Row + for i := range d.Rows { + if d.Rows[i].Title == "Service Health" { + serviceHealthRow = &d.Rows[i] + break + } + } + + require.NotNil(t, serviceHealthRow, "Service Health row should exist") + require.Len(t, serviceHealthRow.Panels, 5, "Service Health should have 5 panels") + + // Verify panel titles + panelTitles := make([]string, len(serviceHealthRow.Panels)) + for i, panel := range serviceHealthRow.Panels { + panelTitles[i] = panel.Title + } + + assert.Contains(t, panelTitles, "Central Up") + assert.Contains(t, panelTitles, "Central CPU") + assert.Contains(t, panelTitles, "Central Memory") + assert.Contains(t, panelTitles, "Central Goroutines") + assert.Contains(t, panelTitles, "Central Version") + + // Verify Central Up panel + var centralUpPanel *Panel + for i := range serviceHealthRow.Panels { + if serviceHealthRow.Panels[i].Title == "Central Up" { + centralUpPanel = &serviceHealthRow.Panels[i] + break + } + } + require.NotNil(t, centralUpPanel) + assert.Equal(t, "stat", centralUpPanel.Type) + assert.Equal(t, 4, centralUpPanel.Width) + require.Len(t, centralUpPanel.Queries, 1) + assert.Equal(t, `up{job="central"}`, centralUpPanel.Queries[0].Expr) + + // Verify Central CPU panel + var centralCPUPanel *Panel + for i := range serviceHealthRow.Panels 
{ + if serviceHealthRow.Panels[i].Title == "Central CPU" { + centralCPUPanel = &serviceHealthRow.Panels[i] + break + } + } + require.NotNil(t, centralCPUPanel) + assert.Equal(t, "timeseries", centralCPUPanel.Type) + assert.Equal(t, 5, centralCPUPanel.Width) + require.Len(t, centralCPUPanel.Queries, 1) + assert.Equal(t, `rate(process_cpu_seconds_total{job="central"}[5m])`, centralCPUPanel.Queries[0].Expr) + + // Verify Central Memory panel + var centralMemoryPanel *Panel + for i := range serviceHealthRow.Panels { + if serviceHealthRow.Panels[i].Title == "Central Memory" { + centralMemoryPanel = &serviceHealthRow.Panels[i] + break + } + } + require.NotNil(t, centralMemoryPanel) + assert.Equal(t, "timeseries", centralMemoryPanel.Type) + assert.Equal(t, 5, centralMemoryPanel.Width) + require.Len(t, centralMemoryPanel.Queries, 1) + assert.Equal(t, `process_resident_memory_bytes{job="central"}`, centralMemoryPanel.Queries[0].Expr) + + // Verify Central Goroutines panel + var centralGoroutinesPanel *Panel + for i := range serviceHealthRow.Panels { + if serviceHealthRow.Panels[i].Title == "Central Goroutines" { + centralGoroutinesPanel = &serviceHealthRow.Panels[i] + break + } + } + require.NotNil(t, centralGoroutinesPanel) + assert.Equal(t, "stat", centralGoroutinesPanel.Type) + assert.Equal(t, 5, centralGoroutinesPanel.Width) + require.Len(t, centralGoroutinesPanel.Queries, 1) + assert.Equal(t, `go_goroutines{job="central"}`, centralGoroutinesPanel.Queries[0].Expr) + + // Verify Central Version panel + var centralVersionPanel *Panel + for i := range serviceHealthRow.Panels { + if serviceHealthRow.Panels[i].Title == "Central Version" { + centralVersionPanel = &serviceHealthRow.Panels[i] + break + } + } + require.NotNil(t, centralVersionPanel) + assert.Equal(t, "stat", centralVersionPanel.Type) + assert.Equal(t, 5, centralVersionPanel.Width) + require.Len(t, centralVersionPanel.Queries, 1) + assert.Equal(t, `rox_central_info`, centralVersionPanel.Queries[0].Expr) + 
assert.Equal(t, `{{central_version}}`, centralVersionPanel.Queries[0].LegendFormat) +} + +func TestL1Overview_ConnectedSensorsRow(t *testing.T) { + d := L1Overview() + + var connectedSensorsRow *Row + for i := range d.Rows { + if d.Rows[i].Title == "Connected Sensors" { + connectedSensorsRow = &d.Rows[i] + break + } + } + + require.NotNil(t, connectedSensorsRow, "Connected Sensors row should exist") + require.Len(t, connectedSensorsRow.Panels, 4, "Connected Sensors should have 4 panels") + + // Verify panel titles + panelTitles := make([]string, len(connectedSensorsRow.Panels)) + for i, panel := range connectedSensorsRow.Panels { + panelTitles[i] = panel.Title + } + + assert.Contains(t, panelTitles, "Sensors Connected") + assert.Contains(t, panelTitles, "Secured Clusters") + assert.Contains(t, panelTitles, "Secured Nodes") + assert.Contains(t, panelTitles, "Secured vCPUs") + + // Verify Sensors Connected panel + var sensorsConnectedPanel *Panel + for i := range connectedSensorsRow.Panels { + if connectedSensorsRow.Panels[i].Title == "Sensors Connected" { + sensorsConnectedPanel = &connectedSensorsRow.Panels[i] + break + } + } + require.NotNil(t, sensorsConnectedPanel) + assert.Equal(t, "stat", sensorsConnectedPanel.Type) + assert.Equal(t, 6, sensorsConnectedPanel.Width) + require.Len(t, sensorsConnectedPanel.Queries, 1) + assert.Equal(t, `count by (connection_state) (rox_central_sensor_connected)`, sensorsConnectedPanel.Queries[0].Expr) +} + +func TestL1Overview_DatabaseRow(t *testing.T) { + d := L1Overview() + + var databaseRow *Row + for i := range d.Rows { + if d.Rows[i].Title == "Database" { + databaseRow = &d.Rows[i] + break + } + } + + require.NotNil(t, databaseRow, "Database row should exist") + require.Len(t, databaseRow.Panels, 4, "Database should have 4 panels") + + // Verify panel titles + panelTitles := make([]string, len(databaseRow.Panels)) + for i, panel := range databaseRow.Panels { + panelTitles[i] = panel.Title + } + + assert.Contains(t, 
panelTitles, "Postgres Connected") + assert.Contains(t, panelTitles, "DB Size") + assert.Contains(t, panelTitles, "Active Connections") + assert.Contains(t, panelTitles, "Available Space") + + // Verify Postgres Connected panel + var postgresConnectedPanel *Panel + for i := range databaseRow.Panels { + if databaseRow.Panels[i].Title == "Postgres Connected" { + postgresConnectedPanel = &databaseRow.Panels[i] + break + } + } + require.NotNil(t, postgresConnectedPanel) + assert.Equal(t, "stat", postgresConnectedPanel.Type) + assert.Equal(t, 6, postgresConnectedPanel.Width) + require.Len(t, postgresConnectedPanel.Queries, 1) + assert.Equal(t, `rox_central_postgres_connected`, postgresConnectedPanel.Queries[0].Expr) + + // Verify DB Size panel has bytes unit + var dbSizePanel *Panel + for i := range databaseRow.Panels { + if databaseRow.Panels[i].Title == "DB Size" { + dbSizePanel = &databaseRow.Panels[i] + break + } + } + require.NotNil(t, dbSizePanel) + assert.Equal(t, "bytes", dbSizePanel.Unit) + assert.Equal(t, `rox_central_postgres_total_size_bytes`, dbSizePanel.Queries[0].Expr) + + // Verify Available Space panel has bytes unit + var availableSpacePanel *Panel + for i := range databaseRow.Panels { + if databaseRow.Panels[i].Title == "Available Space" { + availableSpacePanel = &databaseRow.Panels[i] + break + } + } + require.NotNil(t, availableSpacePanel) + assert.Equal(t, "bytes", availableSpacePanel.Unit) + assert.Equal(t, `rox_central_postgres_available_size_bytes`, availableSpacePanel.Queries[0].Expr) +} + +func TestL1Overview_ProducesValidJSON(t *testing.T) { + d := L1Overview() + result := d.Generate() + + // Should marshal to valid JSON + b, err := json.Marshal(result) + require.NoError(t, err) + require.NotEmpty(t, b) + + // Should unmarshal back + var unmarshaled map[string]any + err = json.Unmarshal(b, &unmarshaled) + require.NoError(t, err) + + // Verify key fields survived round-trip + assert.Equal(t, "stackrox-overview", unmarshaled["uid"]) + 
assert.Equal(t, "StackRox Overview", unmarshaled["title"]) +} + +func TestL1Overview_AllPanelsHaveValidWidth(t *testing.T) { + d := L1Overview() + + for _, row := range d.Rows { + rowWidth := 0 + for _, panel := range row.Panels { + assert.Greater(t, panel.Width, 0, "Panel %s should have positive width", panel.Title) + assert.LessOrEqual(t, panel.Width, 24, "Panel %s width should not exceed 24", panel.Title) + rowWidth += panel.Width + } + // Each row should have reasonable total width (allowing wrapping) + assert.Greater(t, rowWidth, 0, "Row %s should have panels with total width > 0", row.Title) + } +} diff --git a/deploy/charts/monitoring/dashboards/generator/l2_central.go b/deploy/charts/monitoring/dashboards/generator/l2_central.go new file mode 100644 index 0000000000000..9e3b2a0138dcd --- /dev/null +++ b/deploy/charts/monitoring/dashboards/generator/l2_central.go @@ -0,0 +1,450 @@ +package generator + +// L2CentralInternals creates the Level 2 "Central Internals" dashboard. +// This dashboard provides a grid view of all 10 logical regions within Central, +// showing headline metrics for each region with links to detailed Level 3 dashboards. 
+func L2CentralInternals() Dashboard { + return Dashboard{ + UID: "central-internals", + Title: "Central Internals", + Tags: []string{"stackrox", "central", "level-2"}, + Links: []DashboardLink{ + { + Title: "← StackRox Overview", + TargetUID: "stackrox-overview", + Type: "link", + }, + }, + Rows: []Row{ + sensorIngestionRow(), + deploymentProcessingRow(), + vulnerabilityEnrichmentRow(), + detectionAlertsRow(), + riskCalculationRow(), + backgroundReprocessingRow(), + pruningGCRow(), + networkAnalysisRow(), + reportGenerationRow(), + apiUIRow(), + }, + } +} + +func sensorIngestionRow() Row { + return Row{ + Title: "Sensor Ingestion", + Panels: []Panel{ + { + Title: "events/sec", + Type: "timeseries", + Width: 7, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_sensor_event_queue[5m])`, + RefID: "A", + }, + }, + }, + { + Title: "deduper", + Type: "timeseries", + Width: 7, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_sensor_event_deduper[5m])`, + RefID: "A", + }, + }, + }, + { + Title: "processing latency p95", + Type: "timeseries", + Width: 7, + Height: 8, + Unit: "s", + Queries: []Query{ + { + Expr: `histogram_quantile(0.95, rate(rox_central_sensor_event_duration_bucket[5m]))`, + RefID: "A", + }, + }, + }, + { + Title: "", + Width: 3, + Height: 8, + GapNote: "### [→ Details](/d/central-sensor-ingestion)\n\nDrill into Sensor Ingestion metrics", + }, + }, + } +} + +func deploymentProcessingRow() Row { + return Row{ + Title: "Deployment Processing", + Panels: []Panel{ + { + Title: "resources/sec", + Type: "timeseries", + Width: 8, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_resource_processed_count[5m])`, + RefID: "A", + }, + }, + }, + { + Title: "K8s event latency", + Type: "timeseries", + Width: 8, + Height: 8, + Unit: "s", + Queries: []Query{ + { + Expr: `histogram_quantile(0.95, rate(rox_central_k8s_event_processing_duration_bucket[5m]))`, + RefID: "A", + }, + }, + }, + { + Title: "", + Width: 8, + Height: 8, + 
GapNote: "### [→ Details](/d/central-deployment-processing)\n\nDrill into Deployment Processing metrics", + }, + }, + } +} + +func vulnerabilityEnrichmentRow() Row { + return Row{ + Title: "Vulnerability Enrichment", + Panels: []Panel{ + { + Title: "scans in-flight", + Type: "stat", + Width: 6, + Height: 8, + Queries: []Query{ + { + Expr: `rox_image_scan_semaphore_holding_size`, + RefID: "A", + }, + }, + }, + { + Title: "scan duration p95", + Type: "timeseries", + Width: 7, + Height: 8, + Unit: "s", + Queries: []Query{ + { + Expr: `histogram_quantile(0.95, rate(rox_central_scan_duration_bucket[5m]))`, + RefID: "A", + }, + }, + }, + { + Title: "queue waiting", + Type: "timeseries", + Width: 7, + Height: 8, + Queries: []Query{ + { + Expr: `rox_image_scan_semaphore_queue_size`, + RefID: "A", + }, + }, + }, + { + Title: "", + Width: 4, + Height: 8, + GapNote: "### [→ Details](/d/central-vuln-enrichment)\n\nDrill into Vulnerability Enrichment metrics", + }, + }, + } +} + +func detectionAlertsRow() Row { + return Row{ + Title: "Detection & Alerts", + Panels: []Panel{ + { + Title: "process filter", + Type: "timeseries", + Width: 10, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_process_filter`, + RefID: "A", + }, + }, + }, + { + Title: "Alert Generation Rate", + Width: 10, + Height: 8, + GapNote: "⚠️ No alert generation rate metric available", + }, + { + Title: "", + Width: 4, + Height: 8, + GapNote: "### [→ Details](/d/central-detection-alerts)\n\nDrill into Detection & Alerts metrics", + }, + }, + } +} + +func riskCalculationRow() Row { + return Row{ + Title: "Risk Calculation", + Panels: []Panel{ + { + Title: "risk duration", + Type: "timeseries", + Width: 10, + Height: 8, + Unit: "s", + Queries: []Query{ + { + Expr: `rox_central_risk_processing_duration`, + RefID: "A", + }, + }, + }, + { + Title: "reprocessor", + Type: "timeseries", + Width: 10, + Height: 8, + Unit: "s", + Queries: []Query{ + { + Expr: `rox_central_reprocessor_duration_seconds`, + RefID: 
"A", + }, + }, + }, + { + Title: "", + Width: 4, + Height: 8, + GapNote: "### [→ Details](/d/central-risk-calculation)\n\nDrill into Risk Calculation metrics", + }, + }, + } +} + +func backgroundReprocessingRow() Row { + return Row{ + Title: "Background Reprocessing", + Panels: []Panel{ + { + Title: "reprocessor duration", + Type: "timeseries", + Width: 7, + Height: 8, + Unit: "s", + Queries: []Query{ + { + Expr: `rox_central_reprocessor_duration_seconds`, + RefID: "A", + }, + }, + }, + { + Title: "sig verification", + Type: "timeseries", + Width: 7, + Height: 8, + Unit: "s", + Queries: []Query{ + { + Expr: `rox_central_signature_verification_reprocessor_duration_seconds`, + RefID: "A", + }, + }, + }, + { + Title: "Running/Items Processed", + Width: 7, + Height: 8, + GapNote: "⚠️ No running/items-processed metrics available", + }, + { + Title: "", + Width: 3, + Height: 8, + GapNote: "### [→ Details](/d/central-background-reprocessing)\n\nDrill into Background Reprocessing metrics", + }, + }, + } +} + +func pruningGCRow() Row { + return Row{ + Title: "Pruning & GC", + Panels: []Panel{ + { + Title: "prune duration", + Type: "timeseries", + Width: 7, + Height: 8, + Unit: "s", + Queries: []Query{ + { + Expr: `rox_central_prune_duration`, + RefID: "A", + }, + }, + }, + { + Title: "process queue", + Type: "timeseries", + Width: 7, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_process_queue_length`, + RefID: "A", + }, + }, + }, + { + Title: "pruned indicators", + Type: "timeseries", + Width: 7, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_pruned_process_indicators[5m])`, + RefID: "A", + }, + }, + }, + { + Title: "", + Width: 3, + Height: 8, + GapNote: "### [→ Details](/d/central-pruning-gc)\n\nDrill into Pruning & GC metrics", + }, + }, + } +} + +func networkAnalysisRow() Row { + return Row{ + Title: "Network Analysis", + Panels: []Panel{ + { + Title: "flows received", + Type: "timeseries", + Width: 10, + Height: 8, + Queries: []Query{ + { + 
Expr: `rate(rox_central_total_network_flows_central_received_counter[5m])`, + RefID: "A", + }, + }, + }, + { + Title: "endpoints received", + Type: "timeseries", + Width: 10, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_total_network_endpoints_received_counter[5m])`, + RefID: "A", + }, + }, + }, + { + Title: "", + Width: 4, + Height: 8, + GapNote: "### [→ Details](/d/central-network-analysis)\n\nDrill into Network Analysis metrics", + }, + }, + } +} + +func reportGenerationRow() Row { + return Row{ + Title: "Report Generation", + Panels: []Panel{ + { + Title: "Central-side Reports", + Width: 10, + Height: 8, + GapNote: "⚠️ No Central-side report generation metrics exist", + }, + { + Title: "compliance watchers", + Type: "timeseries", + Width: 10, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_complianceoperator_scan_watchers_current`, + RefID: "A", + }, + }, + }, + { + Title: "", + Width: 4, + Height: 8, + GapNote: "### [→ Details](/d/central-report-generation)\n\nDrill into Report Generation metrics", + }, + }, + } +} + +func apiUIRow() Row { + return Row{ + Title: "API & UI", + Panels: []Panel{ + { + Title: "GraphQL p95", + Type: "timeseries", + Width: 10, + Height: 8, + Unit: "s", + Queries: []Query{ + { + Expr: `histogram_quantile(0.95, rate(rox_central_graphql_query_duration_bucket[5m]))`, + RefID: "A", + }, + }, + }, + { + Title: "gRPC errors", + Type: "timeseries", + Width: 10, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_grpc_error[5m])`, + RefID: "A", + }, + }, + }, + { + Title: "", + Width: 4, + Height: 8, + GapNote: "### [→ Details](/d/central-api-ui)\n\nDrill into API & UI metrics", + }, + }, + } +} diff --git a/deploy/charts/monitoring/dashboards/generator/l2_central_test.go b/deploy/charts/monitoring/dashboards/generator/l2_central_test.go new file mode 100644 index 0000000000000..8557a4266cdd7 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/generator/l2_central_test.go @@ -0,0 +1,347 @@ +package 
generator + +import ( + "encoding/json" + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestL2CentralInternals_BasicMetadata(t *testing.T) { + d := L2CentralInternals() + + assert.Equal(t, "central-internals", d.UID) + assert.Equal(t, "Central Internals", d.Title) + assert.Contains(t, d.Tags, "level-2") + assert.Contains(t, d.Tags, "stackrox") + assert.Contains(t, d.Tags, "central") +} + +func TestL2CentralInternals_HasBackLinkToOverview(t *testing.T) { + d := L2CentralInternals() + + require.Len(t, d.Links, 1) + assert.Equal(t, "← StackRox Overview", d.Links[0].Title) + assert.Equal(t, "stackrox-overview", d.Links[0].TargetUID) +} + +func TestL2CentralInternals_HasTenRows(t *testing.T) { + d := L2CentralInternals() + + require.Len(t, d.Rows, 10, "Should have exactly 10 rows (one per logical region)") + + expectedRows := []string{ + "Sensor Ingestion", + "Deployment Processing", + "Vulnerability Enrichment", + "Detection & Alerts", + "Risk Calculation", + "Background Reprocessing", + "Pruning & GC", + "Network Analysis", + "Report Generation", + "API & UI", + } + + rowTitles := make([]string, len(d.Rows)) + for i, row := range d.Rows { + rowTitles[i] = row.Title + } + + assert.Equal(t, expectedRows, rowTitles) +} + +func TestL2CentralInternals_EachRowHasPanels(t *testing.T) { + d := L2CentralInternals() + + for _, row := range d.Rows { + assert.Greater(t, len(row.Panels), 0, "Row %s should have at least one panel", row.Title) + } +} + +func TestL2CentralInternals_SensorIngestionRow(t *testing.T) { + d := L2CentralInternals() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Sensor Ingestion" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row, "Sensor Ingestion row should exist") + require.Len(t, row.Panels, 4, "Should have 3 metric panels + 1 details link") + + // Verify panel titles + assert.Equal(t, "events/sec", row.Panels[0].Title) + assert.Equal(t, "deduper", 
row.Panels[1].Title) + assert.Equal(t, "processing latency p95", row.Panels[2].Title) + + // Verify details link panel + detailsPanel := row.Panels[3] + assert.NotEmpty(t, detailsPanel.GapNote, "Details link should use GapNote for markdown") + assert.Contains(t, detailsPanel.GapNote, "central-sensor-ingestion") + assert.Contains(t, detailsPanel.GapNote, "Details") +} + +func TestL2CentralInternals_DeploymentProcessingRow(t *testing.T) { + d := L2CentralInternals() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Deployment Processing" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row, "Deployment Processing row should exist") + + // Verify metric panels exist + assert.Equal(t, "resources/sec", row.Panels[0].Title) + assert.Equal(t, "K8s event latency", row.Panels[1].Title) + + // Verify queries are correct + assert.Contains(t, row.Panels[0].Queries[0].Expr, "rox_central_resource_processed_count") + assert.Contains(t, row.Panels[1].Queries[0].Expr, "rox_central_k8s_event_processing_duration_bucket") +} + +func TestL2CentralInternals_VulnerabilityEnrichmentRow(t *testing.T) { + d := L2CentralInternals() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Vulnerability Enrichment" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row, "Vulnerability Enrichment row should exist") + require.Len(t, row.Panels, 4, "Should have 3 metric panels + 1 details link") + + // Verify panel titles + assert.Equal(t, "scans in-flight", row.Panels[0].Title) + assert.Equal(t, "scan duration p95", row.Panels[1].Title) + assert.Equal(t, "queue waiting", row.Panels[2].Title) + + // Verify panel types + assert.Equal(t, "stat", row.Panels[0].Type) // scans in-flight is a stat + assert.Equal(t, "timeseries", row.Panels[1].Type) + assert.Equal(t, "timeseries", row.Panels[2].Type) +} + +func TestL2CentralInternals_DetectionAlertsRow(t *testing.T) { + d := L2CentralInternals() + + var row *Row + for i := range d.Rows { + if 
d.Rows[i].Title == "Detection & Alerts" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row, "Detection & Alerts row should exist") + + // Should have process filter panel + assert.Equal(t, "process filter", row.Panels[0].Title) + assert.Contains(t, row.Panels[0].Queries[0].Expr, "rox_central_process_filter") + + // Should have gap panel for alert generation rate + var gapPanel *Panel + for i := range row.Panels { + if row.Panels[i].GapNote != "" && !isDetailsLink(row.Panels[i].GapNote) { + gapPanel = &row.Panels[i] + break + } + } + require.NotNil(t, gapPanel, "Should have gap panel for missing alert generation rate metric") + assert.Contains(t, gapPanel.GapNote, "alert generation rate") +} + +func TestL2CentralInternals_RiskCalculationRow(t *testing.T) { + d := L2CentralInternals() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Risk Calculation" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row, "Risk Calculation row should exist") + + // Verify panels + assert.Equal(t, "risk duration", row.Panels[0].Title) + assert.Equal(t, "reprocessor", row.Panels[1].Title) + + // Verify queries + assert.Contains(t, row.Panels[0].Queries[0].Expr, "rox_central_risk_processing_duration") + assert.Contains(t, row.Panels[1].Queries[0].Expr, "rox_central_reprocessor_duration_seconds") +} + +func TestL2CentralInternals_BackgroundReprocessingRow(t *testing.T) { + d := L2CentralInternals() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Background Reprocessing" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row, "Background Reprocessing row should exist") + + // Should have gap panel for missing metrics + var gapPanel *Panel + for i := range row.Panels { + if row.Panels[i].GapNote != "" && !isDetailsLink(row.Panels[i].GapNote) { + gapPanel = &row.Panels[i] + break + } + } + require.NotNil(t, gapPanel, "Should have gap panel for missing running/items-processed metrics") + assert.Contains(t, gapPanel.GapNote, 
"running/items-processed") +} + +func TestL2CentralInternals_PruningGCRow(t *testing.T) { + d := L2CentralInternals() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Pruning & GC" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row, "Pruning & GC row should exist") + require.Len(t, row.Panels, 4, "Should have 3 metric panels + 1 details link") + + // Verify panel titles + assert.Equal(t, "prune duration", row.Panels[0].Title) + assert.Equal(t, "process queue", row.Panels[1].Title) + assert.Equal(t, "pruned indicators", row.Panels[2].Title) +} + +func TestL2CentralInternals_NetworkAnalysisRow(t *testing.T) { + d := L2CentralInternals() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Network Analysis" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row, "Network Analysis row should exist") + + // Verify panels + assert.Equal(t, "flows received", row.Panels[0].Title) + assert.Equal(t, "endpoints received", row.Panels[1].Title) + + // Verify queries + assert.Contains(t, row.Panels[0].Queries[0].Expr, "rox_central_total_network_flows_central_received_counter") + assert.Contains(t, row.Panels[1].Queries[0].Expr, "rox_central_total_network_endpoints_received_counter") +} + +func TestL2CentralInternals_ReportGenerationRow(t *testing.T) { + d := L2CentralInternals() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Report Generation" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row, "Report Generation row should exist") + + // Should have gap panel for missing Central-side report generation metrics + var gapPanel *Panel + for i := range row.Panels { + if row.Panels[i].GapNote != "" && !isDetailsLink(row.Panels[i].GapNote) { + gapPanel = &row.Panels[i] + break + } + } + require.NotNil(t, gapPanel, "Should have gap panel for missing report generation metrics") + assert.Contains(t, gapPanel.GapNote, "Central-side report generation") + + // Should have compliance watchers panel + var 
compliancePanel *Panel + for i := range row.Panels { + if row.Panels[i].Title == "compliance watchers" { + compliancePanel = &row.Panels[i] + break + } + } + require.NotNil(t, compliancePanel) + assert.Contains(t, compliancePanel.Queries[0].Expr, "rox_central_complianceoperator_scan_watchers_current") +} + +func TestL2CentralInternals_APIUIRow(t *testing.T) { + d := L2CentralInternals() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "API & UI" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row, "API & UI row should exist") + + // Verify panels + assert.Equal(t, "GraphQL p95", row.Panels[0].Title) + assert.Equal(t, "gRPC errors", row.Panels[1].Title) + + // Verify queries + assert.Contains(t, row.Panels[0].Queries[0].Expr, "rox_central_graphql_query_duration_bucket") + assert.Contains(t, row.Panels[1].Queries[0].Expr, "rox_central_grpc_error") +} + +func TestL2CentralInternals_ProducesValidJSON(t *testing.T) { + d := L2CentralInternals() + result := d.Generate() + + // Should marshal to valid JSON + b, err := json.Marshal(result) + require.NoError(t, err) + require.NotEmpty(t, b) + + // Should unmarshal back + var unmarshaled map[string]any + err = json.Unmarshal(b, &unmarshaled) + require.NoError(t, err) + + // Verify key fields survived round-trip + assert.Equal(t, "central-internals", unmarshaled["uid"]) + assert.Equal(t, "Central Internals", unmarshaled["title"]) +} + +func TestL2CentralInternals_AllPanelsHaveValidWidth(t *testing.T) { + d := L2CentralInternals() + + for _, row := range d.Rows { + for _, panel := range row.Panels { + assert.Greater(t, panel.Width, 0, "Panel %s should have positive width", panel.Title) + assert.LessOrEqual(t, panel.Width, 24, "Panel %s width should not exceed 24", panel.Title) + } + } +} + +// isDetailsLink checks if a GapNote contains a details link to L3 dashboard +func isDetailsLink(note string) bool { + return len(note) > 0 && ( + (note[0] == '[' && len(note) > 1) || // Starts with markdown 
link + (len(note) >= 3 && note[0:3] == "###")) // Starts with markdown header +} diff --git a/deploy/charts/monitoring/dashboards/generator/l3_sensor_ingestion.go b/deploy/charts/monitoring/dashboards/generator/l3_sensor_ingestion.go new file mode 100644 index 0000000000000..2093ad3e4b460 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/generator/l3_sensor_ingestion.go @@ -0,0 +1,239 @@ +package generator + +// L3SensorIngestion creates the Level 3 "Central: Sensor Ingestion" detail dashboard. +// This dashboard provides deep visibility into the Sensor Ingestion pipeline with full metric +// breakdowns for connection status, deduplication, worker queues, and pipeline processing. +func L3SensorIngestion() Dashboard { + return Dashboard{ + UID: "central-sensor-ingestion", + Title: "Central: Sensor Ingestion", + Tags: []string{"stackrox", "central", "level-3", "sensor-ingestion"}, + Links: []DashboardLink{ + { + Title: "← Central Internals", + TargetUID: "central-internals", + Type: "link", + }, + }, + Rows: []Row{ + connectionStatusRow(), + deduperRow(), + workerQueueRow(), + pipelineProcessingRow(), + messagesNotSentRow(), + }, + } +} + +func connectionStatusRow() Row { + return Row{ + Title: "Connection Status", + Panels: []Panel{ + { + Title: "Sensors Connected", + Type: "stat", + Width: 8, + Height: 8, + Queries: []Query{ + { + Expr: `count(rox_central_sensor_connected{connection_state="connected"})`, + RefID: "A", + }, + }, + }, + { + Title: "Connection Events", + Type: "timeseries", + Width: 16, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_sensor_connected[5m])`, + LegendFormat: `{{connection_state}}`, + RefID: "A", + }, + }, + }, + }, + } +} + +func deduperRow() Row { + return Row{ + Title: "Deduper", + Panels: []Panel{ + { + Title: "Deduper Throughput", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_sensor_event_deduper[5m])`, + LegendFormat: `{{status}} - {{type}}`, + RefID: "A", + 
}, + }, + }, + { + Title: "Deduper Hit Rate", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `sum(rate(rox_central_sensor_event_deduper{status="deduplicated"}[5m])) / sum(rate(rox_central_sensor_event_deduper[5m]))`, + LegendFormat: `dedup rate`, + RefID: "A", + }, + }, + }, + { + Title: "Hash Store Size", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_deduping_hash_size`, + LegendFormat: `{{cluster}}`, + RefID: "A", + }, + }, + }, + { + Title: "Hash Operations", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_deduping_hash_count[5m])`, + LegendFormat: `{{ResourceType}} - {{Operation}}`, + RefID: "A", + }, + }, + }, + }, + } +} + +func workerQueueRow() Row { + return Row{ + Title: "Worker Queue", + Panels: []Panel{ + { + Title: "Events Processed", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_sensor_event_queue[5m])`, + LegendFormat: `{{Operation}} - {{Type}}`, + RefID: "A", + }, + }, + }, + { + Title: "Processing Duration", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `histogram_quantile(0.95, rate(rox_central_sensor_event_duration_bucket[5m]))`, + LegendFormat: `p95 - {{Type}}`, + RefID: "A", + }, + }, + }, + { + Title: "GAP: Queue Depth", + Width: 12, + Height: 4, + GapNote: `**Metric Needed**: ` + "`central_sensor_ingestion_queue_depth`" + ` — No gauge exists for worker queue shard depth. 
Cannot answer "is the queue backing up?"`, + }, + { + Title: "GAP: In-Flight", + Width: 12, + Height: 4, + GapNote: `**Metric Needed**: ` + "`central_sensor_ingestion_in_flight`" + ` — No gauge for items currently being processed per shard.`, + }, + }, + } +} + +func pipelineProcessingRow() Row { + return Row{ + Title: "Pipeline Processing", + Panels: []Panel{ + { + Title: "Resources Processed", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_resource_processed_count[5m])`, + LegendFormat: `{{Resource}} - {{Operation}}`, + RefID: "A", + }, + }, + }, + { + Title: "Pipeline Panics", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_pipeline_panics[5m])`, + LegendFormat: `{{resource}}`, + RefID: "A", + }, + }, + }, + { + Title: "K8s Event Processing", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `histogram_quantile(0.95, rate(rox_central_k8s_event_processing_duration_bucket[5m]))`, + LegendFormat: `p95 - {{Resource}}`, + RefID: "A", + }, + }, + }, + { + Title: "GAP: Per-Fragment Metrics", + Width: 12, + Height: 4, + GapNote: `**Metric Needed**: Per-fragment processing counts and durations. 
25 pipeline fragments exist but none have individual metrics.`, + }, + }, + } +} + +func messagesNotSentRow() Row { + return Row{ + Title: "Messages Not Sent", + Panels: []Panel{ + { + Title: "Failed Sends to Sensor", + Type: "timeseries", + Width: 24, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_msg_to_sensor_not_sent_count[5m])`, + LegendFormat: `{{type}} - {{reason}}`, + RefID: "A", + }, + }, + }, + }, + } +} diff --git a/deploy/charts/monitoring/dashboards/generator/l3_sensor_ingestion_test.go b/deploy/charts/monitoring/dashboards/generator/l3_sensor_ingestion_test.go new file mode 100644 index 0000000000000..17d07bfdfd269 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/generator/l3_sensor_ingestion_test.go @@ -0,0 +1,255 @@ +package generator + +import ( + "encoding/json" + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestL3SensorIngestion_BasicMetadata(t *testing.T) { + d := L3SensorIngestion() + + assert.Equal(t, "central-sensor-ingestion", d.UID) + assert.Equal(t, "Central: Sensor Ingestion", d.Title) + assert.Contains(t, d.Tags, "level-3") + assert.Contains(t, d.Tags, "sensor-ingestion") + assert.Contains(t, d.Tags, "stackrox") + assert.Contains(t, d.Tags, "central") +} + +func TestL3SensorIngestion_HasBackLinkToCentralInternals(t *testing.T) { + d := L3SensorIngestion() + + require.Len(t, d.Links, 1) + assert.Equal(t, "← Central Internals", d.Links[0].Title) + assert.Equal(t, "central-internals", d.Links[0].TargetUID) +} + +func TestL3SensorIngestion_HasRequiredRows(t *testing.T) { + d := L3SensorIngestion() + + require.Equal(t, 5, len(d.Rows), "Should have exactly 5 rows") + + // Verify row titles + rowTitles := make([]string, len(d.Rows)) + for i, row := range d.Rows { + rowTitles[i] = row.Title + } + + assert.Contains(t, rowTitles, "Connection Status") + assert.Contains(t, rowTitles, "Deduper") + assert.Contains(t, rowTitles, "Worker Queue") + assert.Contains(t, 
rowTitles, "Pipeline Processing") + assert.Contains(t, rowTitles, "Messages Not Sent") +} + +func TestL3SensorIngestion_ConnectionStatusRow(t *testing.T) { + d := L3SensorIngestion() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Connection Status" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row) + require.Len(t, row.Panels, 2) + + // Verify Sensors Connected panel + assert.Equal(t, "Sensors Connected", row.Panels[0].Title) + assert.Equal(t, "stat", row.Panels[0].Type) + assert.Equal(t, 8, row.Panels[0].Width) + assert.Equal(t, `count(rox_central_sensor_connected{connection_state="connected"})`, row.Panels[0].Queries[0].Expr) + + // Verify Connection Events panel + assert.Equal(t, "Connection Events", row.Panels[1].Title) + assert.Equal(t, "timeseries", row.Panels[1].Type) + assert.Equal(t, 16, row.Panels[1].Width) + assert.Equal(t, `rate(rox_central_sensor_connected[5m])`, row.Panels[1].Queries[0].Expr) + assert.Equal(t, `{{connection_state}}`, row.Panels[1].Queries[0].LegendFormat) +} + +func TestL3SensorIngestion_DeduperRow(t *testing.T) { + d := L3SensorIngestion() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Deduper" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row) + require.Len(t, row.Panels, 4) + + // Verify Deduper Throughput panel + assert.Equal(t, "Deduper Throughput", row.Panels[0].Title) + assert.Equal(t, "timeseries", row.Panels[0].Type) + assert.Equal(t, 12, row.Panels[0].Width) + assert.Equal(t, `rate(rox_central_sensor_event_deduper[5m])`, row.Panels[0].Queries[0].Expr) + assert.Equal(t, `{{status}} - {{type}}`, row.Panels[0].Queries[0].LegendFormat) + + // Verify Deduper Hit Rate panel + assert.Equal(t, "Deduper Hit Rate", row.Panels[1].Title) + assert.Equal(t, "timeseries", row.Panels[1].Type) + assert.Equal(t, 12, row.Panels[1].Width) + assert.Contains(t, row.Panels[1].Queries[0].Expr, `rate(rox_central_sensor_event_deduper{status="deduplicated"}[5m])`) + assert.Equal(t, 
`dedup rate`, row.Panels[1].Queries[0].LegendFormat) + + // Verify Hash Store Size panel + assert.Equal(t, "Hash Store Size", row.Panels[2].Title) + assert.Equal(t, "timeseries", row.Panels[2].Type) + assert.Equal(t, 12, row.Panels[2].Width) + assert.Equal(t, `rox_central_deduping_hash_size`, row.Panels[2].Queries[0].Expr) + assert.Equal(t, `{{cluster}}`, row.Panels[2].Queries[0].LegendFormat) + + // Verify Hash Operations panel + assert.Equal(t, "Hash Operations", row.Panels[3].Title) + assert.Equal(t, "timeseries", row.Panels[3].Type) + assert.Equal(t, 12, row.Panels[3].Width) + assert.Equal(t, `rate(rox_central_deduping_hash_count[5m])`, row.Panels[3].Queries[0].Expr) + assert.Equal(t, `{{ResourceType}} - {{Operation}}`, row.Panels[3].Queries[0].LegendFormat) +} + +func TestL3SensorIngestion_WorkerQueueRow(t *testing.T) { + d := L3SensorIngestion() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Worker Queue" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row) + require.Len(t, row.Panels, 4, "Worker Queue should have 4 panels (2 metrics + 2 gaps)") + + // Verify Events Processed panel + assert.Equal(t, "Events Processed", row.Panels[0].Title) + assert.Equal(t, "timeseries", row.Panels[0].Type) + assert.Equal(t, 12, row.Panels[0].Width) + assert.Equal(t, `rate(rox_central_sensor_event_queue[5m])`, row.Panels[0].Queries[0].Expr) + assert.Equal(t, `{{Operation}} - {{Type}}`, row.Panels[0].Queries[0].LegendFormat) + + // Verify Processing Duration panel + assert.Equal(t, "Processing Duration", row.Panels[1].Title) + assert.Equal(t, "timeseries", row.Panels[1].Type) + assert.Equal(t, 12, row.Panels[1].Width) + assert.Equal(t, `histogram_quantile(0.95, rate(rox_central_sensor_event_duration_bucket[5m]))`, row.Panels[1].Queries[0].Expr) + assert.Equal(t, `p95 - {{Type}}`, row.Panels[1].Queries[0].LegendFormat) + + // Verify Queue Depth gap panel + assert.Equal(t, "GAP: Queue Depth", row.Panels[2].Title) + assert.Equal(t, 12, 
row.Panels[2].Width) + assert.Equal(t, 4, row.Panels[2].Height) + assert.Contains(t, row.Panels[2].GapNote, "central_sensor_ingestion_queue_depth") + assert.Contains(t, row.Panels[2].GapNote, "Cannot answer \"is the queue backing up?\"") + + // Verify In-Flight gap panel + assert.Equal(t, "GAP: In-Flight", row.Panels[3].Title) + assert.Equal(t, 12, row.Panels[3].Width) + assert.Equal(t, 4, row.Panels[3].Height) + assert.Contains(t, row.Panels[3].GapNote, "central_sensor_ingestion_in_flight") + assert.Contains(t, row.Panels[3].GapNote, "items currently being processed") +} + +func TestL3SensorIngestion_PipelineProcessingRow(t *testing.T) { + d := L3SensorIngestion() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Pipeline Processing" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row) + require.Len(t, row.Panels, 4, "Pipeline Processing should have 4 panels (3 metrics + 1 gap)") + + // Verify Resources Processed panel + assert.Equal(t, "Resources Processed", row.Panels[0].Title) + assert.Equal(t, "timeseries", row.Panels[0].Type) + assert.Equal(t, 12, row.Panels[0].Width) + assert.Equal(t, `rate(rox_central_resource_processed_count[5m])`, row.Panels[0].Queries[0].Expr) + assert.Equal(t, `{{Resource}} - {{Operation}}`, row.Panels[0].Queries[0].LegendFormat) + + // Verify Pipeline Panics panel + assert.Equal(t, "Pipeline Panics", row.Panels[1].Title) + assert.Equal(t, "timeseries", row.Panels[1].Type) + assert.Equal(t, 12, row.Panels[1].Width) + assert.Equal(t, `rate(rox_central_pipeline_panics[5m])`, row.Panels[1].Queries[0].Expr) + assert.Equal(t, `{{resource}}`, row.Panels[1].Queries[0].LegendFormat) + + // Verify K8s Event Processing panel + assert.Equal(t, "K8s Event Processing", row.Panels[2].Title) + assert.Equal(t, "timeseries", row.Panels[2].Type) + assert.Equal(t, 12, row.Panels[2].Width) + assert.Equal(t, `histogram_quantile(0.95, rate(rox_central_k8s_event_processing_duration_bucket[5m]))`, row.Panels[2].Queries[0].Expr) + 
assert.Equal(t, `p95 - {{Resource}}`, row.Panels[2].Queries[0].LegendFormat) + + // Verify Per-Fragment Metrics gap panel + assert.Equal(t, "GAP: Per-Fragment Metrics", row.Panels[3].Title) + assert.Equal(t, 12, row.Panels[3].Width) + assert.Equal(t, 4, row.Panels[3].Height) + assert.Contains(t, row.Panels[3].GapNote, "Per-fragment processing counts and durations") + assert.Contains(t, row.Panels[3].GapNote, "25 pipeline fragments exist") +} + +func TestL3SensorIngestion_MessagesNotSentRow(t *testing.T) { + d := L3SensorIngestion() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Messages Not Sent" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row) + require.Len(t, row.Panels, 1) + + // Verify Failed Sends to Sensor panel + assert.Equal(t, "Failed Sends to Sensor", row.Panels[0].Title) + assert.Equal(t, "timeseries", row.Panels[0].Type) + assert.Equal(t, 24, row.Panels[0].Width) + assert.Equal(t, `rate(rox_central_msg_to_sensor_not_sent_count[5m])`, row.Panels[0].Queries[0].Expr) + assert.Equal(t, `{{type}} - {{reason}}`, row.Panels[0].Queries[0].LegendFormat) +} + +func TestL3SensorIngestion_ProducesValidJSON(t *testing.T) { + d := L3SensorIngestion() + result := d.Generate() + + // Should marshal to valid JSON + b, err := json.Marshal(result) + require.NoError(t, err) + require.NotEmpty(t, b) + + // Should unmarshal back + var unmarshaled map[string]any + err = json.Unmarshal(b, &unmarshaled) + require.NoError(t, err) + + // Verify key fields survived round-trip + assert.Equal(t, "central-sensor-ingestion", unmarshaled["uid"]) + assert.Equal(t, "Central: Sensor Ingestion", unmarshaled["title"]) +} + +func TestL3SensorIngestion_AllPanelsHaveValidWidth(t *testing.T) { + d := L3SensorIngestion() + + for _, row := range d.Rows { + for _, panel := range row.Panels { + assert.Greater(t, panel.Width, 0, "Panel %s should have positive width", panel.Title) + assert.LessOrEqual(t, panel.Width, 24, "Panel %s width should not exceed 24", 
panel.Title) + } + } +} diff --git a/deploy/charts/monitoring/dashboards/generator/l3_stubs.go b/deploy/charts/monitoring/dashboards/generator/l3_stubs.go new file mode 100644 index 0000000000000..138bb46e674da --- /dev/null +++ b/deploy/charts/monitoring/dashboards/generator/l3_stubs.go @@ -0,0 +1,608 @@ +package generator + +// L3Stubs creates all 8 remaining Level 3 stub dashboards for Central regions. +// Each stub has real panels where metrics exist and prominent gap annotations where they don't. +func L3Stubs() []Dashboard { + return []Dashboard{ + l3DeploymentProcessing(), + l3DetectionAlerts(), + l3RiskCalculation(), + l3BackgroundReprocessing(), + l3PruningGC(), + l3NetworkAnalysis(), + l3ReportGeneration(), + l3APIUI(), + } +} + +// l3DeploymentProcessing creates the "Central: Deployment Processing" dashboard. +func l3DeploymentProcessing() Dashboard { + return Dashboard{ + UID: "central-deployment-processing", + Title: "Central: Deployment Processing", + Tags: []string{"stackrox", "central", "level-3", "deployment-processing"}, + Links: []DashboardLink{ + { + Title: "← Central Internals", + TargetUID: "central-internals", + Type: "link", + }, + }, + Rows: []Row{ + { + Title: "Resource Processing", + Panels: []Panel{ + { + Title: "Resources/sec", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_resource_processed_count[5m])`, + LegendFormat: `{{Resource}} - {{Operation}}`, + RefID: "A", + }, + }, + }, + { + Title: "K8s Event Duration", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `histogram_quantile(0.95, rate(rox_central_k8s_event_processing_duration_bucket[5m]))`, + LegendFormat: `p95 {{Resource}} - {{Action}}`, + RefID: "A", + }, + }, + }, + }, + }, + { + Title: "Store Operations", + Panels: []Panel{ + { + Title: "Postgres Op Duration", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `histogram_quantile(0.95, 
rate(rox_central_postgres_op_duration_bucket{Type=~"deployments|pods|namespaces"}[5m]))`, + LegendFormat: `p95`, + RefID: "A", + }, + }, + }, + { + Title: "GAP: Per-Fragment Handler Metrics", + Width: 12, + Height: 4, + GapNote: `**Metric Needed**: No per-fragment handler metrics. Cannot distinguish processing time for deployment vs pod vs namespace fragments.`, + }, + }, + }, + }, + } +} + +// l3DetectionAlerts creates the "Central: Detection & Alerts" dashboard. +func l3DetectionAlerts() Dashboard { + return Dashboard{ + UID: "central-detection-alerts", + Title: "Central: Detection & Alerts", + Tags: []string{"stackrox", "central", "level-3", "detection-alerts"}, + Links: []DashboardLink{ + { + Title: "← Central Internals", + TargetUID: "central-internals", + Type: "link", + }, + }, + Rows: []Row{ + { + Title: "Detection", + Panels: []Panel{ + { + Title: "Process Filter", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_process_filter`, + LegendFormat: `{{Type}}`, + RefID: "A", + }, + }, + }, + { + Title: "GAP: Alert Generation Rate", + Width: 12, + Height: 4, + GapNote: "**Metric Needed**: `central_detection_alerts_generated_total` — No alert generation rate metric. Cannot answer \"how many alerts are being generated?\"", + }, + }, + }, + { + Title: "Gaps", + Panels: []Panel{ + { + Title: "GAP: Lifecycle Manager Metrics", + Width: 24, + Height: 4, + GapNote: "**Metric Needed**: No lifecycle manager metrics. Need: `central_detection_lifecycle_duration_seconds`, `central_detection_baseline_evaluations_total`", + }, + }, + }, + }, + } +} + +// l3RiskCalculation creates the "Central: Risk Calculation" dashboard. 
+func l3RiskCalculation() Dashboard { + return Dashboard{ + UID: "central-risk-calculation", + Title: "Central: Risk Calculation", + Tags: []string{"stackrox", "central", "level-3", "risk-calculation"}, + Links: []DashboardLink{ + { + Title: "← Central Internals", + TargetUID: "central-internals", + Type: "link", + }, + }, + Rows: []Row{ + { + Title: "Risk Processing", + Panels: []Panel{ + { + Title: "Risk Duration", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_risk_processing_duration`, + LegendFormat: `{{Risk_Reprocessor}}`, + RefID: "A", + }, + }, + }, + { + Title: "Reprocessor Duration", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_reprocessor_duration_seconds`, + LegendFormat: `duration`, + RefID: "A", + }, + }, + }, + }, + }, + { + Title: "Gaps", + Panels: []Panel{ + { + Title: "GAP: Items Processed", + Width: 24, + Height: 4, + GapNote: "**Metric Needed**: `central_risk_items_processed_total` — No items-processed counter. Cannot answer \"how many deployments had risk recalculated?\"", + }, + }, + }, + }, + } +} + +// l3BackgroundReprocessing creates the "Central: Background Reprocessing" dashboard. 
+func l3BackgroundReprocessing() Dashboard { + return Dashboard{ + UID: "central-background-reprocessing", + Title: "Central: Background Reprocessing", + Tags: []string{"stackrox", "central", "level-3", "background-reprocessing"}, + Links: []DashboardLink{ + { + Title: "← Central Internals", + TargetUID: "central-internals", + Type: "link", + }, + }, + Rows: []Row{ + { + Title: "Reprocessor Loops", + Panels: []Panel{ + { + Title: "Reprocessor Duration", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_reprocessor_duration_seconds`, + LegendFormat: `duration`, + RefID: "A", + }, + }, + }, + { + Title: "Sig Verification Duration", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_signature_verification_reprocessor_duration_seconds`, + LegendFormat: `duration`, + RefID: "A", + }, + }, + }, + }, + }, + { + Title: "Gaps — Loop Instrumentation", + Panels: []Panel{ + { + Title: "GAP: Background Loop Metrics", + Width: 24, + Height: 6, + GapNote: "**Metric Needed**: 19+ background loops lack standard metrics. Need per-loop: `_running` (gauge), `_runs_total{result}` (counter), `_run_duration_seconds` (histogram), `_items_processed_total` (counter), `_last_run_timestamp_seconds` (gauge). Loops include: image-enrich, deployment-risk, active-components, pruning, CVE-suppress, CVE-fetch, indicator-flush, network-baseline-flush, hash-flush, conn-health, vuln-request, network-gatherer, and more.", + }, + }, + }, + }, + } +} + +// l3PruningGC creates the "Central: Pruning & GC" dashboard. 
+func l3PruningGC() Dashboard { + return Dashboard{ + UID: "central-pruning-gc", + Title: "Central: Pruning & GC", + Tags: []string{"stackrox", "central", "level-3", "pruning-gc"}, + Links: []DashboardLink{ + { + Title: "← Central Internals", + TargetUID: "central-internals", + Type: "link", + }, + }, + Rows: []Row{ + { + Title: "Pruning", + Panels: []Panel{ + { + Title: "Prune Duration", + Type: "timeseries", + Width: 8, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_prune_duration`, + LegendFormat: `{{Type}}`, + RefID: "A", + }, + }, + }, + { + Title: "Process Queue Length", + Type: "timeseries", + Width: 8, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_process_queue_length`, + LegendFormat: `queue length`, + RefID: "A", + }, + }, + }, + { + Title: "Pruned Indicators", + Type: "timeseries", + Width: 8, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_pruned_process_indicators[5m])`, + LegendFormat: `pruned/sec`, + RefID: "A", + }, + }, + }, + }, + }, + { + Title: "Additional Metrics", + Panels: []Panel{ + { + Title: "Orphaned PLOPs", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_orphaned_plop_total[5m])`, + LegendFormat: `{{ClusterID}}`, + RefID: "A", + }, + }, + }, + { + Title: "Cache Hits/Misses", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_process_pruning_cache_hits[5m])`, + LegendFormat: `hits`, + RefID: "A", + }, + { + Expr: `rate(rox_central_process_pruning_cache_misses[5m])`, + LegendFormat: `misses`, + RefID: "B", + }, + }, + }, + }, + }, + }, + } +} + +// l3NetworkAnalysis creates the "Central: Network Analysis" dashboard. 
+func l3NetworkAnalysis() Dashboard { + return Dashboard{ + UID: "central-network-analysis", + Title: "Central: Network Analysis", + Tags: []string{"stackrox", "central", "level-3", "network-analysis"}, + Links: []DashboardLink{ + { + Title: "← Central Internals", + TargetUID: "central-internals", + Type: "link", + }, + }, + Rows: []Row{ + { + Title: "Flows & Endpoints", + Panels: []Panel{ + { + Title: "Flows Received", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_total_network_flows_central_received_counter[5m])`, + LegendFormat: `{{ClusterID}}`, + RefID: "A", + }, + }, + }, + { + Title: "Endpoints Received", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_total_network_endpoints_received_counter[5m])`, + LegendFormat: `{{ClusterID}}`, + RefID: "A", + }, + }, + }, + }, + }, + { + Title: "Gaps", + Panels: []Panel{ + { + Title: "GAP: Network Processing Pipeline", + Width: 24, + Height: 4, + GapNote: "**Metric Needed**: Network baseline flush, external network gatherer, and flow processing pipeline have no Central-side metrics. Need: `central_network_baseline_flush_duration_seconds`, `central_network_flows_processed_total{action}`", + }, + }, + }, + }, + } +} + +// l3ReportGeneration creates the "Central: Report Generation" dashboard. 
+func l3ReportGeneration() Dashboard { + return Dashboard{ + UID: "central-report-generation", + Title: "Central: Report Generation", + Tags: []string{"stackrox", "central", "level-3", "report-generation"}, + Links: []DashboardLink{ + { + Title: "← Central Internals", + TargetUID: "central-internals", + Type: "link", + }, + }, + Rows: []Row{ + { + Title: "Compliance Operator Reports", + Panels: []Panel{ + { + Title: "Scan Watchers", + Type: "timeseries", + Width: 8, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_complianceoperator_scan_watchers_current`, + LegendFormat: `watchers`, + RefID: "A", + }, + }, + }, + { + Title: "Parallel Scans", + Type: "timeseries", + Width: 8, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_complianceoperator_num_scans_running_in_parallel`, + LegendFormat: `parallel scans`, + RefID: "A", + }, + }, + }, + { + Title: "Watcher Active Time", + Type: "timeseries", + Width: 8, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_complianceoperator_scan_watchers_active_time_minutes`, + LegendFormat: `active time`, + RefID: "A", + }, + }, + }, + }, + }, + { + Title: "Watcher Outcomes", + Panels: []Panel{ + { + Title: "Finish Types", + Type: "timeseries", + Width: 24, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_complianceoperator_scan_watchers_finish_type_total[5m])`, + LegendFormat: `{{type}}`, + RefID: "A", + }, + }, + }, + }, + }, + { + Title: "Gaps — Vulnerability Report Pipeline", + Panels: []Panel{ + { + Title: "GAP: Report Generation Pipeline", + Width: 24, + Height: 4, + GapNote: "**Metric Needed**: No metrics for Central's vulnerability report generation pipeline (report scheduler, PDF/CSV generation, email delivery). Need: `central_report_generation_total{type,result}`, `central_report_generation_duration_seconds`, `central_report_delivery_total{method,result}`", + }, + }, + }, + }, + } +} + +// l3APIUI creates the "Central: API & UI" dashboard. 
+func l3APIUI() Dashboard { + return Dashboard{ + UID: "central-api-ui", + Title: "Central: API & UI", + Tags: []string{"stackrox", "central", "level-3", "api-ui"}, + Links: []DashboardLink{ + { + Title: "← Central Internals", + TargetUID: "central-internals", + Type: "link", + }, + }, + Rows: []Row{ + { + Title: "GraphQL", + Panels: []Panel{ + { + Title: "Query Duration p95", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `histogram_quantile(0.95, rate(rox_central_graphql_query_duration_bucket[5m]))`, + LegendFormat: `{{Query}}`, + RefID: "A", + }, + }, + }, + { + Title: "Resolver Duration p95", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `histogram_quantile(0.95, rate(rox_central_graphql_op_duration_bucket[5m]))`, + LegendFormat: `{{Resolver}}`, + RefID: "A", + }, + }, + }, + }, + }, + { + Title: "gRPC", + Panels: []Panel{ + { + Title: "gRPC Errors", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_grpc_error[5m])`, + LegendFormat: `{{Code}}`, + RefID: "A", + }, + }, + }, + { + Title: "Message Sizes", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_grpc_message_size_sent_bytes`, + LegendFormat: `{{Type}}`, + RefID: "A", + }, + }, + }, + }, + }, + { + Title: "Gaps", + Panels: []Panel{ + { + Title: "GAP: Per-Endpoint Metrics", + Width: 12, + Height: 4, + GapNote: "**Metric Needed**: No per-API-endpoint latency or error rate metrics. Need: `central_api_request_duration_seconds{method,endpoint}`, `central_api_requests_total{method,endpoint,status}`. Cannot answer \"which API endpoint is slow?\"", + }, + { + Title: "GAP: UI Page Load", + Width: 12, + Height: 4, + GapNote: "**Metric Needed**: No UI page load metrics. 
Need frontend instrumentation or backend per-page-load latency tracking.", + }, + }, + }, + }, + } +} diff --git a/deploy/charts/monitoring/dashboards/generator/l3_stubs_test.go b/deploy/charts/monitoring/dashboards/generator/l3_stubs_test.go new file mode 100644 index 0000000000000..eb92981874fdd --- /dev/null +++ b/deploy/charts/monitoring/dashboards/generator/l3_stubs_test.go @@ -0,0 +1,307 @@ +package generator + +import ( + "encoding/json" + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestL3Stubs_Returns8Dashboards(t *testing.T) { + stubs := L3Stubs() + assert.Len(t, stubs, 8, "L3Stubs should return exactly 8 dashboards") +} + +func TestL3Stubs_AllHaveLevel3Tag(t *testing.T) { + stubs := L3Stubs() + + for _, d := range stubs { + assert.Contains(t, d.Tags, "level-3", "Dashboard %s should have level-3 tag", d.UID) + assert.Contains(t, d.Tags, "stackrox", "Dashboard %s should have stackrox tag", d.UID) + assert.Contains(t, d.Tags, "central", "Dashboard %s should have central tag", d.UID) + } +} + +func TestL3Stubs_AllHaveBackLinkToCentralInternals(t *testing.T) { + stubs := L3Stubs() + + for _, d := range stubs { + require.Len(t, d.Links, 1, "Dashboard %s should have exactly 1 link", d.UID) + assert.Equal(t, "← Central Internals", d.Links[0].Title, "Dashboard %s link title", d.UID) + assert.Equal(t, "central-internals", d.Links[0].TargetUID, "Dashboard %s link target", d.UID) + } +} + +func TestL3Stubs_SpecificUIDs(t *testing.T) { + stubs := L3Stubs() + + expectedUIDs := []string{ + "central-deployment-processing", + "central-detection-alerts", + "central-risk-calculation", + "central-background-reprocessing", + "central-pruning-gc", + "central-network-analysis", + "central-report-generation", + "central-api-ui", + } + + actualUIDs := make([]string, len(stubs)) + for i, d := range stubs { + actualUIDs[i] = d.UID + } + + for _, expectedUID := range expectedUIDs { + assert.Contains(t, actualUIDs, 
expectedUID, "Expected UID %s to be present", expectedUID) + } +} + +func TestL3Stubs_AllProduceValidJSON(t *testing.T) { + stubs := L3Stubs() + + for _, d := range stubs { + t.Run(d.UID, func(t *testing.T) { + result := d.Generate() + + // Should marshal to valid JSON + b, err := json.Marshal(result) + require.NoError(t, err) + require.NotEmpty(t, b) + + // Should unmarshal back + var unmarshaled map[string]any + err = json.Unmarshal(b, &unmarshaled) + require.NoError(t, err) + + // Verify key fields survived round-trip + assert.Equal(t, d.UID, unmarshaled["uid"]) + assert.Equal(t, d.Title, unmarshaled["title"]) + }) + } +} + +func TestL3Stubs_DeploymentProcessing(t *testing.T) { + d := findDashboard(L3Stubs(), "central-deployment-processing") + require.NotNil(t, d, "central-deployment-processing dashboard should exist") + + assert.Equal(t, "Central: Deployment Processing", d.Title) + assert.Contains(t, d.Tags, "deployment-processing") + + // Should have 2 rows: Resource Processing, Store Operations + assert.Equal(t, 2, len(d.Rows)) + + // Resource Processing row + resourceRow := findRow(d.Rows, "Resource Processing") + require.NotNil(t, resourceRow) + require.Len(t, resourceRow.Panels, 2) // 2 metrics + + // Resources/sec panel + assert.Equal(t, "Resources/sec", resourceRow.Panels[0].Title) + assert.Equal(t, "timeseries", resourceRow.Panels[0].Type) + assert.Contains(t, resourceRow.Panels[0].Queries[0].Expr, "rox_central_resource_processed_count") + assert.Equal(t, "{{Resource}} - {{Operation}}", resourceRow.Panels[0].Queries[0].LegendFormat) + + // K8s Event Duration panel + assert.Equal(t, "K8s Event Duration", resourceRow.Panels[1].Title) + assert.Contains(t, resourceRow.Panels[1].Queries[0].Expr, "rox_central_k8s_event_processing_duration_bucket") + + // Store Operations row + storeRow := findRow(d.Rows, "Store Operations") + require.NotNil(t, storeRow) + require.Len(t, storeRow.Panels, 2) // 1 metric + 1 gap + + // Postgres Op Duration panel + 
assert.Equal(t, "Postgres Op Duration", storeRow.Panels[0].Title) + assert.Contains(t, storeRow.Panels[0].Queries[0].Expr, "rox_central_postgres_op_duration_bucket") + assert.Contains(t, storeRow.Panels[0].Queries[0].Expr, "Type=~\"deployments|pods|namespaces\"") + + // Gap panel + assert.Equal(t, "GAP: Per-Fragment Handler Metrics", storeRow.Panels[1].Title) + assert.Equal(t, 4, storeRow.Panels[1].Height) + assert.Contains(t, storeRow.Panels[1].GapNote, "per-fragment handler metrics") +} + +func TestL3Stubs_DetectionAlerts(t *testing.T) { + d := findDashboard(L3Stubs(), "central-detection-alerts") + require.NotNil(t, d) + + assert.Equal(t, "Central: Detection & Alerts", d.Title) + assert.Contains(t, d.Tags, "detection-alerts") + + // Should have 2 rows + assert.Equal(t, 2, len(d.Rows)) + + // Detection row + detectionRow := findRow(d.Rows, "Detection") + require.NotNil(t, detectionRow) + require.Len(t, detectionRow.Panels, 2) // 1 metric + 1 gap + + assert.Equal(t, "Process Filter", detectionRow.Panels[0].Title) + assert.Contains(t, detectionRow.Panels[0].Queries[0].Expr, "rox_central_process_filter") + + assert.Equal(t, "GAP: Alert Generation Rate", detectionRow.Panels[1].Title) + assert.Contains(t, detectionRow.Panels[1].GapNote, "central_detection_alerts_generated_total") +} + +func TestL3Stubs_RiskCalculation(t *testing.T) { + d := findDashboard(L3Stubs(), "central-risk-calculation") + require.NotNil(t, d) + + assert.Equal(t, "Central: Risk Calculation", d.Title) + assert.Contains(t, d.Tags, "risk-calculation") + + // Should have 2 rows + assert.Equal(t, 2, len(d.Rows)) + + // Risk Processing row + riskRow := findRow(d.Rows, "Risk Processing") + require.NotNil(t, riskRow) + require.Len(t, riskRow.Panels, 2) + + assert.Equal(t, "Risk Duration", riskRow.Panels[0].Title) + assert.Contains(t, riskRow.Panels[0].Queries[0].Expr, "rox_central_risk_processing_duration") + + assert.Equal(t, "Reprocessor Duration", riskRow.Panels[1].Title) + assert.Contains(t, 
riskRow.Panels[1].Queries[0].Expr, "rox_central_reprocessor_duration_seconds") +} + +func TestL3Stubs_BackgroundReprocessing(t *testing.T) { + d := findDashboard(L3Stubs(), "central-background-reprocessing") + require.NotNil(t, d) + + assert.Equal(t, "Central: Background Reprocessing", d.Title) + assert.Contains(t, d.Tags, "background-reprocessing") + + // Should have 2 rows + assert.Equal(t, 2, len(d.Rows)) + + // Gaps row + gapsRow := findRow(d.Rows, "Gaps — Loop Instrumentation") + require.NotNil(t, gapsRow) + require.Len(t, gapsRow.Panels, 1) + + // This gap is larger (Height=6) + assert.Equal(t, 6, gapsRow.Panels[0].Height) + assert.Contains(t, gapsRow.Panels[0].GapNote, "19+ background loops") +} + +func TestL3Stubs_PruningGC(t *testing.T) { + d := findDashboard(L3Stubs(), "central-pruning-gc") + require.NotNil(t, d) + + assert.Equal(t, "Central: Pruning & GC", d.Title) + assert.Contains(t, d.Tags, "pruning-gc") + + // Should have 2 rows + assert.Equal(t, 2, len(d.Rows)) + + // Pruning row + pruningRow := findRow(d.Rows, "Pruning") + require.NotNil(t, pruningRow) + require.Len(t, pruningRow.Panels, 3) + + assert.Equal(t, "Prune Duration", pruningRow.Panels[0].Title) + assert.Equal(t, 8, pruningRow.Panels[0].Width) + + // Additional Metrics row + additionalRow := findRow(d.Rows, "Additional Metrics") + require.NotNil(t, additionalRow) + require.Len(t, additionalRow.Panels, 2) + + // Cache Hits/Misses panel should have 2 queries + cachePanel := additionalRow.Panels[1] + assert.Equal(t, "Cache Hits/Misses", cachePanel.Title) + require.Len(t, cachePanel.Queries, 2) + assert.Equal(t, "hits", cachePanel.Queries[0].LegendFormat) + assert.Equal(t, "misses", cachePanel.Queries[1].LegendFormat) +} + +func TestL3Stubs_NetworkAnalysis(t *testing.T) { + d := findDashboard(L3Stubs(), "central-network-analysis") + require.NotNil(t, d) + + assert.Equal(t, "Central: Network Analysis", d.Title) + assert.Contains(t, d.Tags, "network-analysis") + + // Flows & Endpoints row + 
flowsRow := findRow(d.Rows, "Flows & Endpoints") + require.NotNil(t, flowsRow) + require.Len(t, flowsRow.Panels, 2) + + assert.Equal(t, "Flows Received", flowsRow.Panels[0].Title) + assert.Contains(t, flowsRow.Panels[0].Queries[0].Expr, "rox_central_total_network_flows_central_received_counter") +} + +func TestL3Stubs_ReportGeneration(t *testing.T) { + d := findDashboard(L3Stubs(), "central-report-generation") + require.NotNil(t, d) + + assert.Equal(t, "Central: Report Generation", d.Title) + assert.Contains(t, d.Tags, "report-generation") + + // Compliance Operator Reports row + complianceRow := findRow(d.Rows, "Compliance Operator Reports") + require.NotNil(t, complianceRow) + require.Len(t, complianceRow.Panels, 3) + + assert.Equal(t, "Scan Watchers", complianceRow.Panels[0].Title) + assert.Equal(t, 8, complianceRow.Panels[0].Width) +} + +func TestL3Stubs_APIUI(t *testing.T) { + d := findDashboard(L3Stubs(), "central-api-ui") + require.NotNil(t, d) + + assert.Equal(t, "Central: API & UI", d.Title) + assert.Contains(t, d.Tags, "api-ui") + + // GraphQL row + graphqlRow := findRow(d.Rows, "GraphQL") + require.NotNil(t, graphqlRow) + require.Len(t, graphqlRow.Panels, 2) + + assert.Equal(t, "Query Duration p95", graphqlRow.Panels[0].Title) + assert.Contains(t, graphqlRow.Panels[0].Queries[0].Expr, "rox_central_graphql_query_duration_bucket") + + // Gaps row + gapsRow := findRow(d.Rows, "Gaps") + require.NotNil(t, gapsRow) + require.Len(t, gapsRow.Panels, 2) + + assert.Equal(t, "GAP: Per-Endpoint Metrics", gapsRow.Panels[0].Title) + assert.Equal(t, "GAP: UI Page Load", gapsRow.Panels[1].Title) +} + +func TestL3Stubs_AllPanelsHaveValidDimensions(t *testing.T) { + stubs := L3Stubs() + + for _, d := range stubs { + for _, row := range d.Rows { + for _, panel := range row.Panels { + assert.Greater(t, panel.Width, 0, "Panel %s in %s should have positive width", panel.Title, d.UID) + assert.LessOrEqual(t, panel.Width, 24, "Panel %s in %s width should not exceed 24", 
panel.Title, d.UID) + assert.Greater(t, panel.Height, 0, "Panel %s in %s should have positive height", panel.Title, d.UID) + } + } + } +} + +// Helper functions +func findDashboard(dashboards []Dashboard, uid string) *Dashboard { + for i := range dashboards { + if dashboards[i].UID == uid { + return &dashboards[i] + } + } + return nil +} + +func findRow(rows []Row, title string) *Row { + for i := range rows { + if rows[i].Title == title { + return &rows[i] + } + } + return nil +} diff --git a/deploy/charts/monitoring/dashboards/generator/l3_vuln_enrichment.go b/deploy/charts/monitoring/dashboards/generator/l3_vuln_enrichment.go new file mode 100644 index 0000000000000..c2dea307a5e98 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/generator/l3_vuln_enrichment.go @@ -0,0 +1,250 @@ +package generator + +// L3VulnEnrichment creates the Level 3 "Central: Vulnerability Enrichment" detail dashboard. +// This dashboard provides deep visibility into the vulnerability enrichment pipeline including +// scan semaphores, image/node scanning performance, deduplication, and registry client metrics. 
+func L3VulnEnrichment() Dashboard { + return Dashboard{ + UID: "central-vuln-enrichment", + Title: "Central: Vulnerability Enrichment", + Tags: []string{"stackrox", "central", "level-3", "vulnerability-enrichment"}, + Links: []DashboardLink{ + { + Title: "← Central Internals", + TargetUID: "central-internals", + Type: "link", + }, + }, + Rows: []Row{ + scanSemaphoreRow(), + imageScanningRow(), + nodeScanningRow(), + imageDeduplicationRow(), + registryClientRow(), + }, + } +} + +func scanSemaphoreRow() Row { + return Row{ + Title: "Scan Semaphore", + Panels: []Panel{ + { + Title: "Scans In-Flight", + Type: "stat", + Width: 8, + Height: 8, + Queries: []Query{ + { + Expr: `sum(rox_image_scan_semaphore_holding_size)`, + RefID: "A", + }, + }, + }, + { + Title: "Semaphore Utilization", + Type: "timeseries", + Width: 8, + Height: 8, + Queries: []Query{ + { + Expr: `rox_image_scan_semaphore_holding_size`, + LegendFormat: `holding`, + RefID: "A", + }, + { + Expr: `rox_image_scan_semaphore_limit`, + LegendFormat: `limit`, + RefID: "B", + }, + }, + }, + { + Title: "Queue Waiting", + Type: "timeseries", + Width: 8, + Height: 8, + Queries: []Query{ + { + Expr: `rox_image_scan_semaphore_queue_size`, + LegendFormat: `{{subsystem}} - {{entity}}`, + RefID: "A", + }, + }, + }, + }, + } +} + +func imageScanningRow() Row { + return Row{ + Title: "Image Scanning", + Panels: []Panel{ + { + Title: "Scan Duration p50/p95/p99", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `histogram_quantile(0.5, rate(rox_central_scan_duration_bucket[5m]))`, + LegendFormat: `p50`, + RefID: "A", + }, + { + Expr: `histogram_quantile(0.95, rate(rox_central_scan_duration_bucket[5m]))`, + LegendFormat: `p95`, + RefID: "B", + }, + { + Expr: `histogram_quantile(0.99, rate(rox_central_scan_duration_bucket[5m]))`, + LegendFormat: `p99`, + RefID: "C", + }, + }, + }, + { + Title: "Vuln Retrieval Duration", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + 
Expr: `histogram_quantile(0.95, rate(rox_central_image_vuln_retrieval_duration_bucket[5m]))`, + LegendFormat: `p95`, + RefID: "A", + }, + }, + }, + { + Title: "Metadata Cache Hit Rate", + Type: "timeseries", + Width: 12, + Height: 8, + Unit: "percentunit", + Queries: []Query{ + { + Expr: `rate(rox_central_metadata_cache_hits[5m]) / (rate(rox_central_metadata_cache_hits[5m]) + rate(rox_central_metadata_cache_misses[5m]))`, + LegendFormat: `hit rate`, + RefID: "A", + }, + }, + }, + { + Title: "GAP: Enrichment Calls", + Width: 12, + Height: 4, + GapNote: `**Metric Needed**: ` + "`central_vuln_enrichment_requests_total{type,result}`" + ` — No counter for total enrichment requests (inline vs background). Cannot calculate enrichment failure rate.`, + }, + }, + } +} + +func nodeScanningRow() Row { + return Row{ + Title: "Node Scanning", + Panels: []Panel{ + { + Title: "Node Scan Duration", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `histogram_quantile(0.95, rate(rox_central_node_scan_duration_bucket[5m]))`, + LegendFormat: `p95`, + RefID: "A", + }, + }, + }, + { + Title: "GAP: Node Scan Count", + Width: 12, + Height: 4, + GapNote: `**Metric Needed**: ` + "`central_vuln_enrichment_node_scans_total{result}`" + ` — No counter for total node scans.`, + }, + }, + } +} + +func imageDeduplicationRow() Row { + return Row{ + Title: "Image Deduplication", + Panels: []Panel{ + { + Title: "Image Upsert Deduper", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_image_upsert_deduper[5m])`, + LegendFormat: `{{status}}`, + RefID: "A", + }, + }, + }, + { + Title: "Deployment Enhancement", + Type: "timeseries", + Width: 12, + Height: 8, + Queries: []Query{ + { + Expr: `rox_central_deployment_enhancement_duration_ms`, + LegendFormat: `duration`, + RefID: "A", + }, + }, + }, + }, + } +} + +func registryClientRow() Row { + return Row{ + Title: "Registry Client", + Panels: []Panel{ + { + Title: "Registry 
Requests", + Type: "timeseries", + Width: 8, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_registry_client_requests_total[5m])`, + LegendFormat: `{{code}} - {{type}}`, + RefID: "A", + }, + }, + }, + { + Title: "Registry Latency", + Type: "timeseries", + Width: 8, + Height: 8, + Unit: "s", + Queries: []Query{ + { + Expr: `histogram_quantile(0.95, rate(rox_central_registry_client_request_duration_seconds_bucket[5m]))`, + LegendFormat: `p95`, + RefID: "A", + }, + }, + }, + { + Title: "Registry Timeouts", + Type: "timeseries", + Width: 8, + Height: 8, + Queries: []Query{ + { + Expr: `rate(rox_central_registry_client_error_timeouts_total[5m])`, + LegendFormat: `timeouts`, + RefID: "A", + }, + }, + }, + }, + } +} diff --git a/deploy/charts/monitoring/dashboards/generator/l3_vuln_enrichment_test.go b/deploy/charts/monitoring/dashboards/generator/l3_vuln_enrichment_test.go new file mode 100644 index 0000000000000..5cda73a827d2b --- /dev/null +++ b/deploy/charts/monitoring/dashboards/generator/l3_vuln_enrichment_test.go @@ -0,0 +1,258 @@ +package generator + +import ( + "encoding/json" + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestL3VulnEnrichment_BasicMetadata(t *testing.T) { + d := L3VulnEnrichment() + + assert.Equal(t, "central-vuln-enrichment", d.UID) + assert.Equal(t, "Central: Vulnerability Enrichment", d.Title) + assert.Contains(t, d.Tags, "level-3") + assert.Contains(t, d.Tags, "vulnerability-enrichment") + assert.Contains(t, d.Tags, "stackrox") + assert.Contains(t, d.Tags, "central") +} + +func TestL3VulnEnrichment_HasBackLinkToCentralInternals(t *testing.T) { + d := L3VulnEnrichment() + + require.Len(t, d.Links, 1) + assert.Equal(t, "← Central Internals", d.Links[0].Title) + assert.Equal(t, "central-internals", d.Links[0].TargetUID) +} + +func TestL3VulnEnrichment_HasRequiredRows(t *testing.T) { + d := L3VulnEnrichment() + + require.Equal(t, 5, len(d.Rows), "Should have exactly 5 rows") 
+ + // Verify row titles + rowTitles := make([]string, len(d.Rows)) + for i, row := range d.Rows { + rowTitles[i] = row.Title + } + + assert.Contains(t, rowTitles, "Scan Semaphore") + assert.Contains(t, rowTitles, "Image Scanning") + assert.Contains(t, rowTitles, "Node Scanning") + assert.Contains(t, rowTitles, "Image Deduplication") + assert.Contains(t, rowTitles, "Registry Client") +} + +func TestL3VulnEnrichment_ScanSemaphoreRow(t *testing.T) { + d := L3VulnEnrichment() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Scan Semaphore" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row) + require.Len(t, row.Panels, 3) + + // Verify Scans In-Flight panel + assert.Equal(t, "Scans In-Flight", row.Panels[0].Title) + assert.Equal(t, "stat", row.Panels[0].Type) + assert.Equal(t, 8, row.Panels[0].Width) + assert.Equal(t, `sum(rox_image_scan_semaphore_holding_size)`, row.Panels[0].Queries[0].Expr) + + // Verify Semaphore Utilization panel - should have 2 queries + assert.Equal(t, "Semaphore Utilization", row.Panels[1].Title) + assert.Equal(t, "timeseries", row.Panels[1].Type) + assert.Equal(t, 8, row.Panels[1].Width) + require.Len(t, row.Panels[1].Queries, 2, "Semaphore Utilization should have 2 queries") + assert.Equal(t, `rox_image_scan_semaphore_holding_size`, row.Panels[1].Queries[0].Expr) + assert.Equal(t, `holding`, row.Panels[1].Queries[0].LegendFormat) + assert.Equal(t, `rox_image_scan_semaphore_limit`, row.Panels[1].Queries[1].Expr) + assert.Equal(t, `limit`, row.Panels[1].Queries[1].LegendFormat) + + // Verify Queue Waiting panel + assert.Equal(t, "Queue Waiting", row.Panels[2].Title) + assert.Equal(t, "timeseries", row.Panels[2].Type) + assert.Equal(t, 8, row.Panels[2].Width) + assert.Equal(t, `rox_image_scan_semaphore_queue_size`, row.Panels[2].Queries[0].Expr) + assert.Equal(t, `{{subsystem}} - {{entity}}`, row.Panels[2].Queries[0].LegendFormat) +} + +func TestL3VulnEnrichment_ImageScanningRow(t *testing.T) { + d := 
L3VulnEnrichment() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Image Scanning" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row) + require.Len(t, row.Panels, 4, "Image Scanning should have 4 panels (3 metrics + 1 gap)") + + // Verify Scan Duration p50/p95/p99 panel - should have 3 queries + assert.Equal(t, "Scan Duration p50/p95/p99", row.Panels[0].Title) + assert.Equal(t, "timeseries", row.Panels[0].Type) + assert.Equal(t, 12, row.Panels[0].Width) + require.Len(t, row.Panels[0].Queries, 3, "Scan Duration should have 3 queries") + assert.Equal(t, `histogram_quantile(0.5, rate(rox_central_scan_duration_bucket[5m]))`, row.Panels[0].Queries[0].Expr) + assert.Equal(t, `p50`, row.Panels[0].Queries[0].LegendFormat) + assert.Equal(t, `histogram_quantile(0.95, rate(rox_central_scan_duration_bucket[5m]))`, row.Panels[0].Queries[1].Expr) + assert.Equal(t, `p95`, row.Panels[0].Queries[1].LegendFormat) + assert.Equal(t, `histogram_quantile(0.99, rate(rox_central_scan_duration_bucket[5m]))`, row.Panels[0].Queries[2].Expr) + assert.Equal(t, `p99`, row.Panels[0].Queries[2].LegendFormat) + + // Verify Vuln Retrieval Duration panel + assert.Equal(t, "Vuln Retrieval Duration", row.Panels[1].Title) + assert.Equal(t, "timeseries", row.Panels[1].Type) + assert.Equal(t, 12, row.Panels[1].Width) + assert.Equal(t, `histogram_quantile(0.95, rate(rox_central_image_vuln_retrieval_duration_bucket[5m]))`, row.Panels[1].Queries[0].Expr) + assert.Equal(t, `p95`, row.Panels[1].Queries[0].LegendFormat) + + // Verify Metadata Cache Hit Rate panel + assert.Equal(t, "Metadata Cache Hit Rate", row.Panels[2].Title) + assert.Equal(t, "timeseries", row.Panels[2].Type) + assert.Equal(t, 12, row.Panels[2].Width) + assert.Equal(t, `rate(rox_central_metadata_cache_hits[5m]) / (rate(rox_central_metadata_cache_hits[5m]) + rate(rox_central_metadata_cache_misses[5m]))`, row.Panels[2].Queries[0].Expr) + assert.Equal(t, `hit rate`, row.Panels[2].Queries[0].LegendFormat) + 
assert.Equal(t, "percentunit", row.Panels[2].Unit) + + // Verify Enrichment Calls gap panel + assert.Equal(t, "GAP: Enrichment Calls", row.Panels[3].Title) + assert.Equal(t, 12, row.Panels[3].Width) + assert.Equal(t, 4, row.Panels[3].Height) + assert.Contains(t, row.Panels[3].GapNote, "central_vuln_enrichment_requests_total") + assert.Contains(t, row.Panels[3].GapNote, "Cannot calculate enrichment failure rate") +} + +func TestL3VulnEnrichment_NodeScanningRow(t *testing.T) { + d := L3VulnEnrichment() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Node Scanning" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row) + require.Len(t, row.Panels, 2, "Node Scanning should have 2 panels (1 metric + 1 gap)") + + // Verify Node Scan Duration panel + assert.Equal(t, "Node Scan Duration", row.Panels[0].Title) + assert.Equal(t, "timeseries", row.Panels[0].Type) + assert.Equal(t, 12, row.Panels[0].Width) + assert.Equal(t, `histogram_quantile(0.95, rate(rox_central_node_scan_duration_bucket[5m]))`, row.Panels[0].Queries[0].Expr) + assert.Equal(t, `p95`, row.Panels[0].Queries[0].LegendFormat) + + // Verify Node Scan Count gap panel + assert.Equal(t, "GAP: Node Scan Count", row.Panels[1].Title) + assert.Equal(t, 12, row.Panels[1].Width) + assert.Equal(t, 4, row.Panels[1].Height) + assert.Contains(t, row.Panels[1].GapNote, "central_vuln_enrichment_node_scans_total") + assert.Contains(t, row.Panels[1].GapNote, "No counter for total node scans") +} + +func TestL3VulnEnrichment_ImageDeduplicationRow(t *testing.T) { + d := L3VulnEnrichment() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Image Deduplication" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row) + require.Len(t, row.Panels, 2) + + // Verify Image Upsert Deduper panel + assert.Equal(t, "Image Upsert Deduper", row.Panels[0].Title) + assert.Equal(t, "timeseries", row.Panels[0].Type) + assert.Equal(t, 12, row.Panels[0].Width) + assert.Equal(t, 
`rate(rox_central_image_upsert_deduper[5m])`, row.Panels[0].Queries[0].Expr) + assert.Equal(t, `{{status}}`, row.Panels[0].Queries[0].LegendFormat) + + // Verify Deployment Enhancement panel + assert.Equal(t, "Deployment Enhancement", row.Panels[1].Title) + assert.Equal(t, "timeseries", row.Panels[1].Type) + assert.Equal(t, 12, row.Panels[1].Width) + assert.Equal(t, `rox_central_deployment_enhancement_duration_ms`, row.Panels[1].Queries[0].Expr) + assert.Equal(t, `duration`, row.Panels[1].Queries[0].LegendFormat) +} + +func TestL3VulnEnrichment_RegistryClientRow(t *testing.T) { + d := L3VulnEnrichment() + + var row *Row + for i := range d.Rows { + if d.Rows[i].Title == "Registry Client" { + row = &d.Rows[i] + break + } + } + + require.NotNil(t, row) + require.Len(t, row.Panels, 3) + + // Verify Registry Requests panel + assert.Equal(t, "Registry Requests", row.Panels[0].Title) + assert.Equal(t, "timeseries", row.Panels[0].Type) + assert.Equal(t, 8, row.Panels[0].Width) + assert.Equal(t, `rate(rox_central_registry_client_requests_total[5m])`, row.Panels[0].Queries[0].Expr) + assert.Equal(t, `{{code}} - {{type}}`, row.Panels[0].Queries[0].LegendFormat) + + // Verify Registry Latency panel + assert.Equal(t, "Registry Latency", row.Panels[1].Title) + assert.Equal(t, "timeseries", row.Panels[1].Type) + assert.Equal(t, 8, row.Panels[1].Width) + assert.Equal(t, `histogram_quantile(0.95, rate(rox_central_registry_client_request_duration_seconds_bucket[5m]))`, row.Panels[1].Queries[0].Expr) + assert.Equal(t, `p95`, row.Panels[1].Queries[0].LegendFormat) + assert.Equal(t, "s", row.Panels[1].Unit) + + // Verify Registry Timeouts panel + assert.Equal(t, "Registry Timeouts", row.Panels[2].Title) + assert.Equal(t, "timeseries", row.Panels[2].Type) + assert.Equal(t, 8, row.Panels[2].Width) + assert.Equal(t, `rate(rox_central_registry_client_error_timeouts_total[5m])`, row.Panels[2].Queries[0].Expr) + assert.Equal(t, `timeouts`, row.Panels[2].Queries[0].LegendFormat) +} + +func 
TestL3VulnEnrichment_ProducesValidJSON(t *testing.T) { + d := L3VulnEnrichment() + result := d.Generate() + + // Should marshal to valid JSON + b, err := json.Marshal(result) + require.NoError(t, err) + require.NotEmpty(t, b) + + // Should unmarshal back + var unmarshaled map[string]any + err = json.Unmarshal(b, &unmarshaled) + require.NoError(t, err) + + // Verify key fields survived round-trip + assert.Equal(t, "central-vuln-enrichment", unmarshaled["uid"]) + assert.Equal(t, "Central: Vulnerability Enrichment", unmarshaled["title"]) +} + +func TestL3VulnEnrichment_AllPanelsHaveValidWidth(t *testing.T) { + d := L3VulnEnrichment() + + for _, row := range d.Rows { + for _, panel := range row.Panels { + assert.Greater(t, panel.Width, 0, "Panel %s should have positive width", panel.Title) + assert.LessOrEqual(t, panel.Width, 24, "Panel %s width should not exceed 24", panel.Title) + } + } +} diff --git a/deploy/charts/monitoring/dashboards/generator/panel.go b/deploy/charts/monitoring/dashboards/generator/panel.go new file mode 100644 index 0000000000000..0d952ec2fcbc9 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/generator/panel.go @@ -0,0 +1,257 @@ +package generator + +// Panel types +type Panel struct { + Title string + Description string + Width int // out of 24 + Height int // grid units, typically 8 + Type string // "timeseries", "stat", "gauge", "text", "table" + Queries []Query + Unit string // "short", "s", "bytes", "percentunit", "ops", etc. 
+ Thresholds []Threshold + GapNote string // non-empty = this is a gap annotation panel +} + +// Query represents a Prometheus query +type Query struct { + Expr string + LegendFormat string + RefID string +} + +// Threshold represents a threshold configuration +type Threshold struct { + Value float64 + Color string // "green", "yellow", "red" +} + +const datasourceUID = "PBFA97CFB590B2093" + +// generate creates a Grafana panel JSON structure +func (p *Panel) generate(id, x, y int) map[string]any { + // If this is a gap annotation, render as text panel + if p.GapNote != "" { + return p.generateGapPanel(id, x, y) + } + + panel := map[string]any{ + "datasource": map[string]any{ + "type": "prometheus", + "uid": datasourceUID, + }, + "fieldConfig": p.generateFieldConfig(), + "gridPos": map[string]int{ + "h": p.Height, + "w": p.Width, + "x": x, + "y": y, + }, + "id": id, + "title": p.Title, + "type": p.Type, + } + + // Add description if present + if p.Description != "" { + panel["description"] = p.Description + } + + // Add targets (queries) + if len(p.Queries) > 0 { + panel["targets"] = p.generateTargets() + } + + // Add type-specific options + switch p.Type { + case "timeseries": + panel["options"] = p.generateTimeseriesOptions() + case "stat": + panel["options"] = p.generateStatOptions() + case "gauge": + panel["options"] = p.generateGaugeOptions() + case "table": + panel["options"] = p.generateTableOptions() + } + + return panel +} + +func (p *Panel) generateGapPanel(id, x, y int) map[string]any { + content := "⚠️ " + p.GapNote + + return map[string]any{ + "datasource": map[string]any{ + "type": "datasource", + "uid": "grafana", + }, + "gridPos": map[string]int{ + "h": p.Height, + "w": p.Width, + "x": x, + "y": y, + }, + "id": id, + "title": p.Title, + "type": "text", + "options": map[string]any{ + "mode": "markdown", + "content": content, + }, + } +} + +func (p *Panel) generateFieldConfig() map[string]any { + defaults := map[string]any{ + "color": map[string]any{ 
+ "mode": "palette-classic", + }, + "custom": map[string]any{ + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": map[string]any{ + "tooltip": false, + "viz": false, + "legend": false, + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": map[string]any{ + "type": "linear", + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": map[string]any{ + "group": "A", + "mode": "none", + }, + "thresholdsStyle": map[string]any{ + "mode": "off", + }, + }, + "mappings": []any{}, + } + + // Add unit if specified + if p.Unit != "" { + defaults["unit"] = p.Unit + } + + // Add thresholds if specified + if len(p.Thresholds) > 0 { + defaults["thresholds"] = p.generateThresholds() + } else { + // Default thresholds + defaults["thresholds"] = map[string]any{ + "mode": "absolute", + "steps": []map[string]any{ + {"color": "green", "value": nil}, + }, + } + } + + return map[string]any{ + "defaults": defaults, + "overrides": []any{}, + } +} + +func (p *Panel) generateThresholds() map[string]any { + steps := []map[string]any{ + {"color": "green", "value": nil}, + } + + for _, th := range p.Thresholds { + steps = append(steps, map[string]any{ + "color": th.Color, + "value": th.Value, + }) + } + + return map[string]any{ + "mode": "absolute", + "steps": steps, + } +} + +func (p *Panel) generateTargets() []map[string]any { + targets := make([]map[string]any, 0, len(p.Queries)) + + for _, q := range p.Queries { + target := map[string]any{ + "datasource": map[string]any{ + "type": "prometheus", + "uid": datasourceUID, + }, + "editorMode": "code", + "expr": q.Expr, + "instant": false, + "legendFormat": q.LegendFormat, + "range": true, + "refId": q.RefID, + } + targets = append(targets, target) + } + + return targets +} + +func (p *Panel) generateTimeseriesOptions() map[string]any { + return 
map[string]any{ + "legend": map[string]any{ + "calcs": []string{}, + "displayMode": "list", + "placement": "bottom", + "showLegend": true, + }, + "tooltip": map[string]any{ + "mode": "single", + "sort": "none", + }, + } +} + +func (p *Panel) generateStatOptions() map[string]any { + return map[string]any{ + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": map[string]any{ + "values": false, + "calcs": []string{"lastNotNull"}, + }, + "textMode": "auto", + } +} + +func (p *Panel) generateGaugeOptions() map[string]any { + return map[string]any{ + "orientation": "auto", + "reduceOptions": map[string]any{ + "values": false, + "calcs": []string{"lastNotNull"}, + }, + "showThresholdLabels": false, + "showThresholdMarkers": true, + } +} + +func (p *Panel) generateTableOptions() map[string]any { + return map[string]any{ + "showHeader": true, + "footer": map[string]any{ + "show": false, + "reducer": []string{"sum"}, + "countRows": false, + "enablePagination": false, + }, + } +} diff --git a/deploy/charts/monitoring/dashboards/generator/panel_test.go b/deploy/charts/monitoring/dashboards/generator/panel_test.go new file mode 100644 index 0000000000000..368ea6bd9ef07 --- /dev/null +++ b/deploy/charts/monitoring/dashboards/generator/panel_test.go @@ -0,0 +1,193 @@ +package generator + +import ( + "testing" + + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" +) + +func TestPanel_Timeseries(t *testing.T) { + p := Panel{ + Title: "Events per Second", + Width: 12, + Height: 8, + Type: "timeseries", + Unit: "ops", + Queries: []Query{ + {Expr: `rate(rox_central_sensor_event_queue{Operation="remove"}[5m])`, LegendFormat: "{{Type}}", RefID: "A"}, + }, + } + + result := p.generate(1, 0, 0) // id=1, x=0, y=0 + + assert.Equal(t, "Events per Second", result["title"]) + assert.Equal(t, "timeseries", result["type"]) + assert.Equal(t, 1, result["id"]) + + gridPos := result["gridPos"].(map[string]int) + 
assert.Equal(t, 12, gridPos["w"]) + assert.Equal(t, 8, gridPos["h"]) + assert.Equal(t, 0, gridPos["x"]) + assert.Equal(t, 0, gridPos["y"]) + + targets, ok := result["targets"].([]map[string]any) + require.True(t, ok) + require.Len(t, targets, 1) + assert.Contains(t, targets[0]["expr"], "rox_central_sensor_event_queue") +} + +func TestPanel_GapAnnotation(t *testing.T) { + p := Panel{ + Title: "Queue Depth", + Width: 12, + Height: 4, + GapNote: "**Metric Needed**: `central_sensor_ingestion_queue_depth` — Worker Queue shards lack depth gauges.", + } + + result := p.generate(2, 0, 8) + + assert.Equal(t, "text", result["type"]) + options := result["options"].(map[string]any) + assert.Contains(t, options["content"], "Metric Needed") + assert.Equal(t, "markdown", options["mode"]) +} + +func TestPanel_Stat(t *testing.T) { + p := Panel{ + Title: "Sensors Connected", + Width: 4, + Height: 4, + Type: "stat", + Queries: []Query{ + {Expr: `count(rox_central_sensor_connected{connection_state="connected"})`, RefID: "A"}, + }, + } + + result := p.generate(3, 0, 0) + assert.Equal(t, "stat", result["type"]) +} + +func TestPanel_Gauge(t *testing.T) { + p := Panel{ + Title: "CPU Usage", + Width: 6, + Height: 6, + Type: "gauge", + Unit: "percentunit", + Queries: []Query{ + {Expr: `rate(process_cpu_seconds_total[5m])`, RefID: "A"}, + }, + Thresholds: []Threshold{ + {Value: 0.7, Color: "yellow"}, + {Value: 0.9, Color: "red"}, + }, + } + + result := p.generate(4, 0, 0) + assert.Equal(t, "gauge", result["type"]) + assert.Equal(t, "percentunit", result["fieldConfig"].(map[string]any)["defaults"].(map[string]any)["unit"]) + + // Check thresholds + defaults := result["fieldConfig"].(map[string]any)["defaults"].(map[string]any) + thresholds := defaults["thresholds"].(map[string]any) + steps := thresholds["steps"].([]map[string]any) + require.Len(t, steps, 3) // base green + 2 thresholds + + assert.Equal(t, "green", steps[0]["color"]) + assert.Equal(t, "yellow", steps[1]["color"]) + 
assert.Equal(t, 0.7, steps[1]["value"]) + assert.Equal(t, "red", steps[2]["color"]) + assert.Equal(t, 0.9, steps[2]["value"]) +} + +func TestPanel_MultipleQueries(t *testing.T) { + p := Panel{ + Title: "Multiple Metrics", + Width: 12, + Height: 8, + Type: "timeseries", + Queries: []Query{ + {Expr: `metric_a`, LegendFormat: "A", RefID: "A"}, + {Expr: `metric_b`, LegendFormat: "B", RefID: "B"}, + {Expr: `metric_c`, LegendFormat: "C", RefID: "C"}, + }, + } + + result := p.generate(5, 0, 0) + targets, ok := result["targets"].([]map[string]any) + require.True(t, ok) + require.Len(t, targets, 3) + + assert.Equal(t, "A", targets[0]["refId"]) + assert.Equal(t, "B", targets[1]["refId"]) + assert.Equal(t, "C", targets[2]["refId"]) +} + +func TestRow_Generate(t *testing.T) { + d := Dashboard{ + UID: "test", + Title: "Test", + Rows: []Row{ + { + Title: "Sensor Ingestion", + Panels: []Panel{ + {Title: "P1", Width: 12, Height: 8, Type: "timeseries"}, + {Title: "P2", Width: 12, Height: 8, Type: "timeseries"}, + }, + }, + }, + } + + result := d.Generate() + panels, ok := result["panels"].([]map[string]any) + require.True(t, ok) + + // Row header + 2 panels = 3 + require.Len(t, panels, 3) + assert.Equal(t, "row", panels[0]["type"]) + assert.Equal(t, "P1", panels[1]["title"]) + assert.Equal(t, "P2", panels[2]["title"]) + + // P2 should be at x=12 (next to P1) + p2Grid := panels[2]["gridPos"].(map[string]int) + assert.Equal(t, 12, p2Grid["x"]) +} + +func TestRow_PanelWrapping(t *testing.T) { + d := Dashboard{ + UID: "test", + Title: "Test", + Rows: []Row{ + { + Title: "Test Row", + Panels: []Panel{ + {Title: "P1", Width: 12, Height: 8, Type: "timeseries"}, + {Title: "P2", Width: 12, Height: 8, Type: "timeseries"}, + {Title: "P3", Width: 12, Height: 8, Type: "timeseries"}, + }, + }, + }, + } + + result := d.Generate() + panels, ok := result["panels"].([]map[string]any) + require.True(t, ok) + + // Row header + 3 panels = 4 + require.Len(t, panels, 4) + + // P1 at x=0, y should be 
after row header + p1Grid := panels[1]["gridPos"].(map[string]int) + assert.Equal(t, 0, p1Grid["x"]) + + // P2 at x=12 (fits beside P1: 12+12 = 24 fills the row without wrapping) + p2Grid := panels[2]["gridPos"].(map[string]int) + assert.Equal(t, 12, p2Grid["x"]) + assert.Equal(t, p1Grid["y"], p2Grid["y"]) // Same row + + // P3 should wrap to next line + p3Grid := panels[3]["gridPos"].(map[string]int) + assert.Equal(t, 0, p3Grid["x"]) + assert.Equal(t, p1Grid["y"]+8, p3Grid["y"]) // New row, y increments by height +} diff --git a/deploy/charts/monitoring/dashboards/stackrox-overview.json b/deploy/charts/monitoring/dashboards/stackrox-overview.json new file mode 100644 index 0000000000000..626340b7e419e --- /dev/null +++ b/deploy/charts/monitoring/dashboards/stackrox-overview.json @@ -0,0 +1,1272 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": { + "type": "grafana", + "uid": "-- Grafana --" + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations \u0026 Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "links": [ + { + "asDropdown": false, + "icon": "external link", + "includeVars": false, + "keepTime": true, + "tags": [], + "targetBlank": false, + "title": "Central Internals", + "tooltip": "", + "type": "link", + "url": "/d/central-internals" + } + ], + "panels": [ + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 0 + }, + "id": 1, + "panels": [], + "title": "Service Health", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": 
false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 4, + "x": 0, + "y": 1 + }, + "id": 2, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "values": false + }, + "textMode": "auto" + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "up{job=\"central\"}", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "Central Up", + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 5, + "x": 4, + "y": 1 + }, + "id": 3, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + 
"placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rate(process_cpu_seconds_total{job=\"central\"}[5m])", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "Central CPU", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 5, + "x": 9, + "y": 1 + }, + "id": 4, + "options": { + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "mode": "single", + "sort": "none" + } + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "process_resident_memory_bytes{job=\"central\"}", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "Central Memory", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": 
"palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 5, + "x": 14, + "y": 1 + }, + "id": 5, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "values": false + }, + "textMode": "auto" + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "go_goroutines{job=\"central\"}", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "Central Goroutines", + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + 
"mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 5, + "x": 19, + "y": 1 + }, + "id": 6, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "values": false + }, + "textMode": "auto" + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_info", + "instant": false, + "legendFormat": "{{central_version}}", + "range": true, + "refId": "A" + } + ], + "title": "Central Version", + "type": "stat" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 9 + }, + "id": 7, + "panels": [], + "title": "Connected Sensors", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 6, + "x": 0, + "y": 10 + }, + "id": 8, + "options": { + "colorMode": "value", + "graphMode": "none", + 
"justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "values": false + }, + "textMode": "auto" + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "count by (connection_state) (rox_central_sensor_connected)", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "Sensors Connected", + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 6, + "x": 6, + "y": 10 + }, + "id": 9, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "values": false + }, + "textMode": "auto" + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_secured_clusters", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "Secured Clusters", + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + 
}, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 6, + "x": 12, + "y": 10 + }, + "id": 10, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "values": false + }, + "textMode": "auto" + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_secured_nodes", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "Secured Nodes", + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", 
+ "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 6, + "x": 18, + "y": 10 + }, + "id": 11, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "values": false + }, + "textMode": "auto" + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_secured_vcpus", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "Secured vCPUs", + "type": "stat" + }, + { + "collapsed": false, + "datasource": { + "type": "datasource", + "uid": "grafana" + }, + "gridPos": { + "h": 1, + "w": 24, + "x": 0, + "y": 18 + }, + "id": 12, + "panels": [], + "title": "Database", + "type": "row" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 6, + "x": 0, + "y": 19 + }, + "id": 13, + "options": { + "colorMode": "value", + 
"graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "values": false + }, + "textMode": "auto" + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_postgres_connected", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "Postgres Connected", + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "bytes" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 6, + "x": 6, + "y": 19 + }, + "id": 14, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "values": false + }, + "textMode": "auto" + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_postgres_total_size_bytes", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "DB Size", + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": 
"PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + } + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 6, + "x": 12, + "y": 19 + }, + "id": 15, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "values": false + }, + "textMode": "auto" + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_postgres_total_connections{state=\"active\"}", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "Active Connections", + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "drawStyle": "line", + "fillOpacity": 0, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": 
"auto", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "bytes" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 6, + "x": 18, + "y": 19 + }, + "id": 16, + "options": { + "colorMode": "value", + "graphMode": "none", + "justifyMode": "auto", + "orientation": "auto", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "values": false + }, + "textMode": "auto" + }, + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "PBFA97CFB590B2093" + }, + "editorMode": "code", + "expr": "rox_central_postgres_available_size_bytes", + "instant": false, + "legendFormat": "", + "range": true, + "refId": "A" + } + ], + "title": "Available Space", + "type": "stat" + } + ], + "refresh": "30s", + "schemaVersion": 27, + "tags": [ + "stackrox", + "overview", + "level-1" + ], + "time": { + "from": "now-6h", + "to": "now" + }, + "timepicker": {}, + "timezone": "", + "title": "StackRox Overview", + "uid": "stackrox-overview", + "version": 0 +} \ No newline at end of file diff --git a/deploy/charts/monitoring/templates/grafana.yaml b/deploy/charts/monitoring/templates/grafana.yaml index 4cbdf1a5379b4..42f9c21df8e48 100644 --- a/deploy/charts/monitoring/templates/grafana.yaml +++ b/deploy/charts/monitoring/templates/grafana.yaml @@ -91,3 +91,27 @@ data: {{ .Files.Get "dashboards/enrichment-endpoints.json" | indent 4 }} enrichment-updatecomputer.json: |- {{ .Files.Get "dashboards/enrichment-updatecomputer.json" | indent 4 }} + stackrox-overview.json: |- +{{ .Files.Get "dashboards/stackrox-overview.json" | indent 4 }} + central-internals.json: |- +{{ .Files.Get "dashboards/central-internals.json" | indent 4 }} + central-sensor-ingestion.json: |- +{{ .Files.Get "dashboards/central-sensor-ingestion.json" | indent 4 }} + central-vuln-enrichment.json: |- +{{ 
.Files.Get "dashboards/central-vuln-enrichment.json" | indent 4 }} + central-deployment-processing.json: |- +{{ .Files.Get "dashboards/central-deployment-processing.json" | indent 4 }} + central-detection-alerts.json: |- +{{ .Files.Get "dashboards/central-detection-alerts.json" | indent 4 }} + central-risk-calculation.json: |- +{{ .Files.Get "dashboards/central-risk-calculation.json" | indent 4 }} + central-background-reprocessing.json: |- +{{ .Files.Get "dashboards/central-background-reprocessing.json" | indent 4 }} + central-pruning-gc.json: |- +{{ .Files.Get "dashboards/central-pruning-gc.json" | indent 4 }} + central-network-analysis.json: |- +{{ .Files.Get "dashboards/central-network-analysis.json" | indent 4 }} + central-report-generation.json: |- +{{ .Files.Get "dashboards/central-report-generation.json" | indent 4 }} + central-api-ui.json: |- +{{ .Files.Get "dashboards/central-api-ui.json" | indent 4 }} diff --git a/migrator/migrations/m_212_to_m_213_add_container_start_column_to_indicators/migration_impl.go b/migrator/migrations/m_212_to_m_213_add_container_start_column_to_indicators/migration_impl.go index 054ee19f698d3..74db70567fba2 100644 --- a/migrator/migrations/m_212_to_m_213_add_container_start_column_to_indicators/migration_impl.go +++ b/migrator/migrations/m_212_to_m_213_add_container_start_column_to_indicators/migration_impl.go @@ -55,24 +55,7 @@ func migrate(database *types.Databases) error { } // Add the indexes back - resultDB = db.Exec("CREATE INDEX CONCURRENTLY IF NOT EXISTS processindicators_deploymentid ON process_indicators USING HASH (deploymentid)") - if resultDB.Error != nil { - log.Error(errors.Wrap(resultDB.Error, "unable to create index processindicators_deploymentid")) - } - resultDB = db.Exec("CREATE INDEX CONCURRENTLY IF NOT EXISTS processindicators_poduid ON process_indicators USING HASH (poduid)") - if resultDB.Error != nil { - log.Error(errors.Wrap(resultDB.Error, "unable to create index processindicators_poduid")) - } - 
resultDB = db.Exec("CREATE INDEX CONCURRENTLY IF NOT EXISTS processindicators_signal_time ON process_indicators (signal_time)") - if resultDB.Error != nil { - log.Error(errors.Wrap(resultDB.Error, "unable to create index processindicators_signal_time")) - } - - log.Info("Process Indicators migrated") - return nil -} - -func migrateByCluster(cluster string, database *types.Databases) error { + resultDB = db.Exec("CREATE INDEX CONCURRENTLY IF NOT EXISTS 2222 ctx, cancel := context.WithTimeout(database.DBCtx, types.DefaultMigrationTimeout) defer cancel()