Skip to content

refactor: migrate metrics init() to explicit Init() for OOMKill fixes#19980

Draft
janisz wants to merge 8 commits intomasterfrom
master-init-migration-metrics
Draft

refactor: migrate metrics init() to explicit Init() for OOMKill fixes#19980
janisz wants to merge 8 commits intomasterfrom
master-init-migration-metrics

Conversation

@janisz
Copy link
Copy Markdown
Contributor

@janisz janisz commented Apr 14, 2026

Description

Convert pkg/ and sensor metrics from package-level init() to explicit Init() calls to fix OOMKills in config-controller (128Mi) and admission-control (500Mi).

Changes:

  • pkg/grpc/metrics, pkg/images, pkg/postgres, pkg/rate, pkg/tlscheckcache: init() → Init()
  • sensor/admission-control/manager, sensor/common/centralproxy, sensor/common/metrics: init() → Init()
  • central/metrics: create Init() function consolidating metric registrations
  • central/app: create app.go + init.go calling initMetrics()
  • sensor/*/app: create init.go calling initMetrics()

Refs:

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

change me!

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • Consider making the various Init() functions idempotent (or clearly documenting their expected call patterns), since repeated calls will currently panic due to prometheus.MustRegister duplicate registrations.
  • In central/app/init.go, initCompliance() and initGraphQL() are stubs with only comments; it would be helpful to add explicit TODO markers (and optionally a tracking reference) so future work on refactoring away remaining init()-based registration is easier to discover.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider making the various `Init()` functions idempotent (or clearly documenting their expected call patterns), since repeated calls will currently panic due to `prometheus.MustRegister` duplicate registrations.
- In `central/app/init.go`, `initCompliance()` and `initGraphQL()` are stubs with only comments; it would be helpful to add explicit TODO markers (and optionally a tracking reference) so future work on refactoring away remaining `init()`-based registration is easier to discover.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 14, 2026

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • Refactor
    • Standardized metrics initialization across the system to occur explicitly during application startup rather than implicitly, improving initialization sequence control and consistency across central and sensor components.

Walkthrough

Converts scattered package-level Prometheus init() registrations into explicit Init() entry points and updates application startup routines to call metric initialization (adds app.Run() in central and initMetrics() hooks in sensor apps), changing when metrics are registered during startup.

Changes

Cohort / File(s) Summary
Central app startup
central/app/app.go, central/app/init.go
Adds app.Run() which sets memory limits, starts pre-main, and calls initMetrics() (new helper) to initialize central metrics.
Central metrics
central/metrics/init.go
Replaces package init() with exported Init() that registers central Prometheus collectors; requires explicit invocation.
Packages: shared libs metrics
pkg/grpc/metrics/prom.go, pkg/images/metrics.go, pkg/postgres/metrics.go, pkg/rate/metrics.go, pkg/tlscheckcache/tlscheckcache_metrics.go
Replace package init() hooks with exported Init() functions; pkg/grpc/metrics/prom.go also hardcodes Prometheus Namespace to "rox".
Sensor admission-control startup
sensor/admission-control/app/app.go, sensor/admission-control/app/init.go, sensor/admission-control/manager/metrics.go
Adds initMetrics() call in startup and new init.go that invokes manager metric init; manager metrics init()Init().
Sensor kubernetes startup
sensor/kubernetes/app/app.go, sensor/kubernetes/app/init.go, sensor/kubernetes/complianceoperator/metrics.go
Adds initMetrics() into startup flow; new init.go calls metrics.Init() and listener metrics; compliance operator init() renamed to InitMetrics().
Sensor common metrics orchestration
sensor/common/metrics/init.go, sensor/common/centralproxy/metrics.go
Converts package init() → exported Init() and centralizes component metric initialization by explicitly calling component Init() functions (e.g., clusterentities, detector, networkflow, updatecomputer, pubsub, registry, virtualmachine, etc.).
Sensor common component metrics
sensor/common/clusterentities/metrics/..., sensor/common/detector/metrics/..., sensor/common/networkflow/..., sensor/common/networkflow/updatecomputer/metrics.go, sensor/common/pubsub/metrics/metrics.go, sensor/common/registry/metrics/metrics.go, sensor/common/virtualmachine/metrics/metrics.go, sensor/kubernetes/listener/resources/metrics/metrics.go
Many component packages replace automatic init() registration with exported Init() functions; Prometheus registration logic unchanged but now requires explicit calls.
Test harness
sensor/common/sensor/central_receiver_test.go
Adds TestMain to initialize metrics (metrics.Init()) once before running tests.

Sequence Diagram(s)

sequenceDiagram
    rect rgba(52,152,219,0.5)
    Participant App as App.Run()
    end
    rect rgba(46,204,113,0.5)
    Participant Sys as System libs
    end
    rect rgba(231,76,60,0.5)
    Participant Metrics as Metrics packages
    end

    App->>Sys: memlimit.SetMemoryLimit()
    App->>Sys: premain.StartMain()
    App->>Metrics: initMetrics() / metrics.Init()
    Metrics->>Metrics: prometheus.MustRegister(... collectors ...)
    Metrics-->>App: metrics initialized
    App->>App: continue startup (profiling, services, etc.)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 78.13% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The PR description provides clear context about the refactoring goal, lists all affected packages, references related issues, but key sections like validation and testing rationale contain only placeholder text. Complete the 'How I validated my change' section explaining validation approach and provide testing rationale for why no new automated tests were added.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: migrating metrics initialization from package-level init() to explicit Init() functions to fix OOMKill issues.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch master-init-migration-metrics

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
sensor/common/centralproxy/metrics.go (1)

87-96: ⚠️ Potential issue | 🟠 Major

This Init() is not in the new Sensor registration chain.

After Line 89, these collectors only register if centralproxy.Init() is called explicitly. The provided sensor/common/metrics/init.go:13-65 bootstrap omits it, so the central-proxy metrics disappear after this refactor. Please add centralproxy.Init() to that Sensor metrics initializer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sensor/common/centralproxy/metrics.go` around lines 87 - 96, The
central-proxy collectors are only registered when centralproxy.Init() runs; add
a call to centralproxy.Init() from the Sensor metrics bootstrap initializer so
those collectors are always registered during metrics setup. Locate the Sensor
metrics initializer (the function that currently emplaces collectors) and invoke
centralproxy.Init() (the Init function in the centralproxy package) as part of
that initialization sequence.
pkg/rate/metrics.go (1)

16-22: ⚠️ Potential issue | 🟠 Major

Register these collectors from the Central metrics bootstrap.

After Line 16, these metrics only exist if rate.Init() is called explicitly. The provided central/app/init.go:1-9 path does not do that, so the rate-limiter collectors disappear after this refactor. Please wire rate.Init() into the same Central metrics initialization flow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/rate/metrics.go` around lines 16 - 22, The rate limiter Prometheus
collectors (RequestsTotal, ActiveClients, PerClientRate,
PerClientBucketCapacity) are only registered when pkg/rate.Init() is called;
wire a call to rate.Init() into the Central metrics bootstrap initialization so
these collectors are always registered during central startup. Locate the
Central metrics initialization routine that registers other collectors and add a
call to rate.Init() there (ensuring rate.Init() remains idempotent or guarded to
avoid double-registration), so the rate metrics are present without requiring
explicit calls elsewhere.
🧹 Nitpick comments (3)
pkg/grpc/metrics/prom.go (1)

11-13: Avoid hardcoding the Prometheus namespace here.

Line 11 duplicates pkg/metrics.PrometheusNamespace as a string literal. That creates a second source of truth for metric names and makes a future namespace change easy to miss.

Suggested change
 import (
 	"github.com/prometheus/client_golang/prometheus"
+	pkgMetrics "github.com/stackrox/rox/pkg/metrics"
 )
@@
-			Namespace: "rox",
+			Namespace: pkgMetrics.PrometheusNamespace,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/grpc/metrics/prom.go` around lines 11 - 13, Replace the hardcoded
Namespace string "rox" in the metric definition that sets Name
"endpoint_tls_handshakes_with_negotiated_alp_total" with the shared constant
metrics.PrometheusNamespace (importing the pkg/metrics package or using its
existing alias), so the Namespace field uses metrics.PrometheusNamespace instead
of the literal; ensure imports are updated and any references still compile.
central/app/app.go (1)

10-12: Unused log variable.

The log variable is declared but not used within this file. If it's intended for future use, consider removing it until needed to avoid dead code.

♻️ Proposed fix
-var (
-	log = logging.LoggerForModule()
-)

And remove the logging import if not needed elsewhere:

 import (
-	"github.com/stackrox/rox/pkg/logging"
 	"github.com/stackrox/rox/pkg/memlimit"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@central/app/app.go` around lines 10 - 12, The declared package-level variable
log (created via logging.LoggerForModule()) is unused; remove the var block that
defines log and also remove the import of the logging package if it’s not used
elsewhere in this file; if you intend to keep a placeholder for future logging,
either use the variable or add a clear TODO comment explaining why it remains to
avoid dead code.
central/app/init.go (1)

11-28: Stub functions are documented but could benefit from tracking.

The initCompliance() and initGraphQL() stubs are clearly documented as placeholders for future migration. Consider opening tracking issues or adding TODO markers with issue references to ensure these migrations don't get forgotten.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@central/app/init.go` around lines 11 - 28, The initCompliance() and
initGraphQL() stubs need explicit tracking so the planned refactor isn't
forgotten: add TODO comments inside both functions referencing a created
tracking issue (e.g., "TODO: migrate init()-based registration to explicit
registration — see ISSUE-XXXX") or create the tracking issues first and include
their IDs in the TODOs; ensure the comments mention the functions initCompliance
and initGraphQL and the intended work (refactor compliance/checks and GraphQL
loader registration) so reviewers can find and close the issues later.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/postgres/metrics.go`:
- Around line 10-13: The postgres package's Init() (pkg/postgres/metrics.go
Init) is no longer being called from central's startup, so prometheus metric
queryErrors isn't registered; update the central metrics bootstrap (where
central/metrics.Init() is invoked in central/app/init.go) to also call
pkg/postgres/metrics.Init() (or postgres.Init()) during startup so
postgres_query_errors gets registered at app init. Ensure the call happens
alongside central/metrics.Init() in the same initialization sequence.

In `@sensor/common/metrics/init.go`:
- Around line 62-70: The code calls exported Init() functions (e.g.,
clusterentitiesmetrics.Init(), detectormetrics.Init(),
networkflowmetrics.Init(), updatecomputermetrics.Init(), pubsubmetrics.Init(),
registrymetrics.Init(), virtualmachinemetrics.Init(),
listenerresourcesmetrics.Init()) but those packages only define unexported
init() functions and thus the calls will not compile; fix by either adding
exported Init() wrappers in each metrics package (create an Init() function in
clusterentities/metrics, detector/metrics, networkflow/metrics,
networkflow/updatecomputer/metrics, pubsub/metrics, registry/metrics,
virtualmachine/metrics, listener/resources/metrics that invokes the current
initialization code or delegates to the existing init logic) or by removing
these Init() calls and relying on Go package init() side-effects (import the
packages for side-effects if needed), ensuring the symbol names above match the
new exported functions if you choose the wrapper approach.

---

Outside diff comments:
In `@pkg/rate/metrics.go`:
- Around line 16-22: The rate limiter Prometheus collectors (RequestsTotal,
ActiveClients, PerClientRate, PerClientBucketCapacity) are only registered when
pkg/rate.Init() is called; wire a call to rate.Init() into the Central metrics
bootstrap initialization so these collectors are always registered during
central startup. Locate the Central metrics initialization routine that
registers other collectors and add a call to rate.Init() there (ensuring
rate.Init() remains idempotent or guarded to avoid double-registration), so the
rate metrics are present without requiring explicit calls elsewhere.

In `@sensor/common/centralproxy/metrics.go`:
- Around line 87-96: The central-proxy collectors are only registered when
centralproxy.Init() runs; add a call to centralproxy.Init() from the Sensor
metrics bootstrap initializer so those collectors are always registered during
metrics setup. Locate the Sensor metrics initializer (the function that
currently emplaces collectors) and invoke centralproxy.Init() (the Init function
in the centralproxy package) as part of that initialization sequence.

---

Nitpick comments:
In `@central/app/app.go`:
- Around line 10-12: The declared package-level variable log (created via
logging.LoggerForModule()) is unused; remove the var block that defines log and
also remove the import of the logging package if it’s not used elsewhere in this
file; if you intend to keep a placeholder for future logging, either use the
variable or add a clear TODO comment explaining why it remains to avoid dead
code.

In `@central/app/init.go`:
- Around line 11-28: The initCompliance() and initGraphQL() stubs need explicit
tracking so the planned refactor isn't forgotten: add TODO comments inside both
functions referencing a created tracking issue (e.g., "TODO: migrate
init()-based registration to explicit registration — see ISSUE-XXXX") or create
the tracking issues first and include their IDs in the TODOs; ensure the
comments mention the functions initCompliance and initGraphQL and the intended
work (refactor compliance/checks and GraphQL loader registration) so reviewers
can find and close the issues later.

In `@pkg/grpc/metrics/prom.go`:
- Around line 11-13: Replace the hardcoded Namespace string "rox" in the metric
definition that sets Name "endpoint_tls_handshakes_with_negotiated_alp_total"
with the shared constant metrics.PrometheusNamespace (importing the pkg/metrics
package or using its existing alias), so the Namespace field uses
metrics.PrometheusNamespace instead of the literal; ensure imports are updated
and any references still compile.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: cc207921-ba85-4e8b-bfa1-1e39a45f27fd

📥 Commits

Reviewing files that changed from the base of the PR and between 9ef4d62 and 33de959.

📒 Files selected for processing (15)
  • central/app/app.go
  • central/app/init.go
  • central/metrics/init.go
  • pkg/grpc/metrics/prom.go
  • pkg/images/metrics.go
  • pkg/postgres/metrics.go
  • pkg/rate/metrics.go
  • pkg/tlscheckcache/tlscheckcache_metrics.go
  • sensor/admission-control/app/app.go
  • sensor/admission-control/app/init.go
  • sensor/admission-control/manager/metrics.go
  • sensor/common/centralproxy/metrics.go
  • sensor/common/metrics/init.go
  • sensor/kubernetes/app/app.go
  • sensor/kubernetes/app/init.go

Comment thread pkg/postgres/metrics.go
Comment on lines +10 to 13
func Init() {
prometheus.MustRegister(
queryErrors,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Wire this Init() into Central startup.

Line 10 makes registration opt-in, but the provided central/app/init.go:1-9 startup path only reaches central/metrics.Init() and never calls postgres.Init(). That means postgres_query_errors stops being registered after this change. Please add postgres.Init() to the central metrics bootstrap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/postgres/metrics.go` around lines 10 - 13, The postgres package's Init()
(pkg/postgres/metrics.go Init) is no longer being called from central's startup,
so prometheus metric queryErrors isn't registered; update the central metrics
bootstrap (where central/metrics.Init() is invoked in central/app/init.go) to
also call pkg/postgres/metrics.Init() (or postgres.Init()) during startup so
postgres_query_errors gets registered at app init. Ensure the call happens
alongside central/metrics.Init() in the same initialization sequence.

Comment thread sensor/common/metrics/init.go Outdated
Comment on lines +62 to +70
// component-specific metrics
clusterentitiesmetrics.Init()
detectormetrics.Init()
networkflowmetrics.Init()
updatecomputermetrics.Init()
pubsubmetrics.Init()
registrymetrics.Init()
virtualmachinemetrics.Init()
listenerresourcesmetrics.Init()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that Init() functions are exported in the called packages

echo "=== Checking for Init() functions in metric packages ==="

for pkg in \
  "sensor/common/clusterentities/metrics" \
  "sensor/common/detector/metrics" \
  "sensor/common/networkflow/metrics" \
  "sensor/common/networkflow/updatecomputer" \
  "sensor/common/pubsub/metrics" \
  "sensor/common/registry/metrics" \
  "sensor/common/virtualmachine/metrics" \
  "sensor/kubernetes/listener/resources/metrics"; do
  
  echo ""
  echo "--- $pkg ---"
  # Search for exported Init function
  rg -n "^func Init\(\)" "$pkg" 2>/dev/null || echo "No Init() found in $pkg"
done

Repository: stackrox/stackrox

Length of output: 1784


🏁 Script executed:

# List available Go files in the metric packages to understand structure
find . -path "*/metrics/*.go" -type f 2>/dev/null | head -20

Repository: stackrox/stackrox

Length of output: 990


🏁 Script executed:

# Find metrics files in sensor directory
find ./sensor -path "*/metrics/*.go" -type f 2>/dev/null | grep -E "(clusterentities|detector|networkflow|pubsub|registry|virtualmachine|listener)" | head -20

Repository: stackrox/stackrox

Length of output: 487


🏁 Script executed:

# Search for Init() functions in the metric packages
echo "=== Checking for Init() functions ==="
rg -n "^func Init\(\)" ./sensor/common/clusterentities/metrics/ ./sensor/common/detector/metrics/ ./sensor/common/networkflow/metrics/ ./sensor/common/pubsub/metrics/ ./sensor/common/registry/metrics/ ./sensor/common/virtualmachine/metrics/ ./sensor/kubernetes/listener/resources/metrics/

# Also check for init() functions
echo ""
echo "=== Checking for init() functions ==="
rg -n "^func init\(\)" ./sensor/common/clusterentities/metrics/ ./sensor/common/detector/metrics/ ./sensor/common/networkflow/metrics/ ./sensor/common/pubsub/metrics/ ./sensor/common/registry/metrics/ ./sensor/common/virtualmachine/metrics/ ./sensor/kubernetes/listener/resources/metrics/

# Also need to check updatecomputer package
echo ""
echo "=== Checking updatecomputer package ==="
find ./sensor/common/networkflow/updatecomputer -name "*.go" -type f 2>/dev/null | head -5

Repository: stackrox/stackrox

Length of output: 913


🏁 Script executed:

# Check updatecomputer/metrics.go for Init() or init()
echo "=== Checking updatecomputer/metrics.go ==="
rg -n "^func (Init|init)\(\)" ./sensor/common/networkflow/updatecomputer/metrics.go

Repository: stackrox/stackrox

Length of output: 119


Critical: Code will fail to compile — these metric packages do not export Init() functions.

All metric packages contain only init() (unexported) functions, not Init() (exported):

  • clusterentities/metrics/metrics.go: line 84
  • detector/metrics/metrics.go: line 351
  • networkflow/metrics/metrics.go: line 8
  • pubsub/metrics/metrics.go: line 76
  • registry/metrics/metrics.go: line 8
  • virtualmachine/metrics/metrics.go: line 129
  • listener/resources/metrics/metrics.go: line 25
  • networkflow/updatecomputer/metrics.go: line 8

Calling .Init() on these packages will fail with undefined function errors. Either export Init() functions in these packages or use a different initialization pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sensor/common/metrics/init.go` around lines 62 - 70, The code calls exported
Init() functions (e.g., clusterentitiesmetrics.Init(), detectormetrics.Init(),
networkflowmetrics.Init(), updatecomputermetrics.Init(), pubsubmetrics.Init(),
registrymetrics.Init(), virtualmachinemetrics.Init(),
listenerresourcesmetrics.Init()) but those packages only define unexported
init() functions and thus the calls will not compile; fix by either adding
exported Init() wrappers in each metrics package (create an Init() function in
clusterentities/metrics, detector/metrics, networkflow/metrics,
networkflow/updatecomputer/metrics, pubsub/metrics, registry/metrics,
virtualmachine/metrics, listener/resources/metrics that invokes the current
initialization code or delegates to the existing init logic) or by removing
these Init() calls and relying on Go package init() side-effects (import the
packages for side-effects if needed), ensuring the symbol names above match the
new exported functions if you choose the wrapper approach.

@janisz janisz force-pushed the master-init-migration-metrics branch 3 times, most recently from c276ddc to 8b137ad Compare April 14, 2026 14:10
@janisz janisz requested a review from a team as a code owner April 14, 2026 14:10
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 14, 2026

🚀 Build Images Ready

Images are ready for commit d83e225. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.11.x-667-gd83e22587d

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 0% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.57%. Comparing base (4881eec) to head (d83e225).

Files with missing lines Patch % Lines
central/metrics/init.go 0.00% 15 Missing ⚠️
pkg/grpc/metrics/prom.go 0.00% 1 Missing ⚠️
pkg/rate/metrics.go 0.00% 1 Missing ⚠️
pkg/tlscheckcache/tlscheckcache_metrics.go 0.00% 1 Missing ⚠️
sensor/admission-control/manager/metrics.go 0.00% 1 Missing ⚠️
sensor/common/centralproxy/metrics.go 0.00% 1 Missing ⚠️
...ensor/common/networkflow/updatecomputer/metrics.go 0.00% 1 Missing ⚠️
sensor/kubernetes/complianceoperator/metrics.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #19980      +/-   ##
==========================================
- Coverage   49.61%   49.57%   -0.05%     
==========================================
  Files        2765     2765              
  Lines      208643   208657      +14     
==========================================
- Hits       103520   103435      -85     
- Misses      97467    97565      +98     
- Partials     7656     7657       +1     
Flag Coverage Δ
go-unit-tests 49.57% <0.00%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread pkg/grpc/metrics/prom.go Outdated
// Leaving subsystem empty to skip duplicating this metric for each component
prometheus.CounterOpts{
Namespace: pkgMetrics.PrometheusNamespace,
Namespace: "rox",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this remain a constant some place?

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Apr 14, 2026

@janisz: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/gke-qa-e2e-tests 786fd1d link false /test gke-qa-e2e-tests
ci/prow/ocp-4-21-qa-e2e-tests 786fd1d link false /test ocp-4-21-qa-e2e-tests
ci/prow/ocp-4-12-qa-e2e-tests 786fd1d link false /test ocp-4-12-qa-e2e-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

janisz and others added 4 commits April 15, 2026 18:14
Convert pkg/ and sensor metrics from package-level init() to explicit Init()
calls to fix OOMKills in config-controller (128Mi) and admission-control (500Mi).

Changes:
- pkg/grpc/metrics, pkg/images, pkg/postgres, pkg/rate, pkg/tlscheckcache: init() → Init()
- sensor/admission-control/manager, sensor/common/centralproxy, sensor/common/metrics: init() → Init()
- central/metrics: create Init() function consolidating metric registrations
- central/app: create app.go + init.go calling initMetrics()
- sensor/*/app: create init.go calling initMetrics()

With busybox binary consolidation, all init() functions run for all components.
This change makes metrics initialization conditional - only central registers
central metrics, only sensor registers sensor metrics.

Expected memory impact:
- config-controller: 150MB → < 100MB
- admission-control: 600MB → < 400MB
- OOMKills: 7 + 6-7 per replica → 0

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
roxvet error: sensor/common cannot import from sensor/kubernetes/listener/resources/metrics

Moved listenerresourcesmetrics.Init() from sensor/common/metrics to sensor/kubernetes/app/init.go
where it belongs (kubernetes-specific, not common).
Problem: golangci-lint reported unused log variable in central/app/app.go.
Root cause: metrics PR accidentally included initCompliance() and initGraphQL()
calls that belong in separate PRs.

Solution: simplify central/app to only include metrics initialization, removing
cross-PR contamination and the unused log variable.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Problem: Test_CentralReceiverSuite tests failing because metrics were not
registered. Tests expected componentProcessMessageErrorsCount metric to exist
but it showed 0 instead of 5.

Root cause: After converting init() → Init(), the tests don't automatically
register metrics anymore. Tests used to pass because init() ran automatically.

Solution: Add TestMain() that calls metrics.Init() before running tests.

Fixes:
- Test_CentralReceiverSuite/Test_ComponentProcessMessageErrorsMetric
- Test_CentralReceiverSuite/Test_SlowComponentDropMessages

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
sensor/admission-control/manager/metrics.go (1)

80-89: ⚠️ Potential issue | 🟠 Major

Make Init() idempotent to prevent duplicate registration panics.

Moving from a package init() to an explicit Init() function shifts responsibility for single-invocation guarantees from the Go runtime to the caller. Since prometheus.MustRegister() panics on duplicate registration, this should be protected with sync.Once to prevent crashes if Init() is called more than once—whether in tests or future call paths.

Suggested fix
 import (
+	"sync"
 	"time"
@@
 )
 
+var metricsRegisterOnce sync.Once
+
 // Init registers all admission-control prometheus metrics.
 // Called explicitly from sensor/admission-control/app/app.go instead of package init().
 func Init() {
-	prometheus.MustRegister(
-		imageCacheOperations,
-		imageFetchTotal,
-		imageFetchDuration,
-		imageFetchesPerReview,
-		policyevalReviewDuration,
-		policyevalReviewTotal,
-	)
+	metricsRegisterOnce.Do(func() {
+		prometheus.MustRegister(
+			imageCacheOperations,
+			imageFetchTotal,
+			imageFetchDuration,
+			imageFetchesPerReview,
+			policyevalReviewDuration,
+			policyevalReviewTotal,
+		)
+	})
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sensor/admission-control/manager/metrics.go` around lines 80 - 89, Init()
currently calls prometheus.MustRegister(...) directly which will panic if called
multiple times; make Init idempotent by adding a package-level sync.Once (e.g.,
initOnce) and move the prometheus.MustRegister(...) call inside
initOnce.Do(func() { ... }), so subsequent Init() calls become no-ops; reference
the Init function and the prometheus.MustRegister invocation and use the
sync.Once to guard registration of imageCacheOperations, imageFetchTotal,
imageFetchDuration, imageFetchesPerReview, policyevalReviewDuration, and
policyevalReviewTotal.
sensor/common/clusterentities/metrics/metrics.go (1)

84-90: ⚠️ Potential issue | 🟠 Major

Make metric Init() idempotent to prevent duplicate-registration panics.

prometheus.MustRegister panics if the same collector is registered twice. While the function is currently called only once during initialization, it's now exported and callable multiple times, creating risk for future misuse. Protect against this with a sync.Once guard:

Suggested fix
 import (
+	"sync"
 	"time"

 	"github.com/prometheus/client_golang/prometheus"
 	"github.com/stackrox/rox/pkg/metrics"
 )
 
 var (
+	registerOnce sync.Once
+
 	containersStored = prometheus.NewGaugeVec(prometheus.GaugeOpts{
 		Namespace: metrics.PrometheusNamespace,
 		Subsystem: metrics.SensorSubsystem.String(),
@@
 // Init registers cluster entities metrics.
 func Init() {
-	prometheus.MustRegister(containersStored)
-	prometheus.MustRegister(ipsStored)
-	prometheus.MustRegister(endpointsStored)
-	prometheus.MustRegister(ipsHavingMultipleContainers)
-	prometheus.MustRegister(storeLockHeldDurationSeconds)
+	registerOnce.Do(func() {
+		prometheus.MustRegister(containersStored)
+		prometheus.MustRegister(ipsStored)
+		prometheus.MustRegister(endpointsStored)
+		prometheus.MustRegister(ipsHavingMultipleContainers)
+		prometheus.MustRegister(storeLockHeldDurationSeconds)
+	})
 }

Apply the same pattern to other component metrics modules: sensor/common/pubsub/metrics/metrics.go, sensor/common/detector/metrics/metrics.go, sensor/common/networkflow/metrics/metrics.go, sensor/common/registry/metrics/metrics.go, and sensor/common/networkflow/updatecomputer/metrics.go.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sensor/common/clusterentities/metrics/metrics.go` around lines 84 - 90, The
Init() function currently calls prometheus.MustRegister(...) directly which will
panic if called more than once; make Init() idempotent by adding a package-level
sync.Once (e.g., var initOnce sync.Once) and wrap the registration block in
initOnce.Do(func(){ ... }) so containersStored, ipsStored, endpointsStored,
ipsHavingMultipleContainers, and storeLockHeldDurationSeconds are registered
only once; apply the same sync.Once pattern to the Init functions in
sensor/common/pubsub/metrics/metrics.go,
sensor/common/detector/metrics/metrics.go,
sensor/common/networkflow/metrics/metrics.go,
sensor/common/registry/metrics/metrics.go, and
sensor/common/networkflow/updatecomputer/metrics.go.
🧹 Nitpick comments (2)
sensor/kubernetes/listener/resources/metrics/metrics.go (1)

25-27: Consider guarding Init() with sync.Once to prevent potential duplicate-registration panics.

While listenerresourcesmetrics.Init() is currently called from a single site (sensor/kubernetes/app/init.go), the exported function could be invoked multiple times if called directly in tests or future initialization paths. Since prometheus.MustRegister(...) panics on duplicate registration, protecting this with sync.Once is a defensive improvement.

Suggested fix
 package metrics
 
 import (
+	"sync"
+
 	"github.com/prometheus/client_golang/prometheus"
 	"github.com/stackrox/rox/pkg/metrics"
 )
@@
 var (
+	registerOnce sync.Once
+
 	podsStored = prometheus.NewGaugeVec(prometheus.GaugeOpts{
@@
 // Init registers listener resource metrics.
 func Init() {
-	prometheus.MustRegister(podsStored)
+	registerOnce.Do(func() {
+		prometheus.MustRegister(podsStored)
+	})
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sensor/kubernetes/listener/resources/metrics/metrics.go` around lines 25 -
27, The Init function currently calls prometheus.MustRegister(podsStored)
unguarded and can panic on duplicate registration; wrap the registration in a
sync.Once to ensure podsStored is only registered once. Add a package-level
sync.Once variable (e.g., var initOnce sync.Once) and change Init() to call
initOnce.Do(func() { prometheus.MustRegister(podsStored) }) so repeated Init()
calls (from tests or other init paths) are safe.
sensor/kubernetes/complianceoperator/metrics.go (1)

9-14: Guard InitMetrics() registration with sync.Once to ensure idempotency.

This function is now exported and no longer an implicit init() block, making it callable from external code. While prometheus.MustRegister() panics on duplicate registration, this is best handled defensively by wrapping the call with sync.Once to guarantee single registration regardless of how many times the function is invoked.

Suggested pattern
+import "sync"
+
+var registerMetricsOnce sync.Once
+
 // InitMetrics registers compliance operator metrics.
 func InitMetrics() {
-	prometheus.MustRegister(
-		commandsFromCentral,
-		applyScanConfigCommands,
-	)
+	registerMetricsOnce.Do(func() {
+		prometheus.MustRegister(
+			commandsFromCentral,
+			applyScanConfigCommands,
+		)
+	})
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sensor/kubernetes/complianceoperator/metrics.go` around lines 9 - 14,
InitMetrics currently calls prometheus.MustRegister(...) directly which can
panic if called more than once; make registration idempotent by adding a
package-level sync.Once (e.g., var metricsOnce sync.Once) and wrap the
prometheus.MustRegister(commandsFromCentral, applyScanConfigCommands) call
inside metricsOnce.Do(func(){ ... }); update InitMetrics to invoke that Do so
repeated calls are safe and duplicate registration is avoided.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@sensor/admission-control/manager/metrics.go`:
- Around line 80-89: Init() currently calls prometheus.MustRegister(...)
directly which will panic if called multiple times; make Init idempotent by
adding a package-level sync.Once (e.g., initOnce) and move the
prometheus.MustRegister(...) call inside initOnce.Do(func() { ... }), so
subsequent Init() calls become no-ops; reference the Init function and the
prometheus.MustRegister invocation and use the sync.Once to guard registration
of imageCacheOperations, imageFetchTotal, imageFetchDuration,
imageFetchesPerReview, policyevalReviewDuration, and policyevalReviewTotal.

In `@sensor/common/clusterentities/metrics/metrics.go`:
- Around line 84-90: The Init() function currently calls
prometheus.MustRegister(...) directly which will panic if called more than once;
make Init() idempotent by adding a package-level sync.Once (e.g., var initOnce
sync.Once) and wrap the registration block in initOnce.Do(func(){ ... }) so
containersStored, ipsStored, endpointsStored, ipsHavingMultipleContainers, and
storeLockHeldDurationSeconds are registered only once; apply the same sync.Once
pattern to the Init functions in sensor/common/pubsub/metrics/metrics.go,
sensor/common/detector/metrics/metrics.go,
sensor/common/networkflow/metrics/metrics.go,
sensor/common/registry/metrics/metrics.go, and
sensor/common/networkflow/updatecomputer/metrics.go.

---

Nitpick comments:
In `@sensor/kubernetes/complianceoperator/metrics.go`:
- Around line 9-14: InitMetrics currently calls prometheus.MustRegister(...)
directly which can panic if called more than once; make registration idempotent
by adding a package-level sync.Once (e.g., var metricsOnce sync.Once) and wrap
the prometheus.MustRegister(commandsFromCentral, applyScanConfigCommands) call
inside metricsOnce.Do(func(){ ... }); update InitMetrics to invoke that Do so
repeated calls are safe and duplicate registration is avoided.

In `@sensor/kubernetes/listener/resources/metrics/metrics.go`:
- Around line 25-27: The Init function currently calls
prometheus.MustRegister(podsStored) unguarded and can panic on duplicate
registration; wrap the registration in a sync.Once to ensure podsStored is only
registered once. Add a package-level sync.Once variable (e.g., var initOnce
sync.Once) and change Init() to call initOnce.Do(func() {
prometheus.MustRegister(podsStored) }) so repeated Init() calls (from tests or
other init paths) are safe.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: f6fdbf1f-1ddd-487a-b54c-d6def9917b01

📥 Commits

Reviewing files that changed from the base of the PR and between 33de959 and 786fd1d.

📒 Files selected for processing (25)
  • central/app/app.go
  • central/app/init.go
  • central/metrics/init.go
  • pkg/grpc/metrics/prom.go
  • pkg/images/metrics.go
  • pkg/postgres/metrics.go
  • pkg/rate/metrics.go
  • pkg/tlscheckcache/tlscheckcache_metrics.go
  • sensor/admission-control/app/app.go
  • sensor/admission-control/app/init.go
  • sensor/admission-control/manager/metrics.go
  • sensor/common/centralproxy/metrics.go
  • sensor/common/clusterentities/metrics/metrics.go
  • sensor/common/detector/metrics/metrics.go
  • sensor/common/metrics/init.go
  • sensor/common/networkflow/metrics/metrics.go
  • sensor/common/networkflow/updatecomputer/metrics.go
  • sensor/common/pubsub/metrics/metrics.go
  • sensor/common/registry/metrics/metrics.go
  • sensor/common/sensor/central_receiver_test.go
  • sensor/common/virtualmachine/metrics/metrics.go
  • sensor/kubernetes/app/app.go
  • sensor/kubernetes/app/init.go
  • sensor/kubernetes/complianceoperator/metrics.go
  • sensor/kubernetes/listener/resources/metrics/metrics.go
✅ Files skipped from review due to trivial changes (2)
  • sensor/kubernetes/app/app.go
  • sensor/admission-control/app/init.go
🚧 Files skipped from review as they are similar to previous changes (10)
  • pkg/postgres/metrics.go
  • sensor/kubernetes/app/init.go
  • pkg/images/metrics.go
  • pkg/rate/metrics.go
  • pkg/grpc/metrics/prom.go
  • sensor/common/centralproxy/metrics.go
  • pkg/tlscheckcache/tlscheckcache_metrics.go
  • central/app/init.go
  • sensor/admission-control/app/app.go
  • central/app/app.go

User request: "rebase on master and address code review comments"

Changes:
- Add postgres.Init() and rate.Init() to central metrics bootstrap
- Add centralproxy.Init() to sensor metrics bootstrap
- Replace hardcoded "rox" namespace with metrics.PrometheusNamespace constant
- Add manager.Init() to admission-control startup (post-rebase structure)
- Inline central/app/init.go into app.go per user feedback

Why: Ensures all metrics packages are properly initialized at startup,
preventing missing prometheus collectors. Uses shared constants to
avoid duplication. Simplifies code structure by inlining single-use
functions.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
janisz and others added 2 commits April 15, 2026 18:22
User request: "who calls sensor/kubernetes/complianceoperator/metrics.go InitMetrics?"

Why: After rebase, sensor/kubernetes/app/ was removed in favor of main.go,
but the orphaned app/init.go was never called. This meant:
- complianceoperator.InitMetrics() was never invoked
- sensor/common/metrics.Init() was never invoked
- listener/resources/metrics.Init() was never invoked

Fix: Inline all metrics initialization directly in main(), remove
orphaned app/ directory.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…rpc/metrics}

User request: "check if all initialisation functions are called"

Missing Init() calls found:
- pkg/images.Init() - used by central (image scan) and sensor (delegated scan)
- pkg/tlscheckcache.Init() - used by central (TLS check cache)
- pkg/grpc/metrics.Init() - used by both (ALPN handshake metrics)

Why: These packages expose prometheus metrics but Init() was never called
at startup, so collectors were never registered. Metrics would fail to
record data.

Added to:
- central/app/app.go: images, tlscheckcache, grpc/metrics
- sensor/kubernetes/main.go: images, grpc/metrics
- sensor/admission-control/main.go: grpc/metrics

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@janisz janisz force-pushed the master-init-migration-metrics branch from 786fd1d to 112d71b Compare April 15, 2026 16:30
…stency

User request: "can we rename sensor/kubernetes/complianceoperator/metrics.go InitMetrics to Init?"

Why: All other metrics packages use Init(), this was the only one using
InitMetrics(). Consistency makes the codebase easier to navigate.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants