refactor: migrate metrics init() to explicit Init() for OOMKill fixes#19980
refactor: migrate metrics init() to explicit Init() for OOMKill fixes#19980
Conversation
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- Consider making the various
Init()functions idempotent (or clearly documenting their expected call patterns), since repeated calls will currently panic due toprometheus.MustRegisterduplicate registrations. - In
central/app/init.go,initCompliance()andinitGraphQL()are stubs with only comments; it would be helpful to add explicit TODO markers (and optionally a tracking reference) so future work on refactoring away remaininginit()-based registration is easier to discover.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider making the various `Init()` functions idempotent (or clearly documenting their expected call patterns), since repeated calls will currently panic due to `prometheus.MustRegister` duplicate registrations.
- In `central/app/init.go`, `initCompliance()` and `initGraphQL()` are stubs with only comments; it would be helpful to add explicit TODO markers (and optionally a tracking reference) so future work on refactoring away remaining `init()`-based registration is easier to discover.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
📝 WalkthroughSummary by CodeRabbitRelease Notes
WalkthroughConverts scattered package-level Prometheus Changes
Sequence Diagram(s)sequenceDiagram
rect rgba(52,152,219,0.5)
Participant App as App.Run()
end
rect rgba(46,204,113,0.5)
Participant Sys as System libs
end
rect rgba(231,76,60,0.5)
Participant Metrics as Metrics packages
end
App->>Sys: memlimit.SetMemoryLimit()
App->>Sys: premain.StartMain()
App->>Metrics: initMetrics() / metrics.Init()
Metrics->>Metrics: prometheus.MustRegister(... collectors ...)
Metrics-->>App: metrics initialized
App->>App: continue startup (profiling, services, etc.)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
sensor/common/centralproxy/metrics.go (1)
87-96:⚠️ Potential issue | 🟠 MajorThis
Init()is not in the new Sensor registration chain.After Line 89, these collectors only register if
centralproxy.Init()is called explicitly. The providedsensor/common/metrics/init.go:13-65bootstrap omits it, so the central-proxy metrics disappear after this refactor. Please addcentralproxy.Init()to that Sensor metrics initializer.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sensor/common/centralproxy/metrics.go` around lines 87 - 96, The central-proxy collectors are only registered when centralproxy.Init() runs; add a call to centralproxy.Init() from the Sensor metrics bootstrap initializer so those collectors are always registered during metrics setup. Locate the Sensor metrics initializer (the function that currently emplaces collectors) and invoke centralproxy.Init() (the Init function in the centralproxy package) as part of that initialization sequence.pkg/rate/metrics.go (1)
16-22:⚠️ Potential issue | 🟠 MajorRegister these collectors from the Central metrics bootstrap.
After Line 16, these metrics only exist if
rate.Init()is called explicitly. The providedcentral/app/init.go:1-9path does not do that, so the rate-limiter collectors disappear after this refactor. Please wirerate.Init()into the same Central metrics initialization flow.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/rate/metrics.go` around lines 16 - 22, The rate limiter Prometheus collectors (RequestsTotal, ActiveClients, PerClientRate, PerClientBucketCapacity) are only registered when pkg/rate.Init() is called; wire a call to rate.Init() into the Central metrics bootstrap initialization so these collectors are always registered during central startup. Locate the Central metrics initialization routine that registers other collectors and add a call to rate.Init() there (ensuring rate.Init() remains idempotent or guarded to avoid double-registration), so the rate metrics are present without requiring explicit calls elsewhere.
🧹 Nitpick comments (3)
pkg/grpc/metrics/prom.go (1)
11-13: Avoid hardcoding the Prometheus namespace here.Line 11 duplicates
pkg/metrics.PrometheusNamespaceas a string literal. That creates a second source of truth for metric names and makes a future namespace change easy to miss.Suggested change
import ( "github.com/prometheus/client_golang/prometheus" + pkgMetrics "github.com/stackrox/rox/pkg/metrics" ) @@ - Namespace: "rox", + Namespace: pkgMetrics.PrometheusNamespace,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/grpc/metrics/prom.go` around lines 11 - 13, Replace the hardcoded Namespace string "rox" in the metric definition that sets Name "endpoint_tls_handshakes_with_negotiated_alp_total" with the shared constant metrics.PrometheusNamespace (importing the pkg/metrics package or using its existing alias), so the Namespace field uses metrics.PrometheusNamespace instead of the literal; ensure imports are updated and any references still compile.central/app/app.go (1)
10-12: Unused log variable.The
logvariable is declared but not used within this file. If it's intended for future use, consider removing it until needed to avoid dead code.♻️ Proposed fix
-var ( - log = logging.LoggerForModule() -)And remove the logging import if not needed elsewhere:
import ( - "github.com/stackrox/rox/pkg/logging" "github.com/stackrox/rox/pkg/memlimit"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@central/app/app.go` around lines 10 - 12, The declared package-level variable log (created via logging.LoggerForModule()) is unused; remove the var block that defines log and also remove the import of the logging package if it’s not used elsewhere in this file; if you intend to keep a placeholder for future logging, either use the variable or add a clear TODO comment explaining why it remains to avoid dead code.central/app/init.go (1)
11-28: Stub functions are documented but could benefit from tracking.The
initCompliance()andinitGraphQL()stubs are clearly documented as placeholders for future migration. Consider opening tracking issues or adding TODO markers with issue references to ensure these migrations don't get forgotten.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@central/app/init.go` around lines 11 - 28, The initCompliance() and initGraphQL() stubs need explicit tracking so the planned refactor isn't forgotten: add TODO comments inside both functions referencing a created tracking issue (e.g., "TODO: migrate init()-based registration to explicit registration — see ISSUE-XXXX") or create the tracking issues first and include their IDs in the TODOs; ensure the comments mention the functions initCompliance and initGraphQL and the intended work (refactor compliance/checks and GraphQL loader registration) so reviewers can find and close the issues later.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/postgres/metrics.go`:
- Around line 10-13: The postgres package's Init() (pkg/postgres/metrics.go
Init) is no longer being called from central's startup, so prometheus metric
queryErrors isn't registered; update the central metrics bootstrap (where
central/metrics.Init() is invoked in central/app/init.go) to also call
pkg/postgres/metrics.Init() (or postgres.Init()) during startup so
postgres_query_errors gets registered at app init. Ensure the call happens
alongside central/metrics.Init() in the same initialization sequence.
In `@sensor/common/metrics/init.go`:
- Around line 62-70: The code calls exported Init() functions (e.g.,
clusterentitiesmetrics.Init(), detectormetrics.Init(),
networkflowmetrics.Init(), updatecomputermetrics.Init(), pubsubmetrics.Init(),
registrymetrics.Init(), virtualmachinemetrics.Init(),
listenerresourcesmetrics.Init()) but those packages only define unexported
init() functions and thus the calls will not compile; fix by either adding
exported Init() wrappers in each metrics package (create an Init() function in
clusterentities/metrics, detector/metrics, networkflow/metrics,
networkflow/updatecomputer/metrics, pubsub/metrics, registry/metrics,
virtualmachine/metrics, listener/resources/metrics that invokes the current
initialization code or delegates to the existing init logic) or by removing
these Init() calls and relying on Go package init() side-effects (import the
packages for side-effects if needed), ensuring the symbol names above match the
new exported functions if you choose the wrapper approach.
---
Outside diff comments:
In `@pkg/rate/metrics.go`:
- Around line 16-22: The rate limiter Prometheus collectors (RequestsTotal,
ActiveClients, PerClientRate, PerClientBucketCapacity) are only registered when
pkg/rate.Init() is called; wire a call to rate.Init() into the Central metrics
bootstrap initialization so these collectors are always registered during
central startup. Locate the Central metrics initialization routine that
registers other collectors and add a call to rate.Init() there (ensuring
rate.Init() remains idempotent or guarded to avoid double-registration), so the
rate metrics are present without requiring explicit calls elsewhere.
In `@sensor/common/centralproxy/metrics.go`:
- Around line 87-96: The central-proxy collectors are only registered when
centralproxy.Init() runs; add a call to centralproxy.Init() from the Sensor
metrics bootstrap initializer so those collectors are always registered during
metrics setup. Locate the Sensor metrics initializer (the function that
currently emplaces collectors) and invoke centralproxy.Init() (the Init function
in the centralproxy package) as part of that initialization sequence.
---
Nitpick comments:
In `@central/app/app.go`:
- Around line 10-12: The declared package-level variable log (created via
logging.LoggerForModule()) is unused; remove the var block that defines log and
also remove the import of the logging package if it’s not used elsewhere in this
file; if you intend to keep a placeholder for future logging, either use the
variable or add a clear TODO comment explaining why it remains to avoid dead
code.
In `@central/app/init.go`:
- Around line 11-28: The initCompliance() and initGraphQL() stubs need explicit
tracking so the planned refactor isn't forgotten: add TODO comments inside both
functions referencing a created tracking issue (e.g., "TODO: migrate
init()-based registration to explicit registration — see ISSUE-XXXX") or create
the tracking issues first and include their IDs in the TODOs; ensure the
comments mention the functions initCompliance and initGraphQL and the intended
work (refactor compliance/checks and GraphQL loader registration) so reviewers
can find and close the issues later.
In `@pkg/grpc/metrics/prom.go`:
- Around line 11-13: Replace the hardcoded Namespace string "rox" in the metric
definition that sets Name "endpoint_tls_handshakes_with_negotiated_alp_total"
with the shared constant metrics.PrometheusNamespace (importing the pkg/metrics
package or using its existing alias), so the Namespace field uses
metrics.PrometheusNamespace instead of the literal; ensure imports are updated
and any references still compile.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: cc207921-ba85-4e8b-bfa1-1e39a45f27fd
📒 Files selected for processing (15)
central/app/app.gocentral/app/init.gocentral/metrics/init.gopkg/grpc/metrics/prom.gopkg/images/metrics.gopkg/postgres/metrics.gopkg/rate/metrics.gopkg/tlscheckcache/tlscheckcache_metrics.gosensor/admission-control/app/app.gosensor/admission-control/app/init.gosensor/admission-control/manager/metrics.gosensor/common/centralproxy/metrics.gosensor/common/metrics/init.gosensor/kubernetes/app/app.gosensor/kubernetes/app/init.go
| func Init() { | ||
| prometheus.MustRegister( | ||
| queryErrors, | ||
| ) |
There was a problem hiding this comment.
Wire this Init() into Central startup.
Line 10 makes registration opt-in, but the provided central/app/init.go:1-9 startup path only reaches central/metrics.Init() and never calls postgres.Init(). That means postgres_query_errors stops being registered after this change. Please add postgres.Init() to the central metrics bootstrap.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/postgres/metrics.go` around lines 10 - 13, The postgres package's Init()
(pkg/postgres/metrics.go Init) is no longer being called from central's startup,
so prometheus metric queryErrors isn't registered; update the central metrics
bootstrap (where central/metrics.Init() is invoked in central/app/init.go) to
also call pkg/postgres/metrics.Init() (or postgres.Init()) during startup so
postgres_query_errors gets registered at app init. Ensure the call happens
alongside central/metrics.Init() in the same initialization sequence.
| // component-specific metrics | ||
| clusterentitiesmetrics.Init() | ||
| detectormetrics.Init() | ||
| networkflowmetrics.Init() | ||
| updatecomputermetrics.Init() | ||
| pubsubmetrics.Init() | ||
| registrymetrics.Init() | ||
| virtualmachinemetrics.Init() | ||
| listenerresourcesmetrics.Init() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify that Init() functions are exported in the called packages
echo "=== Checking for Init() functions in metric packages ==="
for pkg in \
"sensor/common/clusterentities/metrics" \
"sensor/common/detector/metrics" \
"sensor/common/networkflow/metrics" \
"sensor/common/networkflow/updatecomputer" \
"sensor/common/pubsub/metrics" \
"sensor/common/registry/metrics" \
"sensor/common/virtualmachine/metrics" \
"sensor/kubernetes/listener/resources/metrics"; do
echo ""
echo "--- $pkg ---"
# Search for exported Init function
rg -n "^func Init\(\)" "$pkg" 2>/dev/null || echo "No Init() found in $pkg"
doneRepository: stackrox/stackrox
Length of output: 1784
🏁 Script executed:
# List available Go files in the metric packages to understand structure
find . -path "*/metrics/*.go" -type f 2>/dev/null | head -20Repository: stackrox/stackrox
Length of output: 990
🏁 Script executed:
# Find metrics files in sensor directory
find ./sensor -path "*/metrics/*.go" -type f 2>/dev/null | grep -E "(clusterentities|detector|networkflow|pubsub|registry|virtualmachine|listener)" | head -20Repository: stackrox/stackrox
Length of output: 487
🏁 Script executed:
# Search for Init() functions in the metric packages
echo "=== Checking for Init() functions ==="
rg -n "^func Init\(\)" ./sensor/common/clusterentities/metrics/ ./sensor/common/detector/metrics/ ./sensor/common/networkflow/metrics/ ./sensor/common/pubsub/metrics/ ./sensor/common/registry/metrics/ ./sensor/common/virtualmachine/metrics/ ./sensor/kubernetes/listener/resources/metrics/
# Also check for init() functions
echo ""
echo "=== Checking for init() functions ==="
rg -n "^func init\(\)" ./sensor/common/clusterentities/metrics/ ./sensor/common/detector/metrics/ ./sensor/common/networkflow/metrics/ ./sensor/common/pubsub/metrics/ ./sensor/common/registry/metrics/ ./sensor/common/virtualmachine/metrics/ ./sensor/kubernetes/listener/resources/metrics/
# Also need to check updatecomputer package
echo ""
echo "=== Checking updatecomputer package ==="
find ./sensor/common/networkflow/updatecomputer -name "*.go" -type f 2>/dev/null | head -5Repository: stackrox/stackrox
Length of output: 913
🏁 Script executed:
# Check updatecomputer/metrics.go for Init() or init()
echo "=== Checking updatecomputer/metrics.go ==="
rg -n "^func (Init|init)\(\)" ./sensor/common/networkflow/updatecomputer/metrics.goRepository: stackrox/stackrox
Length of output: 119
Critical: Code will fail to compile — these metric packages do not export Init() functions.
All metric packages contain only init() (unexported) functions, not Init() (exported):
clusterentities/metrics/metrics.go: line 84detector/metrics/metrics.go: line 351networkflow/metrics/metrics.go: line 8pubsub/metrics/metrics.go: line 76registry/metrics/metrics.go: line 8virtualmachine/metrics/metrics.go: line 129listener/resources/metrics/metrics.go: line 25networkflow/updatecomputer/metrics.go: line 8
Calling .Init() on these packages will fail with undefined function errors. Either export Init() functions in these packages or use a different initialization pattern.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sensor/common/metrics/init.go` around lines 62 - 70, The code calls exported
Init() functions (e.g., clusterentitiesmetrics.Init(), detectormetrics.Init(),
networkflowmetrics.Init(), updatecomputermetrics.Init(), pubsubmetrics.Init(),
registrymetrics.Init(), virtualmachinemetrics.Init(),
listenerresourcesmetrics.Init()) but those packages only define unexported
init() functions and thus the calls will not compile; fix by either adding
exported Init() wrappers in each metrics package (create an Init() function in
clusterentities/metrics, detector/metrics, networkflow/metrics,
networkflow/updatecomputer/metrics, pubsub/metrics, registry/metrics,
virtualmachine/metrics, listener/resources/metrics that invokes the current
initialization code or delegates to the existing init logic) or by removing
these Init() calls and relying on Go package init() side-effects (import the
packages for side-effects if needed), ensuring the symbol names above match the
new exported functions if you choose the wrapper approach.
c276ddc to
8b137ad
Compare
🚀 Build Images ReadyImages are ready for commit d83e225. To use with deploy scripts: export MAIN_IMAGE_TAG=4.11.x-667-gd83e22587d |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #19980 +/- ##
==========================================
- Coverage 49.61% 49.57% -0.05%
==========================================
Files 2765 2765
Lines 208643 208657 +14
==========================================
- Hits 103520 103435 -85
- Misses 97467 97565 +98
- Partials 7656 7657 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| // Leaving subsystem empty to skip duplicating this metric for each component | ||
| prometheus.CounterOpts{ | ||
| Namespace: pkgMetrics.PrometheusNamespace, | ||
| Namespace: "rox", |
There was a problem hiding this comment.
shouldn't this remain a constant some place?
|
@janisz: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Convert pkg/ and sensor metrics from package-level init() to explicit Init() calls to fix OOMKills in config-controller (128Mi) and admission-control (500Mi). Changes: - pkg/grpc/metrics, pkg/images, pkg/postgres, pkg/rate, pkg/tlscheckcache: init() → Init() - sensor/admission-control/manager, sensor/common/centralproxy, sensor/common/metrics: init() → Init() - central/metrics: create Init() function consolidating metric registrations - central/app: create app.go + init.go calling initMetrics() - sensor/*/app: create init.go calling initMetrics() With busybox binary consolidation, all init() functions run for all components. This change makes metrics initialization conditional - only central registers central metrics, only sensor registers sensor metrics. Expected memory impact: - config-controller: 150MB → < 100MB - admission-control: 600MB → < 400MB - OOMKills: 7 + 6-7 per replica → 0 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
roxvet error: sensor/common cannot import from sensor/kubernetes/listener/resources/metrics Moved listenerresourcesmetrics.Init() from sensor/common/metrics to sensor/kubernetes/app/init.go where it belongs (kubernetes-specific, not common).
Problem: golangci-lint reported unused log variable in central/app/app.go. Root cause: metrics PR accidentally included initCompliance() and initGraphQL() calls that belong in separate PRs. Solution: simplify central/app to only include metrics initialization, removing cross-PR contamination and the unused log variable. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Problem: Test_CentralReceiverSuite tests failing because metrics were not registered. Tests expected componentProcessMessageErrorsCount metric to exist but it showed 0 instead of 5. Root cause: After converting init() → Init(), the tests don't automatically register metrics anymore. Tests used to pass because init() ran automatically. Solution: Add TestMain() that calls metrics.Init() before running tests. Fixes: - Test_CentralReceiverSuite/Test_ComponentProcessMessageErrorsMetric - Test_CentralReceiverSuite/Test_SlowComponentDropMessages Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
sensor/admission-control/manager/metrics.go (1)
80-89:⚠️ Potential issue | 🟠 MajorMake
Init()idempotent to prevent duplicate registration panics.Moving from a package
init()to an explicitInit()function shifts responsibility for single-invocation guarantees from the Go runtime to the caller. Sinceprometheus.MustRegister()panics on duplicate registration, this should be protected withsync.Onceto prevent crashes ifInit()is called more than once—whether in tests or future call paths.Suggested fix
import ( + "sync" "time" @@ ) +var metricsRegisterOnce sync.Once + // Init registers all admission-control prometheus metrics. // Called explicitly from sensor/admission-control/app/app.go instead of package init(). func Init() { - prometheus.MustRegister( - imageCacheOperations, - imageFetchTotal, - imageFetchDuration, - imageFetchesPerReview, - policyevalReviewDuration, - policyevalReviewTotal, - ) + metricsRegisterOnce.Do(func() { + prometheus.MustRegister( + imageCacheOperations, + imageFetchTotal, + imageFetchDuration, + imageFetchesPerReview, + policyevalReviewDuration, + policyevalReviewTotal, + ) + }) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sensor/admission-control/manager/metrics.go` around lines 80 - 89, Init() currently calls prometheus.MustRegister(...) directly which will panic if called multiple times; make Init idempotent by adding a package-level sync.Once (e.g., initOnce) and move the prometheus.MustRegister(...) call inside initOnce.Do(func() { ... }), so subsequent Init() calls become no-ops; reference the Init function and the prometheus.MustRegister invocation and use the sync.Once to guard registration of imageCacheOperations, imageFetchTotal, imageFetchDuration, imageFetchesPerReview, policyevalReviewDuration, and policyevalReviewTotal.sensor/common/clusterentities/metrics/metrics.go (1)
84-90:⚠️ Potential issue | 🟠 MajorMake metric
Init()idempotent to prevent duplicate-registration panics.
prometheus.MustRegisterpanics if the same collector is registered twice. While the function is currently called only once during initialization, it's now exported and callable multiple times, creating risk for future misuse. Protect against this with async.Onceguard:Suggested fix
import ( + "sync" "time" "github.com/prometheus/client_golang/prometheus" "github.com/stackrox/rox/pkg/metrics" ) var ( + registerOnce sync.Once + containersStored = prometheus.NewGaugeVec(prometheus.GaugeOpts{ Namespace: metrics.PrometheusNamespace, Subsystem: metrics.SensorSubsystem.String(), @@ // Init registers cluster entities metrics. func Init() { - prometheus.MustRegister(containersStored) - prometheus.MustRegister(ipsStored) - prometheus.MustRegister(endpointsStored) - prometheus.MustRegister(ipsHavingMultipleContainers) - prometheus.MustRegister(storeLockHeldDurationSeconds) + registerOnce.Do(func() { + prometheus.MustRegister(containersStored) + prometheus.MustRegister(ipsStored) + prometheus.MustRegister(endpointsStored) + prometheus.MustRegister(ipsHavingMultipleContainers) + prometheus.MustRegister(storeLockHeldDurationSeconds) + }) }Apply the same pattern to other component metrics modules:
sensor/common/pubsub/metrics/metrics.go,sensor/common/detector/metrics/metrics.go,sensor/common/networkflow/metrics/metrics.go,sensor/common/registry/metrics/metrics.go, andsensor/common/networkflow/updatecomputer/metrics.go.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sensor/common/clusterentities/metrics/metrics.go` around lines 84 - 90, The Init() function currently calls prometheus.MustRegister(...) directly which will panic if called more than once; make Init() idempotent by adding a package-level sync.Once (e.g., var initOnce sync.Once) and wrap the registration block in initOnce.Do(func(){ ... }) so containersStored, ipsStored, endpointsStored, ipsHavingMultipleContainers, and storeLockHeldDurationSeconds are registered only once; apply the same sync.Once pattern to the Init functions in sensor/common/pubsub/metrics/metrics.go, sensor/common/detector/metrics/metrics.go, sensor/common/networkflow/metrics/metrics.go, sensor/common/registry/metrics/metrics.go, and sensor/common/networkflow/updatecomputer/metrics.go.
🧹 Nitpick comments (2)
sensor/kubernetes/listener/resources/metrics/metrics.go (1)
25-27: Consider guardingInit()withsync.Onceto prevent potential duplicate-registration panics.While
listenerresourcesmetrics.Init()is currently called from a single site (sensor/kubernetes/app/init.go), the exported function could be invoked multiple times if called directly in tests or future initialization paths. Sinceprometheus.MustRegister(...)panics on duplicate registration, protecting this withsync.Onceis a defensive improvement.Suggested fix
package metrics import ( + "sync" + "github.com/prometheus/client_golang/prometheus" "github.com/stackrox/rox/pkg/metrics" ) @@ var ( + registerOnce sync.Once + podsStored = prometheus.NewGaugeVec(prometheus.GaugeOpts{ @@ // Init registers listener resource metrics. func Init() { - prometheus.MustRegister(podsStored) + registerOnce.Do(func() { + prometheus.MustRegister(podsStored) + }) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sensor/kubernetes/listener/resources/metrics/metrics.go` around lines 25 - 27, The Init function currently calls prometheus.MustRegister(podsStored) unguarded and can panic on duplicate registration; wrap the registration in a sync.Once to ensure podsStored is only registered once. Add a package-level sync.Once variable (e.g., var initOnce sync.Once) and change Init() to call initOnce.Do(func() { prometheus.MustRegister(podsStored) }) so repeated Init() calls (from tests or other init paths) are safe.sensor/kubernetes/complianceoperator/metrics.go (1)
9-14: GuardInitMetrics()registration withsync.Onceto ensure idempotency.This function is now exported and no longer an implicit
init()block, making it callable from external code. Whileprometheus.MustRegister()panics on duplicate registration, this is best handled defensively by wrapping the call withsync.Onceto guarantee single registration regardless of how many times the function is invoked.Suggested pattern
+import "sync" + +var registerMetricsOnce sync.Once + // InitMetrics registers compliance operator metrics. func InitMetrics() { - prometheus.MustRegister( - commandsFromCentral, - applyScanConfigCommands, - ) + registerMetricsOnce.Do(func() { + prometheus.MustRegister( + commandsFromCentral, + applyScanConfigCommands, + ) + }) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sensor/kubernetes/complianceoperator/metrics.go` around lines 9 - 14, InitMetrics currently calls prometheus.MustRegister(...) directly which can panic if called more than once; make registration idempotent by adding a package-level sync.Once (e.g., var metricsOnce sync.Once) and wrap the prometheus.MustRegister(commandsFromCentral, applyScanConfigCommands) call inside metricsOnce.Do(func(){ ... }); update InitMetrics to invoke that Do so repeated calls are safe and duplicate registration is avoided.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@sensor/admission-control/manager/metrics.go`:
- Around line 80-89: Init() currently calls prometheus.MustRegister(...)
directly which will panic if called multiple times; make Init idempotent by
adding a package-level sync.Once (e.g., initOnce) and move the
prometheus.MustRegister(...) call inside initOnce.Do(func() { ... }), so
subsequent Init() calls become no-ops; reference the Init function and the
prometheus.MustRegister invocation and use the sync.Once to guard registration
of imageCacheOperations, imageFetchTotal, imageFetchDuration,
imageFetchesPerReview, policyevalReviewDuration, and policyevalReviewTotal.
In `@sensor/common/clusterentities/metrics/metrics.go`:
- Around line 84-90: The Init() function currently calls
prometheus.MustRegister(...) directly which will panic if called more than once;
make Init() idempotent by adding a package-level sync.Once (e.g., var initOnce
sync.Once) and wrap the registration block in initOnce.Do(func(){ ... }) so
containersStored, ipsStored, endpointsStored, ipsHavingMultipleContainers, and
storeLockHeldDurationSeconds are registered only once; apply the same sync.Once
pattern to the Init functions in sensor/common/pubsub/metrics/metrics.go,
sensor/common/detector/metrics/metrics.go,
sensor/common/networkflow/metrics/metrics.go,
sensor/common/registry/metrics/metrics.go, and
sensor/common/networkflow/updatecomputer/metrics.go.
---
Nitpick comments:
In `@sensor/kubernetes/complianceoperator/metrics.go`:
- Around line 9-14: InitMetrics currently calls prometheus.MustRegister(...)
directly which can panic if called more than once; make registration idempotent
by adding a package-level sync.Once (e.g., var metricsOnce sync.Once) and wrap
the prometheus.MustRegister(commandsFromCentral, applyScanConfigCommands) call
inside metricsOnce.Do(func(){ ... }); update InitMetrics to invoke that Do so
repeated calls are safe and duplicate registration is avoided.
In `@sensor/kubernetes/listener/resources/metrics/metrics.go`:
- Around line 25-27: The Init function currently calls
prometheus.MustRegister(podsStored) unguarded and can panic on duplicate
registration; wrap the registration in a sync.Once to ensure podsStored is only
registered once. Add a package-level sync.Once variable (e.g., var initOnce
sync.Once) and change Init() to call initOnce.Do(func() {
prometheus.MustRegister(podsStored) }) so repeated Init() calls (from tests or
other init paths) are safe.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro Plus
Run ID: f6fdbf1f-1ddd-487a-b54c-d6def9917b01
📒 Files selected for processing (25)
central/app/app.gocentral/app/init.gocentral/metrics/init.gopkg/grpc/metrics/prom.gopkg/images/metrics.gopkg/postgres/metrics.gopkg/rate/metrics.gopkg/tlscheckcache/tlscheckcache_metrics.gosensor/admission-control/app/app.gosensor/admission-control/app/init.gosensor/admission-control/manager/metrics.gosensor/common/centralproxy/metrics.gosensor/common/clusterentities/metrics/metrics.gosensor/common/detector/metrics/metrics.gosensor/common/metrics/init.gosensor/common/networkflow/metrics/metrics.gosensor/common/networkflow/updatecomputer/metrics.gosensor/common/pubsub/metrics/metrics.gosensor/common/registry/metrics/metrics.gosensor/common/sensor/central_receiver_test.gosensor/common/virtualmachine/metrics/metrics.gosensor/kubernetes/app/app.gosensor/kubernetes/app/init.gosensor/kubernetes/complianceoperator/metrics.gosensor/kubernetes/listener/resources/metrics/metrics.go
✅ Files skipped from review due to trivial changes (2)
- sensor/kubernetes/app/app.go
- sensor/admission-control/app/init.go
🚧 Files skipped from review as they are similar to previous changes (10)
- pkg/postgres/metrics.go
- sensor/kubernetes/app/init.go
- pkg/images/metrics.go
- pkg/rate/metrics.go
- pkg/grpc/metrics/prom.go
- sensor/common/centralproxy/metrics.go
- pkg/tlscheckcache/tlscheckcache_metrics.go
- central/app/init.go
- sensor/admission-control/app/app.go
- central/app/app.go
User request: "rebase on master and address code review comments" Changes: - Add postgres.Init() and rate.Init() to central metrics bootstrap - Add centralproxy.Init() to sensor metrics bootstrap - Replace hardcoded "rox" namespace with metrics.PrometheusNamespace constant - Add manager.Init() to admission-control startup (post-rebase structure) - Inline central/app/init.go into app.go per user feedback Why: Ensures all metrics packages are properly initialized at startup, preventing missing prometheus collectors. Uses shared constants to avoid duplication. Simplifies code structure by inlining single-use functions. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
User request: "who calls sensor/kubernetes/complianceoperator/metrics.go InitMetrics?" Why: After rebase, sensor/kubernetes/app/ was removed in favor of main.go, but the orphaned app/init.go was never called. This meant: - complianceoperator.InitMetrics() was never invoked - sensor/common/metrics.Init() was never invoked - listener/resources/metrics.Init() was never invoked Fix: Inline all metrics initialization directly in main(), remove orphaned app/ directory. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…rpc/metrics} User request: "check if all initialisation functions are called" Missing Init() calls found: - pkg/images.Init() - used by central (image scan) and sensor (delegated scan) - pkg/tlscheckcache.Init() - used by central (TLS check cache) - pkg/grpc/metrics.Init() - used by both (ALPN handshake metrics) Why: These packages expose prometheus metrics but Init() was never called at startup, so collectors were never registered. Metrics would fail to record data. Added to: - central/app/app.go: images, tlscheckcache, grpc/metrics - sensor/kubernetes/main.go: images, grpc/metrics - sensor/admission-control/main.go: grpc/metrics Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
786fd1d to
112d71b
Compare
…stency User request: "can we rename sensor/kubernetes/complianceoperator/metrics.go InitMetrics to Init?" Why: All other metrics packages use Init(), this was the only one using InitMetrics(). Consistency makes the codebase easier to navigate. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Description
Convert pkg/ and sensor metrics from package-level init() to explicit Init() calls to fix OOMKills in config-controller (128Mi) and admission-control (500Mi).
Changes:
Refs:
User-facing documentation
Testing and quality
Automated testing
How I validated my change
change me!