ROX-33319: Add monitoring support to admission controller templates #19226

Open
clickboo wants to merge 1 commit into master from boo-adm-cntrl-monitoring-helm

clickboo (Contributor) commented Feb 27, 2026

Description

  1. Added support in the admission controller template YAMLs for monitoring on both ports, 9090 and 9091.
  2. Started the metrics server in the admission controller's main.go.
  3. Added a new metrics subsystem for the admission controller.

The metric counter definitions, and the admission controller instrumentation that increments them, will come in a follow-on PR.
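The metric names in the validation output below all share the `rox_admission_control_` prefix, which is consistent with Prometheus composing a fully qualified name as `namespace_subsystem_name`. A minimal sketch of that composition, assuming the namespace is `rox` and the subsystem string is `admission_control` (the value quoted in review); `fqName` mirrors the joining rule of client_golang's `prometheus.BuildFQName` but is reimplemented with the standard library. It also shows one way the follow-on counters could constrain label values with a typed string, as the review suggests; `cacheResult` and `parseCacheResult` are hypothetical names, not code from this PR.

```go
package main

import "strings"

// Assumed values: "rox" matches the rox_ prefix in the sample output, and
// "admission_control" is the subsystem string quoted in review.
const (
	metricsNamespace          = "rox"
	admissionControlSubsystem = "admission_control"
)

// fqName joins the non-empty parts with underscores, the same rule that
// prometheus.BuildFQName applies. An empty name yields an empty string.
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(append(parts, name), "_")
}

// cacheResult is a hypothetical typed label value. Producers refer to named
// constants, so a misspelled constant fails to compile instead of silently
// creating a new time series.
type cacheResult string

const (
	cacheHit     cacheResult = "hit"
	cacheMiss    cacheResult = "miss"
	cacheExpired cacheResult = "expired"
)

// parseCacheResult validates a runtime string before it is used as a label value.
func parseCacheResult(s string) (cacheResult, bool) {
	switch r := cacheResult(s); r {
	case cacheHit, cacheMiss, cacheExpired:
		return r, true
	}
	return "", false
}
```

Go still lets an untyped string literal convert to `cacheResult`, so the type narrows mistakes rather than eliminating them; `parseCacheResult` covers values that only arrive at runtime.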

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

CI, plus a manual deploy on an infra cluster. Scraping the metrics endpoint on port 9090 returned:

```
$ curl -s localhost:9090/metrics | grep rox_admission_control
# HELP rox_admission_control_image_fetches_per_review Number of image fetch RPCs issued per admission review.
# TYPE rox_admission_control_image_fetches_per_review histogram
rox_admission_control_image_fetches_per_review_bucket{le="0"} 0
rox_admission_control_image_fetches_per_review_bucket{le="1"} 0
rox_admission_control_image_fetches_per_review_bucket{le="2"} 0
rox_admission_control_image_fetches_per_review_bucket{le="3"} 0
rox_admission_control_image_fetches_per_review_bucket{le="4"} 0
rox_admission_control_image_fetches_per_review_bucket{le="5"} 0
rox_admission_control_image_fetches_per_review_bucket{le="6"} 0
rox_admission_control_image_fetches_per_review_bucket{le="7"} 0
rox_admission_control_image_fetches_per_review_bucket{le="8"} 0
rox_admission_control_image_fetches_per_review_bucket{le="9"} 0
rox_admission_control_image_fetches_per_review_bucket{le="10"} 0
rox_admission_control_image_fetches_per_review_bucket{le="+Inf"} 0
rox_admission_control_image_fetches_per_review_sum 0
rox_admission_control_image_fetches_per_review_count 0
# HELP rox_admission_control_uptime_seconds Total number of seconds that the service has been up
# TYPE rox_admission_control_uptime_seconds gauge
rox_admission_control_uptime_seconds 80.003847641
```

openshift-ci bot commented Feb 27, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

clickboo (Contributor Author) commented:

/test all

rhacs-bot commented Feb 27, 2026

Images are ready for the commit at 33f8ce8.

To use with deploy scripts, first `export MAIN_IMAGE_TAG=4.11.x-194-g33f8ce8be8`.

codecov bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.56%. Comparing base (9df4c3a) to head (33f8ce8).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files
```diff
@@           Coverage Diff           @@
##           master   #19226   +/-   ##
=======================================
  Coverage   49.56%   49.56%           
=======================================
  Files        2675     2675           
  Lines      201838   201838           
=======================================
+ Hits       100035   100036    +1     
+ Misses      94346    94345    -1     
  Partials     7457     7457           
```
| Flag | Coverage Δ |
|---|---|
| go-unit-tests | 49.56% <ø> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

clickboo (Contributor Author) commented:

/test ocp-4-12-qa-e2e-tests

sourcery-ai bot left a comment

Hey, I've found 1 issue and left some high-level feedback:

  • In `sensor/admission-control/main.go`, `metrics.NewServer(...).RunForever()` is called before `settingswatch.WatchK8sForSettingsUpdatesAsync`, and if `RunForever()` blocks (as its name suggests) the settings watcher is never started; consider running the metrics server in a separate goroutine or moving the call so it doesn't prevent later initialization.
  • For the new admission-control metrics in `manager/metrics.go`, consider introducing typed constants or enums for label values like `result` and `source` (e.g., `hit/miss/expired`, `sensor/central`, `allowed/denied/...`) to ensure producers use consistent values and avoid accidental cardinality explosions due to typos.
  • The new `AdmissionControlSubsystem` metrics subsystem string uses `"admission_control"` while most external identifiers (service name, labels) use a hyphen; consider aligning this naming (or documenting the difference) to reduce confusion when querying metrics across subsystems.

## Individual Comments

### Comment 1
<location path="image/templates/helm/stackrox-secured-cluster/templates/admission-controller-netpol.yaml" line_range="61-45" />
<code_context>
+    auto-upgrade.stackrox.io/component: "sensor"
+  annotations:
+    {{- include "srox.annotations" (list . "networkpolicy" "admission-control-monitoring-tls") | nindent 4 }}
+spec:
+  ingress:
+  - ports:
+    - port: 9091
+      protocol: TCP
+  podSelector:
+    matchLabels:
+      app: admission-control
+  policyTypes:
+    - Ingress
+{{- end }}
</code_context>
<issue_to_address>
**🚨 suggestion (security):** NetworkPolicy allows metrics ingress from all sources; consider constraining to Prometheus/monitoring namespaces.

This NetworkPolicy defines ingress on port 9091 with a `podSelector` but no `from` clause, so any pod in any namespace can reach this metrics endpoint. If possible, scope ingress with `namespaceSelector`/`podSelector` to only the OpenShift monitoring/Prometheus pods that scrape these metrics.

Suggested implementation:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: admission-control-monitoring-tls
  namespace: {{ ._rox._namespace }}
  labels:
    {{- include "srox.labels" (list . "networkpolicy" "admission-control-monitoring-tls") | nindent 4 }}
spec:
  podSelector:
    matchLabels:
      app: admission-control
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchExpressions:
              - key: kubernetes.io/metadata.name
                operator: In
                values:
                  - openshift-monitoring
                  - openshift-user-workload-monitoring
      ports:
        - port: 9091
          protocol: TCP
```

1. Ensure the first `NetworkPolicy` in this file (the one before the `{{- if ._rox.monitoring.openshift.enabled }}` block) either has equivalent scoping or is clearly for a different purpose; if it also exposes port 9091 for monitoring traffic, you likely want the same `from` restrictions there as well.
2. Verify that the label `app: admission-control` matches the actual pod labels for the admission controller; if your deployment uses different labels (e.g., `app.kubernetes.io/name`), adjust the `podSelector` accordingly.
3. If your cluster uses different namespaces for Prometheus/monitoring, update the `values` under `kubernetes.io/metadata.name` to match those namespaces.
</issue_to_address>
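The review's point rests on NetworkPolicy matching rules: an ingress rule with no `from` entries matches traffic from every source, an empty `ports` list matches every port, and entries within each list are ORed. The toy model below encodes just those rules for the namespace-only selector in the suggested YAML; `peer`, `ingressRule`, and `allows` are hypothetical simplifications, not the `k8s.io/api` types, and they ignore pod selectors, IP blocks, and protocols.

```go
package main

// peer models one "from" entry whose namespaceSelector is a single
// matchExpression on kubernetes.io/metadata.name with operator In.
type peer struct {
	namespaces []string
}

// ingressRule models one entry of spec.ingress.
type ingressRule struct {
	from  []peer  // empty: traffic from any source matches
	ports []int32 // empty: traffic to any port matches
}

// allows reports whether traffic from a pod in srcNamespace to port on a
// selected pod matches this rule.
func (r ingressRule) allows(srcNamespace string, port int32) bool {
	return r.portMatches(port) && r.sourceMatches(srcNamespace)
}

func (r ingressRule) portMatches(port int32) bool {
	if len(r.ports) == 0 {
		return true
	}
	for _, p := range r.ports {
		if p == port {
			return true
		}
	}
	return false
}

func (r ingressRule) sourceMatches(ns string) bool {
	if len(r.from) == 0 {
		return true
	}
	// Peers are ORed; within a peer, the In operator matches any listed value.
	for _, pe := range r.from {
		for _, n := range pe.namespaces {
			if n == ns {
				return true
			}
		}
	}
	return false
}
```

Because policies that select the same pods are additive, the scoped rule only narrows access if no other policy (such as the first one in `admission-controller-netpol.yaml`) already admits port 9091 from everywhere, which is the concern in point 1.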

clickboo (Contributor Author) commented:

/test all

clickboo requested a review from vladbologa February 27, 2026 12:13
clickboo marked this pull request as ready for review February 27, 2026 12:13
clickboo requested review from a team as code owners February 27, 2026 12:13
clickboo force-pushed the boo-adm-cntrl-monitoring-helm branch from d9bbae1 to 33f8ce8 February 27, 2026 12:37