ROX-33319: Add monitoring support to admission controller templates#19226
ROX-33319: Add monitoring support to admission controller templates#19226
Conversation
|
Skipping CI for Draft Pull Request. |
|
/test all |
|
Images are ready for the commit at 33f8ce8. To use with deploy scripts, first |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #19226 +/- ##
=======================================
Coverage 49.56% 49.56%
=======================================
Files 2675 2675
Lines 201838 201838
=======================================
+ Hits 100035 100036 +1
+ Misses 94346 94345 -1
Partials 7457 7457
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
/test ocp-4-12-qa-e2e-tests |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In
sensor/admission-control/main.go,metrics.NewServer(...).RunForever()is called beforesettingswatch.WatchK8sForSettingsUpdatesAsync, and ifRunForever()blocks (as its name suggests) the settings watcher is never started; consider running the metrics server in a separate goroutine or moving the call so it doesn't prevent later initialization. - For the new admission-control metrics in
manager/metrics.go, consider introducing typed constants or enums for label values likeresultandsource(e.g.,hit/miss/expired,sensor/central,allowed/denied/...) to ensure producers use consistent values and avoid accidental cardinality explosions due to typos. - The new
AdmissionControlSubsystemmetrics subsystem string uses"admission_control"while most external identifiers (service name, labels) use a hyphen; consider aligning this naming (or documenting the difference) to reduce confusion when querying metrics across subsystems.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `sensor/admission-control/main.go`, `metrics.NewServer(...).RunForever()` is called before `settingswatch.WatchK8sForSettingsUpdatesAsync`, and if `RunForever()` blocks (as its name suggests) the settings watcher is never started; consider running the metrics server in a separate goroutine or moving the call so it doesn't prevent later initialization.
- For the new admission-control metrics in `manager/metrics.go`, consider introducing typed constants or enums for label values like `result` and `source` (e.g., `hit/miss/expired`, `sensor/central`, `allowed/denied/...`) to ensure producers use consistent values and avoid accidental cardinality explosions due to typos.
- The new `AdmissionControlSubsystem` metrics subsystem string uses `"admission_control"` while most external identifiers (service name, labels) use a hyphen; consider aligning this naming (or documenting the difference) to reduce confusion when querying metrics across subsystems.
## Individual Comments
### Comment 1
<location path="image/templates/helm/stackrox-secured-cluster/templates/admission-controller-netpol.yaml" line_range="61-45" />
<code_context>
+ auto-upgrade.stackrox.io/component: "sensor"
+ annotations:
+ {{- include "srox.annotations" (list . "networkpolicy" "admission-control-monitoring-tls") | nindent 4 }}
+spec:
+ ingress:
+ - ports:
+ - port: 9091
+ protocol: TCP
+ podSelector:
+ matchLabels:
+ app: admission-control
+ policyTypes:
+ - Ingress
+{{- end }}
</code_context>
<issue_to_address>
**🚨 suggestion (security):** NetworkPolicy allows metrics ingress from all sources; consider constraining to Prometheus/monitoring namespaces.
This NetworkPolicy defines ingress on port 9091 with a `podSelector` but no `from` clause, so any pod in any namespace can reach this metrics endpoint. If possible, scope ingress with `namespaceSelector`/`podSelector` to only the OpenShift monitoring/Prometheus pods that scrape these metrics.
Suggested implementation:
```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: admission-control-monitoring-tls
namespace: {{ ._rox._namespace }}
labels:
{{- include "srox.labels" (list . "networkpolicy" "admission-control-monitoring-tls") | nindent 4 }}
spec:
podSelector:
matchLabels:
app: admission-control
policyTypes:
- Ingress
ingress:
- from:
- namespaceSelector:
matchExpressions:
- key: kubernetes.io/metadata.name
operator: In
values:
- openshift-monitoring
- openshift-user-workload-monitoring
ports:
- port: 9091
protocol: TCP
</code>
```
1. Ensure the first `NetworkPolicy` in this file (the one before the `{{- if ._rox.monitoring.openshift.enabled }}` block) either has equivalent scoping or is clearly for a different purpose; if it also exposes port 9091 for monitoring traffic, you likely want the same `from` restrictions there as well.
2. Verify that the label `app: admission-control` matches the actual pod labels for the admission controller; if your deployment uses different labels (e.g., `app.kubernetes.io/name`), adjust the `podSelector` accordingly.
3. If your cluster uses different namespaces for Prometheus/monitoring, update the `values` under `kubernetes.io/metadata.name` to match those namespaces.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
image/templates/helm/stackrox-secured-cluster/templates/admission-controller-netpol.yaml
Show resolved
Hide resolved
|
/test all |
d9bbae1 to
33f8ce8
Compare
Description
9090and9091.main.go.Definition of the metrics counters, and admission controller code instrumentation to increment them will be in a follow on PR.
User-facing documentation
Testing and quality
Automated testing
How I validated my change
CI, manual deploy on infra cluster