ROX-33188: admission control label providers #19530
Conversation
Hey - I've found 2 issues and left some high-level feedback:
- settingsToConfigMap now unconditionally calls settings.GetClusterLabels().MarshalVT(), which will panic if ClusterLabels is nil; consider guarding for nil and either omitting the BinaryData entry or writing an empty labels message.
- kubernetes/listener namespaceStore.addNamespace no longer clones NamespaceMetadata before storing, which changes semantics vs the previous CloneVT() and may allow external mutation of in-memory state; double-check that this aliasing is intentional across all namespace consumers.
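The nil guard suggested in the first point could look roughly like the following sketch. The real `settingsToConfigMap` and its proto types aren't shown in this diff, so a stand-in `labelsMessage` type and a hypothetical `clusterLabelsBinaryData` helper are used purely for illustration:

```go
package main

import "fmt"

// labelsMessage stands in for the generated ClusterLabels proto type;
// a nil pointer mimics the unset-field case described above.
type labelsMessage struct{ data []byte }

func (m *labelsMessage) MarshalVT() ([]byte, error) {
	if m == nil {
		return nil, fmt.Errorf("marshal on nil message")
	}
	return m.data, nil
}

// clusterLabelsBinaryData guards against a nil labels message instead of
// calling MarshalVT unconditionally: when labels are unset, the BinaryData
// entry is simply omitted (the other option mentioned above would be to
// marshal an empty labels message instead).
func clusterLabelsBinaryData(labels *labelsMessage) (map[string][]byte, error) {
	if labels == nil {
		return nil, nil // omit the entry rather than panic
	}
	raw, err := labels.MarshalVT()
	if err != nil {
		return nil, err
	}
	return map[string][]byte{"cluster-labels": raw}, nil
}

func main() {
	bd, err := clusterLabelsBinaryData(nil)
	fmt.Println("nil labels:", bd == nil, err == nil)
	bd, _ = clusterLabelsBinaryData(&labelsMessage{data: []byte{0x0a}})
	fmt.Println("set labels, entries:", len(bd))
}
```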
## Individual Comments
### Comment 1
<location path="sensor/kubernetes/listener/resources/namespace_store.go" line_range="34-35" />
<code_context>
defer n.lock.Unlock()
- n.namespaces[ns.GetName()] = ns.CloneVT()
+ n.namespaces[ns.GetName()] = ns
+ n.namespacesByID[ns.GetId()] = ns
}
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Re-using the original NamespaceMetadata pointer instead of a clone can introduce unintended shared mutations
Previously the store kept its own copy via `ns.CloneVT()`, so later changes to the caller’s `*storage.NamespaceMetadata` did not affect stored state. Now the same pointer is shared between the caller, `namespaces`, and `namespacesByID`, so any post-insert mutation will impact the store and can introduce subtle bugs or data races if immutability is not guaranteed. To preserve the prior isolation, keep cloning before insertion:
```go
cloned := ns.CloneVT()
n.namespaces[cloned.GetName()] = cloned
n.namespacesByID[cloned.GetId()] = cloned
```
</issue_to_address>
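To illustrate why the clone matters, here is a self-contained sketch with a stand-in `nsMeta` type (the real type is `*storage.NamespaceMetadata`, whose generated `CloneVT` does a deep copy). The aliased insert leaks a post-insert mutation into the store, while the cloned insert stays isolated:

```go
package main

import "fmt"

// nsMeta stands in for *storage.NamespaceMetadata; CloneVT is sketched
// here as a plain value copy, which suffices for scalar fields.
type nsMeta struct {
	Name string
	ID   string
}

func (n *nsMeta) CloneVT() *nsMeta {
	c := *n
	return &c
}

func main() {
	store := map[string]*nsMeta{}

	// Aliasing: store and caller share the same pointer, so a mutation
	// after insertion is visible inside the store.
	ns := &nsMeta{Name: "payments", ID: "abc"}
	store[ns.Name] = ns
	ns.ID = "mutated"
	fmt.Println("aliased store sees:", store["payments"].ID) // mutated

	// Cloning: the store keeps its own copy, so later caller
	// mutations do not leak in.
	ns2 := &nsMeta{Name: "billing", ID: "def"}
	store[ns2.Name] = ns2.CloneVT()
	ns2.ID = "mutated"
	fmt.Println("cloned store sees:", store["billing"].ID) // def
}
```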
### Comment 2
<location path="sensor/admission-control/manager/evaluate_runtime.go" line_range="107" />
<code_context>
log.Debugf("Found deployment %s (id=%s) for %s/%s", deployment.GetName(), deployment.GetId(),
event.GetObject().GetNamespace(), event.GetObject().GetName())
+ // Create a request-scoped context for runtime detection so it respects admission timeouts.
+ var fetchImgCtx context.Context
+ if timeoutSecs := s.GetClusterConfig().GetAdmissionControllerConfig().GetTimeoutSeconds(); timeoutSecs > 1 {
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the repeated timeout-based context creation into a shared helper to simplify and unify all runtime detection call sites.
You can reduce the new complexity and duplication by centralizing the timeout/context setup into a small helper, then reusing it in all runtime‑detection paths.
Right now, you have three nearly identical blocks:
```go
var fetchImgCtx context.Context
if timeoutSecs := s.GetClusterConfig().GetAdmissionControllerConfig().GetTimeoutSeconds(); timeoutSecs > 1 {
var cancel context.CancelFunc
fetchImgCtx, cancel = context.WithTimeout(context.Background(), time.Duration(timeoutSecs)*time.Second)
defer cancel()
} else {
fetchImgCtx = context.Background()
}
```
and similarly for `detectionCtx` and in `waitForDeploymentAndDetect`.
You can factor this into a helper that keeps the behavior intact but simplifies the call sites:
```go
// helper on *state or *manager, whichever is more appropriate
func runtimeDetectionCtx(s *state) (context.Context, context.CancelFunc) {
timeoutSecs := s.GetClusterConfig().GetAdmissionControllerConfig().GetTimeoutSeconds()
if timeoutSecs > 1 {
return context.WithTimeout(context.Background(), time.Duration(timeoutSecs)*time.Second)
}
// no-op cancel to keep call sites uniform
return context.Background(), func() {}
}
```
Then use it consistently:
```go
func (m *manager) evaluatePodEvent(...) (..., error) {
deployment := m.getDeploymentForPod(...)
if deployment != nil {
log.Debugf(...)
ctx, cancel := runtimeDetectionCtx(s)
defer cancel()
if len(s.deployFieldK8sDetector.PolicySet().GetCompiledPolicies()) == 0 {
alerts, err := s.eventOnlyK8sDetector.
DetectForDeploymentAndKubeEvent(ctx, booleanpolicy.EnhancedDeployment{
Deployment: deployment,
Images: make([]*storage.Image, len(deployment.GetContainers())),
}, event)
...
}
getAlertsFunc := func(dep *storage.Deployment, imgs []*storage.Image) ([]*storage.Alert, error) {
enhancedDeployment := booleanpolicy.EnhancedDeployment{Deployment: dep, Images: imgs}
return s.allK8sEventDetector.DetectForDeploymentAndKubeEvent(ctx, enhancedDeployment, event)
}
alerts, err := m.kickOffImgScansAndDetect(ctx, s, getAlertsFunc, deployment)
...
}
// no deployment
ctx, cancel := runtimeDetectionCtx(s)
defer cancel()
alerts, err := s.eventOnlyK8sDetector.
DetectForDeploymentAndKubeEvent(ctx, booleanpolicy.EnhancedDeployment{}, event)
...
}
```
And in `waitForDeploymentAndDetect`:
```go
func (m *manager) waitForDeploymentAndDetect(s *state, event *storage.KubernetesEvent) {
if len(s.deployFieldK8sDetector.PolicySet().GetCompiledPolicies()) == 0 {
return
}
...
ctx, cancel := runtimeDetectionCtx(s)
defer cancel()
getAlertsFunc := func(dep *storage.Deployment, imgs []*storage.Image) ([]*storage.Alert, error) {
enhancedDeployment := booleanpolicy.EnhancedDeployment{Deployment: dep, Images: imgs}
return s.deployFieldK8sDetector.DetectForDeploymentAndKubeEvent(ctx, enhancedDeployment, event)
}
alerts, err := m.kickOffImgScansAndDetect(ctx, s, getAlertsFunc, deployment)
...
}
```
This keeps all existing timeout behavior, ensures a single request‑scoped context per call, and removes the repeated timeout/`defer cancel()` boilerplate from each detection path.
</issue_to_address>
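A runnable sketch of the proposed helper, with the `GetClusterConfig().GetAdmissionControllerConfig().GetTimeoutSeconds()` lookup replaced by a plain parameter for illustration; the no-op cancel is what keeps `defer cancel()` uniform at every call site:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runtimeDetectionCtx mirrors the helper proposed above: a bounded context
// when a usable timeout is configured, otherwise context.Background() with
// a no-op cancel so callers can always `defer cancel()`.
func runtimeDetectionCtx(timeoutSecs int32) (context.Context, context.CancelFunc) {
	if timeoutSecs > 1 {
		return context.WithTimeout(context.Background(), time.Duration(timeoutSecs)*time.Second)
	}
	return context.Background(), func() {}
}

func main() {
	ctx, cancel := runtimeDetectionCtx(5)
	defer cancel()
	deadline, ok := ctx.Deadline()
	fmt.Println("bounded:", ok, "in future:", time.Until(deadline) > 0)

	ctx2, cancel2 := runtimeDetectionCtx(0)
	defer cancel2() // safe: no-op cancel
	_, ok2 := ctx2.Deadline()
	fmt.Println("unbounded has deadline:", ok2)
}
```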
Added "do-not-merge", as this depends on #19359 |
Images are ready for the commit at b762a1e. To use with deploy scripts, first |
Force-pushed from 8870afa to b762a1e.
Codecov Report

❌ Patch coverage is
Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master   #19530      +/-   ##
==========================================
- Coverage   49.23%   49.18%   -0.05%
==========================================
  Files        2727     2735       +8
  Lines      205764   206409     +645
==========================================
+ Hits       101304   101525     +221
- Misses      96921    97334     +413
- Partials     7539     7550      +11
```
Description
Note, this PR's end-to-end functionality also relies on PR #19359 . They were written and tested together, but split to simplify the review process.
Wire Admission Controller's in-memory stores as label providers for policy evaluation, enabling full label-based policy scoping (cluster + namespace labels) in both runtime detection and admission control.
Policies with `cluster_label` and `namespace_label` scopes now work in Sensor's runtime detection and admission control.

User-facing documentation
Testing and quality
Automated testing
How I validated my change
Added comprehensive unit tests.
Executed comprehensive manual test plan covering three workflows:
Workflow 1: Admission control deploy-time enforcement (4/4 passed)
Workflow 2: Runtime detection (Sensor) (3/3 passed)
Workflow 3: Verification checks (2/2 passed)
Test setup:

- `ROX_LABEL_BASED_POLICY_SCOPING=true`
- Version `4.11.x-348-g89eb433197` (includes namespace sync and enrichment fixes)
- Cluster labels (`env=prod/dev`, `region=us-east-1`)
- Namespace labels (`team=backend/frontend`, `tier=app`)