
ROX-33188: admission control label providers#19530

Open
AlexVulaj wants to merge 2 commits into master from
AlexVulaj/ROX-33188-admission-control-label-providers

Conversation


@AlexVulaj AlexVulaj commented Mar 20, 2026

Description

Note: this PR's end-to-end functionality also relies on PR #19359. The two were written and tested together but split to simplify the review process.

Wire Admission Controller's in-memory stores as label providers for policy evaluation, enabling full label-based policy scoping (cluster + namespace labels) in both runtime detection and admission control.

Policies with cluster_label and namespace_label scopes now work in Sensor's runtime detection and admission control.
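
As a rough illustration of what "wiring in-memory stores as label providers" could look like, here is a minimal sketch. The `LabelProvider` interface and `inMemoryStore` type are hypothetical stand-ins for illustration, not the actual names from this PR:

```go
package main

import "fmt"

// LabelProvider is a hypothetical shape for the provider interface the
// policy evaluator consumes; the real interface in the PR may differ.
type LabelProvider interface {
	GetClusterLabels() map[string]string
	GetNamespaceLabels(namespace string) map[string]string
}

// inMemoryStore is an illustrative stand-in for the admission
// controller's in-memory stores backing policy evaluation.
type inMemoryStore struct {
	clusterLabels   map[string]string
	namespaceLabels map[string]map[string]string
}

func (s *inMemoryStore) GetClusterLabels() map[string]string { return s.clusterLabels }

func (s *inMemoryStore) GetNamespaceLabels(ns string) map[string]string {
	return s.namespaceLabels[ns]
}

func main() {
	var p LabelProvider = &inMemoryStore{
		clusterLabels:   map[string]string{"env": "prod"},
		namespaceLabels: map[string]map[string]string{"backend": {"team": "backend"}},
	}
	// Policy scoping can now resolve both cluster and namespace labels
	// from the same in-memory source.
	fmt.Println(p.GetClusterLabels()["env"], p.GetNamespaceLabels("backend")["team"])
}
```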

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

Added comprehensive unit tests.

Executed a comprehensive manual test plan covering three workflows:

Workflow 1: Admission control deploy-time enforcement (4/4 passed)

  • Cluster label scoping (env=prod) blocks privileged containers
  • Namespace label scoping (team=backend) blocks privileged containers
  • Combined cluster + namespace label scoping works correctly
  • Dynamic cluster label changes (prod→dev) and namespace filtering verified

Workflow 2: Runtime detection (Sensor) (3/3 passed)

  • Cluster-scoped runtime policy triggers on bash execution in prod cluster
  • Namespace-scoped runtime policy triggers only in matching namespace (backend)
  • Namespace filtering works correctly (frontend namespace doesn't trigger backend policy)

Workflow 3: Verification checks (2/2 passed)

  • Cluster labels flow from SecuredCluster CR → ConfigMap → Admission Control
  • No errors in Sensor or admission-control logs related to label providers or scoping

Test setup:

  • ACS deployed via roxie with feature flag ROX_LABEL_BASED_POLICY_SCOPING=true
  • Image: 4.11.x-348-g89eb433197 (includes namespace sync and enrichment fixes)
  • Cluster labels set via SecuredCluster CR (env=prod/dev, region=us-east-1)
  • Namespace labels set via kubectl (team=backend/frontend, tier=app)
  • 5 test policies created via API (3 deploy-time with SCALE_TO_ZERO_ENFORCEMENT, 2 runtime)
  • Tested both admission control (deploy-time) and Sensor (runtime) label-based scoping
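
The label setup above could have been applied with standard commands along these lines (a setup sketch, not the exact commands run; namespaces are assumed to already exist):

```shell
# Feature flag named in the setup notes above.
export ROX_LABEL_BASED_POLICY_SCOPING=true

# Namespace labels described in the test setup.
kubectl label namespace backend team=backend tier=app
kubectl label namespace frontend team=frontend tier=app
```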

@AlexVulaj AlexVulaj requested a review from a team as a code owner March 20, 2026 20:13
@AlexVulaj AlexVulaj changed the title from "Alex vulaj/rox 33188 admission control label providers" to "ROX-33188: admission control label providers" on Mar 20, 2026

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 2 issues and left some high-level feedback:

  • settingsToConfigMap now unconditionally calls settings.GetClusterLabels().MarshalVT(), which will panic if ClusterLabels is nil; consider guarding for nil and either omitting the BinaryData entry or writing an empty labels message.
  • kubernetes/listener namespaceStore.addNamespace no longer clones NamespaceMetadata before storing, which changes semantics vs the previous CloneVT() and may allow external mutation of in-memory state; double-check that this aliasing is intentional across all namespace consumers.
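
The nil-guard the first bullet asks for can be sketched as follows. The types here are toy stand-ins for the generated protobuf code (the real `MarshalVT` is generated and its exact behavior may differ), but they show the guard pattern the review suggests:

```go
package main

import "fmt"

// ClusterLabels is a toy stand-in for the generated proto message.
type ClusterLabels struct{ Labels map[string]string }

// MarshalVT mimics generated marshalling that uses its receiver's
// fields; the review warns this can panic through a nil pointer.
func (c *ClusterLabels) MarshalVT() ([]byte, error) {
	return []byte(fmt.Sprint(c.Labels)), nil
}

// marshalClusterLabels guards against a nil message, marshalling an
// empty labels message instead of panicking.
func marshalClusterLabels(labels *ClusterLabels) ([]byte, error) {
	if labels == nil {
		return (&ClusterLabels{}).MarshalVT()
	}
	return labels.MarshalVT()
}

func main() {
	data, err := marshalClusterLabels(nil) // safe even with a nil input
	fmt.Println(string(data), err)
}
```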
## Individual Comments

### Comment 1
<location path="sensor/kubernetes/listener/resources/namespace_store.go" line_range="34-35" />
<code_context>
 	defer n.lock.Unlock()

-	n.namespaces[ns.GetName()] = ns.CloneVT()
+	n.namespaces[ns.GetName()] = ns
+	n.namespacesByID[ns.GetId()] = ns
 }

</code_context>
<issue_to_address>
**suggestion (bug_risk):** Re-using the original NamespaceMetadata pointer instead of a clone can introduce unintended shared mutations

Previously the store kept its own copy via `ns.CloneVT()`, so later changes to the caller’s `*storage.NamespaceMetadata` did not affect stored state. Now the same pointer is shared between the caller, `namespaces`, and `namespacesByID`, so any post-insert mutation will impact the store and can introduce subtle bugs or data races if immutability is not guaranteed. To preserve the prior isolation, keep cloning before insertion:

```go
cloned := ns.CloneVT()
n.namespaces[cloned.GetName()] = cloned
n.namespacesByID[cloned.GetId()] = cloned
```

```suggestion
	cloned := ns.CloneVT()
	n.namespaces[cloned.GetName()] = cloned
	n.namespacesByID[cloned.GetId()] = cloned
```
</issue_to_address>

### Comment 2
<location path="sensor/admission-control/manager/evaluate_runtime.go" line_range="107" />
<code_context>
 		log.Debugf("Found deployment %s (id=%s) for %s/%s", deployment.GetName(), deployment.GetId(),
 			event.GetObject().GetNamespace(), event.GetObject().GetName())

+		// Create a request-scoped context for runtime detection so it respects admission timeouts.
+		var fetchImgCtx context.Context
+		if timeoutSecs := s.GetClusterConfig().GetAdmissionControllerConfig().GetTimeoutSeconds(); timeoutSecs > 1 {
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the repeated timeout-based context creation into a shared helper to simplify and unify all runtime detection call sites.

You can reduce the new complexity and duplication by centralizing the timeout/context setup into a small helper, then reusing it in all runtime‑detection paths.

Right now, you have three nearly identical blocks:

```go
var fetchImgCtx context.Context
if timeoutSecs := s.GetClusterConfig().GetAdmissionControllerConfig().GetTimeoutSeconds(); timeoutSecs > 1 {
    var cancel context.CancelFunc
    fetchImgCtx, cancel = context.WithTimeout(context.Background(), time.Duration(timeoutSecs)*time.Second)
    defer cancel()
} else {
    fetchImgCtx = context.Background()
}
```

and similarly for `detectionCtx` and in `waitForDeploymentAndDetect`.

You can factor this into a helper that keeps the behavior intact but simplifies the call sites:

```go
// helper on *state or *manager, whichever is more appropriate
func runtimeDetectionCtx(s *state) (context.Context, context.CancelFunc) {
    timeoutSecs := s.GetClusterConfig().GetAdmissionControllerConfig().GetTimeoutSeconds()
    if timeoutSecs > 1 {
        return context.WithTimeout(context.Background(), time.Duration(timeoutSecs)*time.Second)
    }
    // no-op cancel to keep call sites uniform
    return context.Background(), func() {}
}
```

Then use it consistently:

```go
func (m *manager) evaluatePodEvent(...) (..., error) {
    deployment := m.getDeploymentForPod(...)
    if deployment != nil {
        log.Debugf(...)

        ctx, cancel := runtimeDetectionCtx(s)
        defer cancel()

        if len(s.deployFieldK8sDetector.PolicySet().GetCompiledPolicies()) == 0 {
            alerts, err := s.eventOnlyK8sDetector.
                DetectForDeploymentAndKubeEvent(ctx, booleanpolicy.EnhancedDeployment{
                    Deployment: deployment,
                    Images:     make([]*storage.Image, len(deployment.GetContainers())),
                }, event)
            ...
        }

        getAlertsFunc := func(dep *storage.Deployment, imgs []*storage.Image) ([]*storage.Alert, error) {
            enhancedDeployment := booleanpolicy.EnhancedDeployment{Deployment: dep, Images: imgs}
            return s.allK8sEventDetector.DetectForDeploymentAndKubeEvent(ctx, enhancedDeployment, event)
        }

        alerts, err := m.kickOffImgScansAndDetect(ctx, s, getAlertsFunc, deployment)
        ...
    }

    // no deployment
    ctx, cancel := runtimeDetectionCtx(s)
    defer cancel()

    alerts, err := s.eventOnlyK8sDetector.
        DetectForDeploymentAndKubeEvent(ctx, booleanpolicy.EnhancedDeployment{}, event)
    ...
}
```

And in `waitForDeploymentAndDetect`:

```go
func (m *manager) waitForDeploymentAndDetect(s *state, event *storage.KubernetesEvent) {
    if len(s.deployFieldK8sDetector.PolicySet().GetCompiledPolicies()) == 0 {
        return
    }

    ...

    ctx, cancel := runtimeDetectionCtx(s)
    defer cancel()

    getAlertsFunc := func(dep *storage.Deployment, imgs []*storage.Image) ([]*storage.Alert, error) {
        enhancedDeployment := booleanpolicy.EnhancedDeployment{Deployment: dep, Images: imgs}
        return s.deployFieldK8sDetector.DetectForDeploymentAndKubeEvent(ctx, enhancedDeployment, event)
    }

    alerts, err := m.kickOffImgScansAndDetect(ctx, s, getAlertsFunc, deployment)
    ...
}
```

This keeps all existing timeout behavior, ensures a single request‑scoped context per call, and removes the repeated timeout/`defer cancel()` boilerplate from each detection path.
</issue_to_address>



@AlexVulaj AlexVulaj added the do-not-merge label ("A change which is not meant to be merged") on Mar 20, 2026
@AlexVulaj (author) commented:

Added "do-not-merge", as this depends on #19359


rhacs-bot commented Mar 20, 2026

Images are ready for the commit at b762a1e.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-396-gb762a1ea42.

@AlexVulaj AlexVulaj force-pushed the AlexVulaj/ROX-33188-admission-control-label-providers branch from 8870afa to b762a1e on March 20, 2026 at 21:32

codecov bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 38.46154% with 96 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.18%. Comparing base (c2045aa) to head (b762a1e).
⚠️ Report is 9 commits behind head on master.

Files with missing lines                                Patch %   Missing lines
...nsor/admission-control/manager/evaluate_runtime.go     0.00%   23
...ommon/admissioncontroller/settings_manager_impl.go     0.00%   18
sensor/admission-control/manager/manager_impl.go         33.33%   12
...r/kubernetes/listener/resources/namespace_store.go    52.38%   10
...or/kubernetes/listener/resources/store_provider.go     0.00%   10
sensor/admission-control/resources/namespace.go          70.00%    9
...rnetes/admissioncontroller/config_map_persister.go    33.33%    4 missing, 2 partials
sensor/admission-control/settingswatch/k8s.go             0.00%    5
sensor/common/detector/detector.go                        0.00%    2
...r/admission-control/manager/evaluate_deploytime.go    85.71%    1
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #19530      +/-   ##
==========================================
- Coverage   49.23%   49.18%   -0.05%     
==========================================
  Files        2727     2735       +8     
  Lines      205764   206409     +645     
==========================================
+ Hits       101304   101525     +221     
- Misses      96921    97334     +413     
- Partials     7539     7550      +11     
Flag Coverage Δ
go-unit-tests 49.18% <38.46%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.


2 participants