
ROX-33361: Per-namespace persistence for process indicators#19957

Open
erthalion wants to merge 1 commit into master from feature/per-namespace-persistence

Conversation

@erthalion
Contributor

@erthalion erthalion commented Apr 13, 2026

Description

NOTE: This PR was split from #19455 to simplify the review. It contains only the first commit, which introduces the actual machinery. The implementation is exactly the same as in the original PR.

Allow configuring per-namespace persistence for process indicators, so that Central does not need to store information that will never be used.

It can be configured via the DynamicConfig of the cluster configuration in the following form:

message ProcessIndicators {
  string namespace_filter = 1;
  bool exclude_openshift_ns = 2;
  bool persistence = 3;
}

Here namespace_filter allows specifying a custom regex to filter out processes by matching the namespace, exclude_openshift_ns instructs Central to exclude anything from openshift-* namespaces, and persistence can be used to disable storing process indicators entirely.
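To illustrate how the three fields described above could interact, here is a minimal Go sketch. The struct and the shouldPersist helper are illustrative only (they are not the PR's generated proto code or actual filtering logic), and the interpretation of persistence follows the description above (false disables storage):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ProcessIndicatorConfig mirrors the proto message from the description
// (illustrative Go struct, not the actual generated code).
type ProcessIndicatorConfig struct {
	NamespaceFilter    string
	ExcludeOpenshiftNs bool
	Persistence        bool
}

// shouldPersist decides whether a process indicator from the given
// namespace should be stored, per the field semantics described above.
func shouldPersist(cfg ProcessIndicatorConfig, namespace string) (bool, error) {
	if !cfg.Persistence {
		// Persistence disabled entirely: nothing is stored.
		return false, nil
	}
	if cfg.ExcludeOpenshiftNs && strings.HasPrefix(namespace, "openshift-") {
		return false, nil
	}
	if cfg.NamespaceFilter != "" {
		re, err := regexp.Compile(cfg.NamespaceFilter)
		if err != nil {
			return false, err
		}
		if re.MatchString(namespace) {
			// Namespace matches the exclusion filter.
			return false, nil
		}
	}
	return true, nil
}

func main() {
	cfg := ProcessIndicatorConfig{
		NamespaceFilter:    "^kube-.*$",
		ExcludeOpenshiftNs: true,
		Persistence:        true,
	}
	for _, ns := range []string{"payments", "kube-system", "openshift-monitoring"} {
		ok, _ := shouldPersist(cfg, ns)
		fmt.Println(ns, ok)
	}
}
```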

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

Manual validation (creating an operator-managed cluster and modifying the configuration), as well as E2E tests. Split from #19455 to simplify the review.

@github-actions
Contributor

github-actions bot commented Apr 13, 2026

🚀 Build Images Ready

Images are ready for commit f03bc24. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.11.x-632-gf03bc24f22

@codecov

codecov bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 47.86325% with 61 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.63%. Comparing base (8d9c528) to head (f03bc24).
⚠️ Report is 32 commits behind head on master.

Files with missing lines Patch % Lines
central/graphql/resolvers/generated.go 13.04% 40 Missing ⚠️
central/cluster/datastore/datastore_impl.go 53.12% 13 Missing and 2 partials ⚠️
central/detection/lifecycle/manager_impl.go 72.72% 2 Missing and 1 partial ⚠️
pkg/cluster/filtering.go 88.88% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #19957      +/-   ##
==========================================
+ Coverage   49.56%   49.63%   +0.06%     
==========================================
  Files        2764     2766       +2     
  Lines      208357   208732     +375     
==========================================
+ Hits       103276   103595     +319     
- Misses      97430    97470      +40     
- Partials     7651     7667      +16     
Flag Coverage Δ
go-unit-tests 49.63% <47.86%> (+0.06%) ⬆️


@erthalion erthalion force-pushed the feature/per-namespace-persistence branch from 680576b to e2ec42b on April 13, 2026 14:00
@erthalion
Contributor Author

/test ocp-4-21-qa-e2e-tests

Comment thread proto/storage/cluster.proto Outdated
// The difference between Static and Dynamic cluster config is that Dynamic values are sent over the Central to Sensor gRPC connection. This has the benefit of allowing for "hot reloading" of values without restarting Secured cluster components.
message DynamicClusterConfig {
message ProcessIndicators {
string namespace_filter = 1;
Contributor

It's not obvious whether this filter means namespaces to include or exclude.

Contributor

+1. exclude_namespace_filter? namespace_exclusion_filter?

Comment thread proto/storage/cluster.proto Outdated
message ProcessIndicators {
string namespace_filter = 1;
bool exclude_openshift_ns = 2;
bool persistence = 3;
Contributor

What is the meaning of persistence?

Contributor Author

Let me add more comments to the proto file.

Contributor Author

Done, does it look better?

Contributor

I think a few people stated in the design that this should be the first item, as it controls the other two.

@erthalion erthalion force-pushed the feature/per-namespace-persistence branch 2 times, most recently from 57f2aa8 to cad5788 on April 14, 2026 10:22
@erthalion erthalion requested a review from porridge April 14, 2026 12:59
@erthalion
Contributor Author

The only failing tests are due to flake #19959.

Comment thread proto/storage/cluster.proto Outdated
// The difference between Static and Dynamic cluster config is that Dynamic values are sent over the Central to Sensor gRPC connection. This has the benefit of allowing for "hot reloading" of values without restarting Secured cluster components.
message DynamicClusterConfig {
// Configure per-namespace persistence for ProcessIndicators
message ProcessIndicators {
Contributor

nit: ProcessIndicatorConfig

if filter, err := clusterPkg.GetNamespaceFilter(c); err == nil {
	compiledFilter, err := regexp.Compile(filter)

	if err == nil {
Contributor

We should log this error so we don't lose it.

if filter, err := clusterPkg.GetNamespaceFilter(cluster); err == nil {
	compiledFilter, err := regexp.Compile(filter)

	if err == nil {
Contributor

Should log the error.

Contributor

Adding onto this - logging the error is great, but the cluster upsert above already succeeded at this point with no indication that the filter is broken. Should the regex be validated higher up the chain so invalid patterns are rejected before this?
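The validate-higher-up suggestion could look roughly like the following sketch: reject an uncompilable filter before the cluster config is upserted, so the failure is surfaced to the caller instead of being silently logged later. The function name and shape are illustrative, not the PR's actual code:

```go
package main

import (
	"fmt"
	"regexp"
)

// validateNamespaceFilter is a sketch of rejecting an invalid namespace
// filter during cluster config validation, before the upsert succeeds.
// (Illustrative name; not the actual validation hook in the PR.)
func validateNamespaceFilter(filter string) error {
	if filter == "" {
		// An empty filter is valid: nothing is excluded.
		return nil
	}
	if _, err := regexp.Compile(filter); err != nil {
		return fmt.Errorf("invalid namespace filter %q: %w", filter, err)
	}
	return nil
}

func main() {
	fmt.Println(validateNamespaceFilter("^kube-.*$")) // <nil>
	if err := validateNamespaceFilter("a(b"); err != nil {
		fmt.Println("rejected:", err)
	}
}
```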

Comment thread pkg/cluster/filtering.go Outdated

// No configuration for runtime data means no filter
if config == nil {
return "", errors.New("No runtime data configuration")
Contributor

Is this really an error? Should it be return nil, nil?

Comment thread pkg/cluster/filtering.go Outdated

// If everything is present, but empty, set no filter
if filter == "" {
return "", errors.New("Empty runtime data configuration")
Contributor

Is this really an error? Should it be return nil, nil?
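The nil-instead-of-error approach suggested in both threads above could look like this sketch: a missing configuration and an empty filter both mean "no filter", signalled by a nil pointer rather than an error. Types and names here are illustrative, not the PR's actual code:

```go
package main

import "fmt"

// runtimeConfig is an illustrative stand-in for the runtime data
// configuration discussed above.
type runtimeConfig struct {
	NamespaceFilter string
}

// getNamespaceFilter returns nil when no filter is configured: neither a
// missing config nor an empty filter string is treated as an error.
func getNamespaceFilter(cfg *runtimeConfig) *string {
	if cfg == nil || cfg.NamespaceFilter == "" {
		return nil
	}
	f := cfg.NamespaceFilter
	return &f
}

func main() {
	fmt.Println(getNamespaceFilter(nil)) // <nil>
	if f := getNamespaceFilter(&runtimeConfig{NamespaceFilter: "^kube-.*$"}); f != nil {
		fmt.Println(*f)
	}
}
```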

@erthalion erthalion force-pushed the feature/per-namespace-persistence branch 2 times, most recently from 6deebde to eda56d2 on April 15, 2026 11:32
Allow configuring per-namespace persistence for process indicators, so
that Central does not need to store information that will never be used.

It can be configured via the DynamicConfig of the cluster configuration
in the form:

```
message ProcessIndicators {
  bool no_persistence = 1;
  string exclude_namespace_filter = 2;
  bool exclude_openshift_ns = 3;
}
```

Here `exclude_namespace_filter` allows specifying a custom regex to
filter out processes by matching the namespace, `exclude_openshift_ns`
instructs Central to exclude anything from openshift-* namespaces, and
`no_persistence` can be used to disable storing process indicators entirely.
@erthalion erthalion force-pushed the feature/per-namespace-persistence branch from eda56d2 to f03bc24 on April 15, 2026 12:16
@erthalion erthalion requested a review from dashrews78 April 15, 2026 13:11
@erthalion
Contributor Author

/retest

@openshift-ci

openshift-ci bot commented Apr 16, 2026

@erthalion: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/ocp-4-21-qa-e2e-tests f03bc24 link false /test ocp-4-21-qa-e2e-tests
ci/prow/ocp-4-12-qa-e2e-tests f03bc24 link false /test ocp-4-12-qa-e2e-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

ds.idToNameCache.Add(cluster.GetId(), cluster.GetName())
ds.nameToIDCache.Add(cluster.GetName(), cluster.GetId())

if filter := clusterPkg.GetNamespaceFilter(cluster); filter != nil {
Contributor

If a cluster previously had a namespace filter configured and the user later removes it (all fields empty/false), GetNamespaceFilter would return nil, so we skip this block. But the old compiled regex stays in idToRegexCache, so indicators will keep being filtered even after the filter is disabled. Do we need an else branch here that removes the cluster ID from idToRegexCache?

(same pattern in buildCache around line 186)
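The eviction suggested above could be sketched as follows. The cache type and function are illustrative stand-ins for idToRegexCache and the update path, not the PR's actual code:

```go
package main

import (
	"fmt"
	"regexp"
)

// regexCache is an illustrative stand-in for idToRegexCache, keyed by
// cluster ID.
type regexCache map[string]*regexp.Regexp

// updateFilterCache stores the compiled filter for a cluster, and, per
// the suggestion above, evicts any stale entry when the filter has been
// removed (filter == nil).
func updateFilterCache(cache regexCache, clusterID string, filter *string) error {
	if filter == nil {
		// Filter disabled: drop the stale compiled regex, otherwise
		// indicators from this cluster would keep being filtered.
		delete(cache, clusterID)
		return nil
	}
	re, err := regexp.Compile(*filter)
	if err != nil {
		return err
	}
	cache[clusterID] = re
	return nil
}

func main() {
	cache := regexCache{}
	f := "^kube-.*$"
	_ = updateFilterCache(cache, "c1", &f)
	fmt.Println(len(cache)) // 1
	_ = updateFilterCache(cache, "c1", nil)
	fmt.Println(len(cache)) // 0
}
```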

Comment thread pkg/cluster/filtering.go
return nil
}

return pointers.String(filter)
Contributor

Should this be anchored with ^ and $? e.g. return pointers.String(fmt.Sprintf("^(?:%s)$", filter)). Currently, a filter like "test-.*" would match namespace "my-test-foo" because the pattern matches within the string.

Contributor

No, if we do that we are assuming intent of the filter that we don't control. I think the onus is on the customer to supply the regex they intend to supply. The platform config regexes include the anchors in the regex itself as they should.
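The behavior both commenters are debating can be demonstrated directly: Go's regexp matches anywhere in the string unless the pattern itself is anchored. A small self-contained check (the helper is illustrative, not the PR's code):

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesNamespace reports whether the (customer-supplied) pattern
// matches the namespace. Without anchors the pattern matches any
// substring, which is the behavior discussed in this thread.
func matchesNamespace(pattern, namespace string) bool {
	return regexp.MustCompile(pattern).MatchString(namespace)
}

func main() {
	fmt.Println(matchesNamespace("test-.*", "my-test-foo"))       // true: matches the "test-foo" substring
	fmt.Println(matchesNamespace("^(?:test-.*)$", "my-test-foo")) // false: anchored to the whole string
	fmt.Println(matchesNamespace("^(?:test-.*)$", "test-foo"))    // true
}
```

So whether "test-.*" should match "my-test-foo" depends on whether the customer supplies their own anchors, which is the judgment call made in the reply above.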


match, err := m.clusterDataStore.MatchProcessIndicator(lifecycleMgrCtx, indicator)

if err == nil && match {
Contributor

nitpick - can we invert this to avoid having an empty if body? Something like

if err != nil {
  log.Errorf("Cannot match indicator %s: %v", indicator.GetId(), err)
}
if err != nil || !match {
  indicatorSlice = append(indicatorSlice, indicator)
}

Contributor

+1
