ROX-31831: optimize ProcessIndicator similarity filter memory by johannes94 · Pull Request #17984 · stackrox/stackrox

johannes94 · 2025-11-27T07:51:02Z

Description

The ProcessIndicator (PI) filter holds the majority of Central's heap memory for the largest CS tenants. Analysis of a heap profile showed that scanPlanBinaryBytesToBytes.Scan retains most of it, even though table scans for the filter only run at Central startup. This indicates that we hold on to entire ProcessIndicator proto objects in memory, although the filter does not need them.

This PR:

Fixes the reference to entire PI objects by copying strings before storing them in the filter map
Adds a benchmark test that writes a memory profile proving the fix reduces memory usage
Adds a benchmark test that shows the performance impact of the copy when the filter is built

Potential Improvement:

In the generated test data we saw, on average, 27% less total heap with the fix proposed by this PR
For the largest CS tenant, the PI filter's heap consumption is 70% of Central's baseline heap when no reprocessing is running. This memory is held across the entire pod lifetime
0.27 * 0.7 = 0.189, so an 18.9% improvement in Central's baseline memory for tenants with a high PI workload
In reality, the numbers might vary based on PI data
Side effect: 2% more runtime and 18% more allocations for filter building. This side effect also applies to the filter logic when events come in.
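
Spelled out, the estimate multiplies the relative heap reduction measured in the test, r, by the filter's share of Central's baseline heap, s. It assumes that the reduction seen on generated test data carries over to the filter's heap in production, which is an assumption rather than a measurement:

```latex
\[
\Delta_{\text{baseline}} \approx r \cdot s = 0.27 \times 0.70 = 0.189 \approx 18.9\%
\]
```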

User-facing documentation

CHANGELOG.md is updated OR update is not needed
documentation PR is created and is linked above OR is not needed

Testing and quality

the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
CI results are inspected

Automated testing

How I validated my change

Using the two benchmark tests added in this PR
First, check out a commit without the improvement
Run the heap profile benchmark

go tool pprof central/detection/lifecycle/indicator_filter_memory.prof
File: lifecycle.test
Type: inuse_space
Time: 2025-11-26 15:27:44 CET
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 99.03MB, 92.90% of 106.60MB total
Dropped 59 nodes (cum <= 0.53MB)
Showing top 10 nodes out of 61
      flat  flat%   sum%        cum   cum%
   40.01MB 37.54% 37.54%    40.01MB 37.54%  github.com/jackc/pgx/v5/pgtype.scanPlanBinaryBytesToBytes.Scan
   29.51MB 27.68% 65.22%    42.01MB 39.41%  github.com/stackrox/rox/pkg/process/filter.(*filterImpl).siftNoLock
   19.50MB 18.29% 83.51%    19.50MB 18.29%  github.com/stackrox/rox/pkg/process/filter.newLevel (inline)
    3.51MB  3.29% 86.80%     3.51MB  3.29%  runtime.allocm
    3.50MB  3.28% 90.08%    52.51MB 49.26%  github.com/stackrox/rox/pkg/process/filter.(*filterImpl).Add
       1MB  0.94% 91.02%        1MB  0.94%  regexp/syntax.(*compiler).inst
       1MB  0.94% 91.96%        1MB  0.94%  reflect.New
    0.50MB  0.47% 92.43%     1.01MB  0.95%  k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypeWithName
    0.50MB  0.47% 92.90%        1MB  0.94%  github.com/stackrox/rox/generated/api/v1.init
         0     0% 92.90%    40.01MB 37.54%  github.com/jackc/pgx/v5.(*baseRows).Scan

The profile shows scanPlanBinaryBytesToBytes.Scan holding the largest share of the heap (40.01MB, 37.54%)
Now run it with the fix

go tool pprof central/detection/lifecycle/indicator_filter_memory.prof
File: lifecycle.test
Type: inuse_space
Time: 2025-11-27 07:47:49 CET
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 78.55MB, 92.31% of 85.09MB total
Showing top 10 nodes out of 114
      flat  flat%   sum%        cum   cum%
   41.51MB 48.78% 48.78%    56.51MB 66.41%  github.com/stackrox/rox/pkg/process/filter.(*filterImpl).siftNoLock
   18.50MB 21.74% 70.53%    18.50MB 21.74%  github.com/stackrox/rox/pkg/process/filter.newLevel (inline)
    4.51MB  5.30% 75.82%     4.51MB  5.30%  runtime.allocm
       4MB  4.70% 80.53%        4MB  4.70%  github.com/jackc/pgx/v5/pgtype.scanPlanBinaryBytesToBytes.Scan
       4MB  4.70% 85.23%        4MB  4.70%  strings.Clone (inline)
    2.50MB  2.94% 88.17%    67.01MB 78.75%  github.com/stackrox/rox/pkg/process/filter.(*filterImpl).Add
       1MB  1.18% 89.35%        1MB  1.18%  regexp/syntax.(*compiler).inst
       1MB  1.18% 90.53%        1MB  1.18%  google.golang.org/protobuf/internal/filedesc.(*File).initDecls
       1MB  1.18% 91.70%        1MB  1.18%  runtime.gcBgMarkWorker
    0.52MB  0.61% 92.31%     0.52MB  0.61%  github.com/gogo/protobuf/proto.RegisterType

The profile shows far less memory held by scanPlanBinaryBytesToBytes.Scan (4MB instead of 40.01MB)
Total heap is also reduced. Since the test data is generated, I compared the average of 5 runs

# without copy: 106MB 114MB 116MB 121MB 116MB Avg: 114.6 MB
# with copy: 85MB 97MB 92MB 77MB 72MB Avg: 84.6 MB
# Avg total heap improved by: 27%
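
The averages above can be recomputed with a short sketch. The helper names `mean` and `relativeReduction` are made up for this example; the input values are the ten totals listed above. Recomputing from those runs gives a reduction of roughly 26%, in line with the rounded figure quoted.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// relativeReduction returns the fraction by which after is smaller than before.
func relativeReduction(before, after float64) float64 {
	return (before - after) / before
}

func main() {
	withoutCopy := []float64{106, 114, 116, 121, 116} // total heap in MB per run
	withCopy := []float64{85, 97, 92, 77, 72}

	before, after := mean(withoutCopy), mean(withCopy)
	fmt.Printf("without copy: %.1f MB, with copy: %.1f MB, reduction: %.1f%%\n",
		before, after, 100*relativeReduction(before, after))
}
```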

Of course, copying comes at a cost in runtime and allocations:

goos: darwin
goarch: arm64
pkg: github.com/stackrox/rox/central/detection/lifecycle
cpu: Apple M4 Pro
                                   │ without-copy.txt │            copy.txt             │
                                   │      sec/op      │   sec/op     vs base            │
BuildIndicatorFilterPerformance-14        199.0m ± 5%   205.4m ± 2%  ~ (p=0.243 n=9+10)

                                   │ without-copy.txt │               copy.txt                │
                                   │       B/op       │     B/op      vs base                 │
BuildIndicatorFilterPerformance-14       158.6Mi ± 0%   163.9Mi ± 0%  +3.37% (p=0.000 n=9+10)

                                   │ without-copy.txt │               copy.txt                │
                                   │    allocs/op     │  allocs/op   vs base                  │
BuildIndicatorFilterPerformance-14        1.602M ± 0%   1.902M ± 0%  +18.72% (p=0.000 n=9+10)

rhacs-bot · 2025-11-27T08:26:35Z

Images are ready for the commit at 12dfedf.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.10.x-476-g12dfedf8e9.

codecov · 2025-11-27T08:31:19Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.47%. Comparing base (8aa72ea) to head (cca864c).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #17984   +/-   ##
=======================================
  Coverage   49.47%   49.47%           
=======================================
  Files        2699     2699           
  Lines      198163   198168    +5     
=======================================
+ Hits        98042    98050    +8     
+ Misses      92521    92519    -2     
+ Partials     7600     7599    -1

Flag	Coverage Δ
go-unit-tests	`49.47% <100.00%> (+<0.01%)`	⬆️

johannes94 · 2025-12-01T15:27:43Z

Added SensEco and in particular @janisz to the reviewers after discussing team assignment of this PR briefly with @dashrews78

dashrews78

Makes sense to me.

janisz

Nice, I'm wondering if we can totally drop strings and use int64 instead like in #17040
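
As a rough sketch of that idea, a set can be keyed by a 64-bit hash so that the strings themselves are never retained. This is an illustration under assumptions, not the code from #17040 or #18015: FNV-1a from the standard library, the `uint64` key type, and the names `hashKey` and `seenSet` are all choices made for this example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashKey maps a string to its 64-bit FNV-1a digest.
func hashKey(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s)) // Write on a hash.Hash never returns an error
	return h.Sum64()
}

// seenSet remembers which keys were observed without retaining the strings.
type seenSet map[uint64]struct{}

// add records s and reports whether it had not been seen before.
func (set seenSet) add(s string) bool {
	k := hashKey(s)
	if _, ok := set[k]; ok {
		return false
	}
	set[k] = struct{}{}
	return true
}

func main() {
	set := seenSet{}
	fmt.Println(set.add("/bin/sh -c ls")) // true
	fmt.Println(set.add("/bin/sh -c ls")) // false
}
```

The cost of this design is that two different strings can collide on the same hash and be treated as one entry; whether that is acceptable depends on how the filter's results are used.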

central/detection/lifecycle/indicator_filter_benchmark_test.go

pkg/process/filter/filter.go

Co-authored-by: Tomasz Janiszewski <tomek@redhat.com>

johannes94 · 2025-12-02T07:43:15Z

Nice, I'm wondering if we can totally drop strings and use int64 instead like in

Could be worth an experiment. I'm not sure how easy that will be, given that we use the actual strings to compute "Jaccard Similarity" for them, as opposed to Sensor where, IIUC, we only did hash comparison.
Edit: Never mind, this was context bloat in my brain. Jaccard Similarity is used for pruning, not for the filter.

I'd live with this improvement for now, see how it performs in actual deployments of Central, and decide based on that whether further improvement is necessary.

central/detection/lifecycle/indicator_filter_benchmark_test.go

Signed-off-by: Tomasz Janiszewski <tomek@redhat.com>

vikin91 · 2025-12-02T15:30:51Z

Keeping fingers crossed for this change to improve the situation on Central 🤞

pkg/process/filter/filter.go

Signed-off-by: Tomasz Janiszewski <tomek@redhat.com> Co-authored-by: Tomasz Janiszewski <tomek@redhat.com>

github-actions bot added the area/central label Nov 27, 2025

johannes94 requested a review from a team November 27, 2025 08:24

johannes94 requested review from a team and janisz December 1, 2025 15:25

dashrews78 approved these changes Dec 1, 2025

janisz reviewed Dec 1, 2025

central/detection/lifecycle/indicator_filter_benchmark_test.go (outdated)

pkg/process/filter/filter.go (outdated)

johannes94 and others added 4 commits December 2, 2025 08:35

add benchmark tests for buildIndicatorFilter

fa65435

clone filter strings in PI filter to release PI protos

47784b4

fix style

a2b0974

use b.Loop

cca864c

Co-authored-by: Tomasz Janiszewski <tomek@redhat.com>

johannes94 force-pushed the jmalsam/optimize-pi-filter-memory branch from f3a6175 to cca864c Compare December 2, 2025 07:35

janisz reviewed Dec 2, 2025

central/detection/lifecycle/indicator_filter_benchmark_test.go (outdated)

janisz approved these changes Dec 2, 2025

janisz reviewed Dec 2, 2025

central/detection/lifecycle/indicator_filter_benchmark_test.go (outdated)

janisz mentioned this pull request Dec 2, 2025

ROX-31831: use hash instead of string #18015

Merged

janisz and others added 3 commits December 2, 2025 15:47

ROX-31831: use hash instead of string (#18015)

c5cabb8

Signed-off-by: Tomasz Janiszewski <tomek@redhat.com>

addressing PR feedback

90f9835

better explanation on why the change to a hash

12dfedf

janisz approved these changes Dec 2, 2025

janisz reviewed Dec 2, 2025

pkg/process/filter/filter.go

johannes94 merged commit 9751224 into master Dec 3, 2025
91 checks passed

johannes94 deleted the jmalsam/optimize-pi-filter-memory branch December 3, 2025 11:05

ajheflin pushed a commit that referenced this pull request Dec 3, 2025

ROX-31831: optimize ProcessIndicator similarity filter memory (#17984)

4d138a5

Signed-off-by: Tomasz Janiszewski <tomek@redhat.com> Co-authored-by: Tomasz Janiszewski <tomek@redhat.com>

johannes94 merged 7 commits into master from jmalsam/optimize-pi-filter-memory