
ROX-33201: optimize applySingleNoLock to reduce allocations #19274

Open
janisz wants to merge 4 commits into master from
ROX-33201/store_endpoints

Conversation

@janisz
Contributor

@janisz janisz commented Mar 3, 2026

Description

  • Pre-allocate sets/maps with known capacity to avoid growth reallocations
  • Move reverseEndpointMap assignment outside loop
  • Add benchmarks for performance tracking
  • Extract map access out of the loop (https://godbolt.org/z/n4YfcYs9j)

Heatmap from the current situation:
[heatmap image]

                                                │ /tmp/benchmark_before.txt │       /tmp/benchmark_final.txt       │
                                                │          sec/op           │    sec/op     vs base                │
Apply/Small_Incremental-8                                       130.1µ ±  6%   114.7µ ±  4%  -11.89% (p=0.000 n=10)
Apply/Medium_Incremental-8                                      1.671m ±  6%   1.397m ±  5%  -16.44% (p=0.000 n=10)
Apply/Large_Incremental-8                                       9.303m ± 12%   7.007m ±  4%  -24.68% (p=0.000 n=10)
Apply/Small_Full-8                                              139.3µ ± 19%   119.2µ ±  9%  -14.42% (p=0.001 n=10)
Apply/Medium_Full-8                                             1.669m ±  8%   1.536m ± 10%   -7.96% (p=0.043 n=10)
Apply/Large_Full-8                                              8.190m ±  4%   7.355m ±  7%  -10.20% (p=0.002 n=10)
ApplySingleNoLock/SingleDeployment_5Endpoints-8                 4.783µ ± 23%   3.682µ ±  6%  -23.03% (p=0.035 n=10)
ApplySingleNoLock/SingleDeployment_20Endpoints-8                22.46µ ±  5%   15.56µ ±  8%  -30.70% (p=0.000 n=10)
ApplySingleNoLock/SingleDeployment_50Endpoints-8                59.48µ ±  6%   39.39µ ±  3%  -33.76% (p=0.000 n=10)
ApplyRepeated-8                                                 2.489m ± 22%   1.528m ± 17%  -38.61% (p=0.000 n=10)
geomean                                                         379.1µ         296.3µ        -21.84%

                                                │ /tmp/benchmark_before.txt │        /tmp/benchmark_final.txt        │
                                                │           B/op            │     B/op      vs base                  │
Apply/Small_Incremental-8                                     50.83Ki ± 0%     50.83Ki ± 0%        ~ (p=1.000 n=10) ¹
Apply/Medium_Incremental-8                                    552.8Ki ± 0%     497.1Ki ± 0%  -10.07% (p=0.000 n=10)
Apply/Large_Incremental-8                                     2.271Mi ± 0%     1.937Mi ± 0%  -14.71% (p=0.000 n=10)
Apply/Small_Full-8                                            50.83Ki ± 0%     50.83Ki ± 0%        ~ (p=1.000 n=10) ¹
Apply/Medium_Full-8                                           552.8Ki ± 0%     497.1Ki ± 0%  -10.07% (p=0.000 n=10)
Apply/Large_Full-8                                            2.271Mi ± 0%     1.937Mi ± 0%  -14.71% (p=0.000 n=10)
ApplySingleNoLock/SingleDeployment_5Endpoints-8                 0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
ApplySingleNoLock/SingleDeployment_20Endpoints-8                0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
ApplySingleNoLock/SingleDeployment_50Endpoints-8                0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
ApplyRepeated-8                                               514.3Ki ± 0%     458.0Ki ± 0%  -10.94% (p=0.000 n=10)
geomean                                                                    ²                  -6.26%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                                │ /tmp/benchmark_before.txt │       /tmp/benchmark_final.txt       │
                                                │         allocs/op         │  allocs/op   vs base                 │
Apply/Small_Incremental-8                                       504.0 ± 0%      504.0 ± 0%       ~ (p=1.000 n=10) ¹
Apply/Medium_Incremental-8                                     6.776k ± 0%     6.677k ± 0%  -1.46% (p=0.000 n=10)
Apply/Large_Incremental-8                                      38.35k ± 0%     37.76k ± 0%  -1.56% (p=0.000 n=10)
Apply/Small_Full-8                                              504.0 ± 0%      504.0 ± 0%       ~ (p=1.000 n=10) ¹
Apply/Medium_Full-8                                            6.776k ± 0%     6.677k ± 0%  -1.46% (p=0.000 n=10)
Apply/Large_Full-8                                             38.35k ± 0%     37.76k ± 0%  -1.56% (p=0.000 n=10)
ApplySingleNoLock/SingleDeployment_5Endpoints-8                 0.000 ± 0%      0.000 ± 0%       ~ (p=1.000 n=10) ¹
ApplySingleNoLock/SingleDeployment_20Endpoints-8                0.000 ± 0%      0.000 ± 0%       ~ (p=1.000 n=10) ¹
ApplySingleNoLock/SingleDeployment_50Endpoints-8                0.000 ± 0%      0.000 ± 0%       ~ (p=1.000 n=10) ¹
ApplyRepeated-8                                                6.751k ± 0%     6.651k ± 0%  -1.48% (p=0.000 n=10)
geomean                                                                    ²                -0.75%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

CI + bench

janisz and others added 4 commits March 3, 2026 17:46
User request: Reduce allocations in applySingleNoLock by applying pre-allocation
and moving redundant assignments out of the loop.

Changes:
- Pre-allocate sets/maps with known capacity to avoid growth reallocations
- Move reverseEndpointMap assignment outside loop (was N redundant writes per call)
- Add comprehensive benchmarks for performance tracking

Benchmark results (10 iterations, benchstat):
ApplySingleNoLock/20Endpoints:  -39.73% time (22.46µs → 13.54µs)
ApplySingleNoLock/50Endpoints:  -37.31% time (59.48µs → 37.29µs)
ApplyRepeated (realistic usage): -40.45% time (2.489ms → 1.482ms)

Memory improvements:
Medium workloads: -5.09% bytes/op (552.8Ki → 524.7Ki), -0.74% allocs/op
Large workloads:  -7.39% bytes/op (2.271Mi → 2.103Mi), -0.78% allocs/op

Note: Code partially generated with AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove unconditional etiSet allocation on every loop iteration.
Only allocate when actually needed based on map state.

Changes:
- Move etiSet allocation into conditional branches
- Retrieve etiSet from map after ensuring it exists
- Clearer logic flow with explicit cases

Performance improvement:
- Medium workloads: -5% memory (566KB → 537KB), -50 allocations
- Large workloads: -7.4% memory (2.38MB → 2.20MB), -300 allocations

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Pre-allocate historical maps with known capacity from current maps
to avoid growth reallocations.

Changes:
- Pre-allocate historicalEndpoints[ep][deploymentID] with size from endpointMap
- Pre-allocate reverseHistoricalEndpoints[deploymentID] with size from reverseEndpointMap

This reduces allocations when moving endpoints to history during
reset operations or deployment changes.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Tomasz Janiszewski <tomek@redhat.com>
@janisz janisz requested a review from a team as a code owner March 3, 2026 18:02
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • The benchmark helper loops like for d := range numDeployments / for e := range numEndpoints / for t := range targetsPerEp won't compile in Go; these should be standard counted loops (e.g. for d := 0; d < numDeployments; d++ { ... }).
  • The benchmark bodies use for b.Loop() which is not part of testing.B; replace this with the usual for i := 0; i < b.N; i++ { ... } (and consider using b.ResetTimer()/b.ReportAllocs() if you want cleaner measurements).
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The benchmark helper loops like `for d := range numDeployments` / `for e := range numEndpoints` / `for t := range targetsPerEp` won't compile in Go; these should be standard counted loops (e.g. `for d := 0; d < numDeployments; d++ { ... }`).
- The benchmark bodies use `for b.Loop()` which is not part of `testing.B`; replace this with the usual `for i := 0; i < b.N; i++ { ... }` (and consider using `b.ResetTimer()`/`b.ReportAllocs()` if you want cleaner measurements).

## Individual Comments

### Comment 1
<location path="sensor/common/clusterentities/store_endpoints_bench_test.go" line_range="26-30" />
<code_context>
+	}
+
+	for _, bm := range benchmarks {
+		b.Run(bm.name, func(b *testing.B) {
+			// Create realistic test data
+			updates := generateTestUpdates(bm.numDeployments, bm.endpointsPerDep, bm.targetsPerEp)
+
+			for b.Loop() {
+				store := newEndpointsStoreWithMemory(100)
+				store.Apply(updates, bm.incremental)
</code_context>
<issue_to_address>
**issue (bug_risk):** Benchmarks use `b.Loop()` and `range` over ints, which will not compile and/or behave as intended

These benchmarks currently use `for b.Loop()` and `range` over `int` values (e.g. `for d := range numDeployments`), which is invalid Go and will prevent them from compiling/running.

Please switch to the standard benchmarking pattern and counting loops, e.g.:

```go
for i := 0; i < b.N; i++ {
    store := newEndpointsStoreWithMemory(100)
    store.Apply(updates, bm.incremental)
}

for d := 0; d < numDeployments; d++ { … }
for e := 0; e < numEndpoints; e++ { … }
for t := 0; t < targetsPerEp; t++ { … }
```

If you want to track allocation changes, also add `b.ReportAllocs()` to the relevant benchmarks.
</issue_to_address>


@rhacs-bot
Contributor

Images are ready for the commit at d7f678d.

To use with deploy scripts, first `export MAIN_IMAGE_TAG=4.11.x-171-gd7f678deb7`.

@janisz
Contributor Author

janisz commented Mar 4, 2026

/retest

@openshift-ci

openshift-ci bot commented Mar 4, 2026

@janisz: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/gke-upgrade-tests
Commit: d7f678d
Details: link
Required: false
Rerun command: /test gke-upgrade-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@vikin91 vikin91 self-requested a review March 4, 2026 13:15
Comment on lines +89 to +90
reverseMap := e.reverseHistoricalEndpoints[deploymentID]
reverseMap[endpoint].recordTick()
Contributor


We introduced a var reverseMap that is used only once. Does that help performance in any way?

Contributor

@vikin91 vikin91 left a comment


The change is good, but there are a few subtle logic changes in the code. I will need some more time to go over it, but I will post the details soon.

Edit: I added the comment about the changed logic. PTAL and let me know.
I want to tread carefully in this code, as it was the probable cause of several customer tickets in the past.

Comment on lines +137 to +138
dSet = make(set.Set[net.NumericEndpoint], len(data.endpoints))
e.reverseEndpointMap[deploymentID] = dSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the logic change that I meant.

In the old code, if data.endpoints is empty, the loop body never executes and reverseEndpointMap[deploymentID] is never written. In the new code, an empty set is always stored.
This matters because isDeleteOnly() checks more than just endpoints:

func (ed *EntityData) isDeleteOnly() bool {
	if ed == nil {
		return true
	}
	if len(ed.endpoints)+len(ed.containerIDs)+len(ed.ips) == 0 {
		return true
	}
	return false
}

So applySingleNoLock can be called with len(data.endpoints) == 0 when containerIDs or ips are non-empty. In that scenario:

  1. The new code creates an empty entry reverseEndpointMap["depl1"] = {} (empty set)
  2. On a subsequent call for the same deployment that now does have endpoints, deploymentFound will be true
  3. This means the else if !deploymentFound branch (the endpoint takeover / moveToHistory logic) will be skipped

In the old code, step 2 would have deploymentFound = false, and the takeover logic would correctly execute.

I estimate the impact as low in normal operation (deployments almost always arrive with endpoints), but it's a real behavioral difference that could cause missed history entries in edge cases.

What to do about that? There are a few options:

  1. I would love to see a unit test for this. I am not sure about the current quality and coverage of the unit tests for this code (even though I most probably wrote or updated them myself).
  2. Unless we confirm this is safe, I would suggest restoring the previous behavior.
  3. We could change isDeleteOnly (and maybe other functions that rely on this) to handle the cases produced by the new code correctly.

WDYT?

updates := generateTestUpdates(bm.numDeployments, bm.endpointsPerDep, bm.targetsPerEp)

for b.Loop() {
store := newEndpointsStoreWithMemory(100)
Contributor


We usually set this to something between 5 and 20. One unit currently maps to one tick, which is 30s by default.
