
ROX-33201: optimize applySingleNoLock to reduce allocations #19274

Open
janisz wants to merge 4 commits into master from
ROX-33201/store_endpoints

Conversation

@janisz
Contributor

@janisz janisz commented Mar 3, 2026

Description

  • Pre-allocate sets/maps with known capacity to avoid growth reallocations
  • Move reverseEndpointMap assignment outside loop
  • Add benchmarks for performance tracking
  • Extract map access out of the loop (https://godbolt.org/z/n4YfcYs9j)

Heatmap from the current situation:
[heatmap image]

                                                │ /tmp/benchmark_before.txt │       /tmp/benchmark_final.txt       │
                                                │          sec/op           │    sec/op     vs base                │
Apply/Small_Incremental-8                                       130.1µ ±  6%   114.7µ ±  4%  -11.89% (p=0.000 n=10)
Apply/Medium_Incremental-8                                      1.671m ±  6%   1.397m ±  5%  -16.44% (p=0.000 n=10)
Apply/Large_Incremental-8                                       9.303m ± 12%   7.007m ±  4%  -24.68% (p=0.000 n=10)
Apply/Small_Full-8                                              139.3µ ± 19%   119.2µ ±  9%  -14.42% (p=0.001 n=10)
Apply/Medium_Full-8                                             1.669m ±  8%   1.536m ± 10%   -7.96% (p=0.043 n=10)
Apply/Large_Full-8                                              8.190m ±  4%   7.355m ±  7%  -10.20% (p=0.002 n=10)
ApplySingleNoLock/SingleDeployment_5Endpoints-8                 4.783µ ± 23%   3.682µ ±  6%  -23.03% (p=0.035 n=10)
ApplySingleNoLock/SingleDeployment_20Endpoints-8                22.46µ ±  5%   15.56µ ±  8%  -30.70% (p=0.000 n=10)
ApplySingleNoLock/SingleDeployment_50Endpoints-8                59.48µ ±  6%   39.39µ ±  3%  -33.76% (p=0.000 n=10)
ApplyRepeated-8                                                 2.489m ± 22%   1.528m ± 17%  -38.61% (p=0.000 n=10)
geomean                                                         379.1µ         296.3µ        -21.84%

                                                │ /tmp/benchmark_before.txt │        /tmp/benchmark_final.txt        │
                                                │           B/op            │     B/op      vs base                  │
Apply/Small_Incremental-8                                     50.83Ki ± 0%     50.83Ki ± 0%        ~ (p=1.000 n=10) ¹
Apply/Medium_Incremental-8                                    552.8Ki ± 0%     497.1Ki ± 0%  -10.07% (p=0.000 n=10)
Apply/Large_Incremental-8                                     2.271Mi ± 0%     1.937Mi ± 0%  -14.71% (p=0.000 n=10)
Apply/Small_Full-8                                            50.83Ki ± 0%     50.83Ki ± 0%        ~ (p=1.000 n=10) ¹
Apply/Medium_Full-8                                           552.8Ki ± 0%     497.1Ki ± 0%  -10.07% (p=0.000 n=10)
Apply/Large_Full-8                                            2.271Mi ± 0%     1.937Mi ± 0%  -14.71% (p=0.000 n=10)
ApplySingleNoLock/SingleDeployment_5Endpoints-8                 0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
ApplySingleNoLock/SingleDeployment_20Endpoints-8                0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
ApplySingleNoLock/SingleDeployment_50Endpoints-8                0.000 ± 0%       0.000 ± 0%        ~ (p=1.000 n=10) ¹
ApplyRepeated-8                                               514.3Ki ± 0%     458.0Ki ± 0%  -10.94% (p=0.000 n=10)
geomean                                                                    ²                  -6.26%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                                │ /tmp/benchmark_before.txt │       /tmp/benchmark_final.txt       │
                                                │         allocs/op         │  allocs/op   vs base                 │
Apply/Small_Incremental-8                                       504.0 ± 0%      504.0 ± 0%       ~ (p=1.000 n=10) ¹
Apply/Medium_Incremental-8                                     6.776k ± 0%     6.677k ± 0%  -1.46% (p=0.000 n=10)
Apply/Large_Incremental-8                                      38.35k ± 0%     37.76k ± 0%  -1.56% (p=0.000 n=10)
Apply/Small_Full-8                                              504.0 ± 0%      504.0 ± 0%       ~ (p=1.000 n=10) ¹
Apply/Medium_Full-8                                            6.776k ± 0%     6.677k ± 0%  -1.46% (p=0.000 n=10)
Apply/Large_Full-8                                             38.35k ± 0%     37.76k ± 0%  -1.56% (p=0.000 n=10)
ApplySingleNoLock/SingleDeployment_5Endpoints-8                 0.000 ± 0%      0.000 ± 0%       ~ (p=1.000 n=10) ¹
ApplySingleNoLock/SingleDeployment_20Endpoints-8                0.000 ± 0%      0.000 ± 0%       ~ (p=1.000 n=10) ¹
ApplySingleNoLock/SingleDeployment_50Endpoints-8                0.000 ± 0%      0.000 ± 0%       ~ (p=1.000 n=10) ¹
ApplyRepeated-8                                                6.751k ± 0%     6.651k ± 0%  -1.48% (p=0.000 n=10)
geomean                                                                    ²                -0.75%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

CI + bench

janisz and others added 4 commits March 3, 2026 17:46
User request: Reduce allocations in applySingleNoLock by applying pre-allocation
and moving redundant assignments out of the loop.

Changes:
- Pre-allocate sets/maps with known capacity to avoid growth reallocations
- Move reverseEndpointMap assignment outside loop (was N redundant writes per call)
- Add comprehensive benchmarks for performance tracking

Benchmark results (10 iterations, benchstat):
ApplySingleNoLock/20Endpoints:  -39.73% time (22.46µs → 13.54µs)
ApplySingleNoLock/50Endpoints:  -37.31% time (59.48µs → 37.29µs)
ApplyRepeated (realistic usage): -40.45% time (2.489ms → 1.482ms)

Memory improvements:
Medium workloads: -5.09% bytes/op (552.8Ki → 524.7Ki), -0.74% allocs/op
Large workloads:  -7.39% bytes/op (2.271Mi → 2.103Mi), -0.78% allocs/op

Note: Code partially generated with AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove unconditional etiSet allocation on every loop iteration.
Only allocate when actually needed based on map state.

Changes:
- Move etiSet allocation into conditional branches
- Retrieve etiSet from map after ensuring it exists
- Clearer logic flow with explicit cases

Performance improvement:
- Medium workloads: -5% memory (566KB → 537KB), -50 allocations
- Large workloads: -7.4% memory (2.38MB → 2.20MB), -300 allocations

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Pre-allocate historical maps with known capacity from current maps
to avoid growth reallocations.

Changes:
- Pre-allocate historicalEndpoints[ep][deploymentID] with size from endpointMap
- Pre-allocate reverseHistoricalEndpoints[deploymentID] with size from reverseEndpointMap

This reduces allocations when moving endpoints to history during
reset operations or deployment changes.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Tomasz Janiszewski <tomek@redhat.com>
@janisz janisz requested a review from a team as a code owner March 3, 2026 18:02
Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • The benchmark helper loops like for d := range numDeployments / for e := range numEndpoints / for t := range targetsPerEp won't compile in Go; these should be standard counted loops (e.g. for d := 0; d < numDeployments; d++ { ... }).
  • The benchmark bodies use for b.Loop() which is not part of testing.B; replace this with the usual for i := 0; i < b.N; i++ { ... } (and consider using b.ResetTimer()/b.ReportAllocs() if you want cleaner measurements).
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The benchmark helper loops like `for d := range numDeployments` / `for e := range numEndpoints` / `for t := range targetsPerEp` won't compile in Go; these should be standard counted loops (e.g. `for d := 0; d < numDeployments; d++ { ... }`).
- The benchmark bodies use `for b.Loop()` which is not part of `testing.B`; replace this with the usual `for i := 0; i < b.N; i++ { ... }` (and consider using `b.ResetTimer()`/`b.ReportAllocs()` if you want cleaner measurements).

## Individual Comments

### Comment 1
<location path="sensor/common/clusterentities/store_endpoints_bench_test.go" line_range="26-30" />
<code_context>
+	}
+
+	for _, bm := range benchmarks {
+		b.Run(bm.name, func(b *testing.B) {
+			// Create realistic test data
+			updates := generateTestUpdates(bm.numDeployments, bm.endpointsPerDep, bm.targetsPerEp)
+
+			for b.Loop() {
+				store := newEndpointsStoreWithMemory(100)
+				store.Apply(updates, bm.incremental)
</code_context>
<issue_to_address>
**issue (bug_risk):** Benchmarks use `b.Loop()` and `range` over ints, which will not compile and/or behave as intended

These benchmarks currently use `for b.Loop()` and `range` over `int` values (e.g. `for d := range numDeployments`), which is invalid Go and will prevent them from compiling/running.

Please switch to the standard benchmarking pattern and counting loops, e.g.:

```go
for i := 0; i < b.N; i++ {
    store := newEndpointsStoreWithMemory(100)
    store.Apply(updates, bm.incremental)
}

for d := 0; d < numDeployments; d++ { … }
for e := 0; e < numEndpoints; e++ { … }
for t := 0; t < targetsPerEp; t++ { … }
```

If you want to track allocation changes, also add `b.ReportAllocs()` to the relevant benchmarks.
</issue_to_address>


@rhacs-bot
Contributor

Images are ready for the commit at d7f678d.

To use with deploy scripts, first `export MAIN_IMAGE_TAG=4.11.x-171-gd7f678deb7`.

@janisz
Contributor Author

janisz commented Mar 4, 2026

/retest

@openshift-ci

openshift-ci bot commented Mar 4, 2026

@janisz: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/gke-upgrade-tests
Commit: d7f678d
Details: link
Required: false
Rerun command: /test gke-upgrade-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@vikin91 vikin91 self-requested a review March 4, 2026 13:15
Comment on lines +89 to +90
reverseMap := e.reverseHistoricalEndpoints[deploymentID]
reverseMap[endpoint].recordTick()
Contributor


We introduced a var reverseMap that is used only once. Does that help performance in any way?

Contributor

@vikin91 vikin91 left a comment


The change is good, but there are a few subtle logic changes in the code. I will need some more time to go over it, but I will post the details soon.

Edit: I added the comment about the changed logic. PTAL and let me know.
I want to tread carefully in this code, as it was the probable cause of several customer tickets in the past.

Comment on lines +137 to +138
dSet = make(set.Set[net.NumericEndpoint], len(data.endpoints))
e.reverseEndpointMap[deploymentID] = dSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the logic change that I meant.

In the old code, if data.endpoints is empty, the loop body never executes and reverseEndpointMap[deploymentID] is never written. In the new code, an empty set is always stored.
This matters because isDeleteOnly() checks more than just endpoints:

func (ed *EntityData) isDeleteOnly() bool {
	if ed == nil {
		return true
	}
	if len(ed.endpoints)+len(ed.containerIDs)+len(ed.ips) == 0 {
		return true
	}
	return false
}

So applySingleNoLock can be called with len(data.endpoints) == 0 when containerIDs or ips are non-empty. In that scenario:

  1. The new code creates an empty entry reverseEndpointMap["depl1"] = {} (empty set)
  2. On a subsequent call for the same deployment that now does have endpoints, deploymentFound will be true
  3. This means the else if !deploymentFound branch (the endpoint takeover / moveToHistory logic) will be skipped

In the old code, step 2 would have deploymentFound = false, and the takeover logic would correctly execute.

I estimate the impact as low in normal operation (deployments almost always arrive with endpoints), but it's a real behavioral difference that could cause missed history entries in edge cases.

What to do about that? There are a few options:

  1. I would love to see a unit test for this. I am not sure about the current quality and coverage of the unit tests for this code (even though I most probably wrote or updated them myself).
  2. Unless we confirm this is safe, I would suggest restoring the previous behavior.
  3. We could change isDeleteOnly (and maybe other functions that rely on this) to handle the cases produced by the new code correctly.

WDYT?

updates := generateTestUpdates(bm.numDeployments, bm.endpointsPerDep, bm.targetsPerEp)

for b.Loop() {
store := newEndpointsStoreWithMemory(100)
Contributor


We usually set this to something between 5 and 20. One unit currently maps to one tick, which is 30s by default.
