chore(scale): randomize node IPs and ports in fake workloads #19610
Conversation
Previously all fake nodes shared a single hardcoded IP pair (10.138.28.6 / 35.185.217.58) and node ports were random (possibly colliding). This caused unrealistic cardinality in endpoint tracking and masked issues that only surface with many distinct IPs. Generate per-node IPs from the node index so every fake node gets a unique address, and assign node ports from a monotonically incrementing counter to avoid collisions. Partially generated by AI. Made-with: Cursor
This change is part of the following stack. Change managed by git-spice.
Skipping CI for Draft Pull Request.
Hey - I've left some high level feedback:
- The global `nextNodePort` counter is not concurrency-safe; if fake services are created from multiple goroutines, consider using `atomic.AddInt32` or a `sync.Mutex` to avoid races and duplicate ports.
- The `getUniqueNodePort` wrap-around at 32767 means ports will be reused once ~2.7k nodePorts are allocated; if scale tests can exceed this, consider increasing the range or tying the counter to the workload size to preserve uniqueness within a run.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The global `nextNodePort` counter is not concurrency-safe; if fake services are created from multiple goroutines, consider using `atomic.AddInt32` or a `sync.Mutex` to avoid races and duplicate ports.
- The `getUniqueNodePort` wrap-around at 32767 means ports will be reused once ~2.7k nodePorts are allocated; if scale tests can exceed this, consider increasing the range or tying the counter to the workload size to preserve uniqueness within a run.
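To illustrate the first comment, a concurrency-safe counter could be sketched as below. This is a hypothetical sketch, not the PR's actual code: the constant names and the `allocCount` variable are assumptions; only the `getUniqueNodePort` name and the 30000–32767 range come from the review.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const (
	nodePortMin   = 30000
	nodePortRange = 32767 - nodePortMin + 1 // 2768 usable NodePorts
)

// allocCount tracks how many ports have been handed out; atomic.Int32
// makes concurrent allocations race-free without a mutex.
var allocCount atomic.Int32

// getUniqueNodePort returns the next port in the Kubernetes NodePort
// range (30000-32767), wrapping around once the range is exhausted.
func getUniqueNodePort() int32 {
	n := allocCount.Add(1) - 1 // 0-based allocation index
	return nodePortMin + n%nodePortRange
}

func main() {
	fmt.Println(getUniqueNodePort()) // 30000
	fmt.Println(getUniqueNodePort()) // 30001
}
```

As the second comment notes, the wrap-around means uniqueness only holds within the first 2768 allocations of a run.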
Codecov Report: ❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #19610 +/- ##
==========================================
+ Coverage 49.25% 49.28% +0.02%
==========================================
Files 2735 2735
Lines 206138 206222 +84
==========================================
+ Hits 101539 101634 +95
+ Misses 97051 97047 -4
+ Partials 7548 7541 -7
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Thanks for the suggestion. I checked the call graph —
Good callout on the range. The Kubernetes NodePort range is fixed at 30000–32767
Hey - I've found 1 issue, and left some high level feedback:
- The global `nextNodePort` counter is mutated without synchronization; if fake services are ever generated from multiple goroutines this will race and produce incorrect or duplicated ports, so consider guarding it with a mutex/atomic or making the allocator instance-scoped.
- The IP generation logic using `fmt.Sprintf("10.%d.%d.%d", 128+(i/(256*256))%128, (i/256)%256, i%256)` is a bit opaque; extracting this into a small helper with a descriptive name or comment about the intended address space and max supported node count would improve readability and future modifications.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The global `nextNodePort` counter is mutated without synchronization; if fake services are ever generated from multiple goroutines this will race and produce incorrect or duplicated ports, so consider guarding it with a mutex/atomic or making the allocator instance-scoped.
- The IP generation logic using `fmt.Sprintf("10.%d.%d.%d", 128+(i/(256*256))%128, (i/256)%256, i%256)` is a bit opaque; extracting this into a small helper with a descriptive name or comment about the intended address space and max supported node count would improve readability and future modifications.
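The suggested extraction might look like the following sketch. The helper name `nodeInternalIP` is illustrative (not from the PR); the arithmetic is taken verbatim from the comment above.

```go
package main

import "fmt"

// nodeInternalIP maps a fake-node index to a unique address in
// 10.128.0.0/9. The second octet stays in 128-255, so the scheme
// supports 128*256*256 (~8.4M) distinct nodes before addresses repeat.
func nodeInternalIP(i int) string {
	return fmt.Sprintf("10.%d.%d.%d",
		128+(i/(256*256))%128, // second octet: 128-255
		(i/256)%256,           // third octet
		i%256)                 // fourth octet
}

func main() {
	fmt.Println(nodeInternalIP(0))     // 10.128.0.0
	fmt.Println(nodeInternalIP(257))   // 10.128.1.1
	fmt.Println(nodeInternalIP(65536)) // 10.129.0.0
}
```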
## Individual Comments
### Comment 1
<location path="sensor/kubernetes/fake/service.go" line_range="15-18" />
<code_context>
- ipFamilies = [...]string{"IPv4", "IPv6"}
+ protocols = [...]string{"TCP", "UDP", "SCTP"}
+ ipFamilies = [...]string{"IPv4", "IPv6"}
+ nextNodePort int32 = 30000
)
</code_context>
<issue_to_address>
**issue (bug_risk):** Global mutable nodePort counter is not concurrency-safe and may race under parallel usage.
If these helpers are called from multiple goroutines (common in concurrent tests), `nextNodePort` will race and may produce duplicate or invalid ports. Please synchronize access (e.g., with `atomic.AddInt32` or a mutex), or make the counter instance-scoped on a struct to keep concurrency behavior well-defined.
</issue_to_address>
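The instance-scoped alternative mentioned in this issue could be sketched as below. The `portAllocator` type and its methods are hypothetical names introduced for illustration; only the mutex-guarded, instance-scoped approach and the 30000–32767 range come from the review.

```go
package main

import (
	"fmt"
	"sync"
)

// portAllocator scopes the port counter to one workload run instead of
// a package-level global; the mutex keeps allocations goroutine-safe.
type portAllocator struct {
	mu   sync.Mutex
	next int32
}

func newPortAllocator() *portAllocator {
	return &portAllocator{next: 30000}
}

// nodePort hands out sequential ports, wrapping back to 30000 after
// the top of the Kubernetes NodePort range (32767).
func (p *portAllocator) nodePort() int32 {
	p.mu.Lock()
	defer p.mu.Unlock()
	port := p.next
	p.next++
	if p.next > 32767 {
		p.next = 30000
	}
	return port
}

func main() {
	alloc := newPortAllocator()
	fmt.Println(alloc.nodePort()) // 30000
	fmt.Println(alloc.nodePort()) // 30001
}
```

Scoping the allocator to a struct also keeps separate test runs from sharing state, which the global counter cannot guarantee.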
    nextNodePort int32 = 30000
)

func getRandProtocol() string {
**issue (bug_risk):** Global mutable nodePort counter is not concurrency-safe and may race under parallel usage.
If these helpers are called from multiple goroutines (common in concurrent tests), `nextNodePort` will race and may produce duplicate or invalid ports. Please synchronize access (e.g., with `atomic.AddInt32` or a mutex), or make the counter instance-scoped on a struct to keep concurrency behavior well-defined.
Images are ready for the commit at f1f8116. To use with deploy scripts, first
Description
Fake workload nodes previously shared a single hardcoded IP pair and generated
random (possibly colliding) node ports. This caused unrealistic cardinality in
endpoint tracking during scale testing.
This was a required change to reproduce a real customer issue: ROX-33474.
Changes:
- Generate per-node IPs from the node index so every fake node gets a unique internal and external address.
- Assign node ports from a monotonically incrementing counter (with wrapping) to avoid collisions in typical workload sizes.
User-facing documentation
Testing and quality
Automated testing
How I validated my change