ROX-30860: Limit parallel port use by processes #16628
Conversation
Skipping CI for Draft Pull Request.
Images are ready for the commit at ddb5b45. To use with deploy scripts, first
Codecov Report: Patch coverage is

@@            Coverage Diff             @@
##            master   #16628    +/-   ##
==========================================
+ Coverage    48.67%   49.03%   +0.36%
==========================================
  Files         2675     2691      +16
  Lines       199760   201669    +1909
==========================================
+ Hits         97235    98897    +1662
- Misses       94918    95103     +185
- Partials      7607     7669      +62
This change is part of the following stack: Change managed by git-spice.
Hey there - I've reviewed your changes and they look great!
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `sensor/kubernetes/fake/flows.go:59` </location>
<code_context>
+// <container1, 1.1.1.1:80, apache2>, then Sensor will keep the nginx-entry forever, as there was no 'close' message in between.
+//
+// The probability logic is explicit and configurable for different testing scenarios.
+func (oc *OriginatorCache) GetOrSetOriginator(endpointKey string, containerID string, openPortReuseProbability float32, processPool *ProcessPool) *storage.NetworkProcessUniqueKey {
+ // Use panic-safe read lock to check cache
+ originator, exists := concurrency.WithRLock2(&oc.lock, func() (*storage.NetworkProcessUniqueKey, bool) {
</code_context>
<issue_to_address>
Consider validating openPortReuseProbability input range.
Without an explicit range check, passing values outside [0.0, 1.0] could cause incorrect probability calculations. Please add validation or clamp the input to avoid such issues.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
I still haven't done the long run on a cluster, but I will trigger it tomorrow. Edit: running on
/retest
/retest-required
@JoukoVirtanen PTAL again; observations from the long-running cluster show that workload generation works fine.
/retest-required
/retest
Description
Introduced configurable port reuse probability to limit unrealistic endpoint sharing behavior in fake workload generation.
Key Changes:
- OriginatorCache: implements probabilistic endpoint-to-originator caching with a configurable reuse probability.
- openPortReuseProbability field: configurable parameter (default 0.05) controlling how often different processes reuse the same IP:port without closing endpoints.
- Why: previous behavior (100% port reuse in releases ≤4.8) was unrealistic and caused memory pressure in Sensor's enrichment pipeline. Multiple processes listening on the same IP:port without proper endpoint closure created excessive deduplication overhead, as Sensor would retain stale entries indefinitely when no close messages were sent between different originators on the same endpoint.
The new approach simulates realistic container behavior where processes typically bind consistently to endpoints (95% default) while allowing occasional port reuse scenarios (5% for restarts/takeovers).
Impact:
User-facing documentation
Testing and quality
Automated testing
How I validated my change
The workload runs fine on the long running cluster.
Sensor is crashing every 12 hours on 8GB of memory (due to OOMKill). This is a known pattern and is not caused by this PR; it results from (1) a bug in memory management that is currently on master, and (2) giving Sensor only 8GB of memory for the experiment.
The following chart confirms the rates of objects being generated by the fake workloads:

This chart shows that the processesListening objects are being processed and sent to Central:
