chore(scale): add online/offline transition cycle #19611
Conversation
This change is part of a stack managed by git-spice.

Skipping CI for Draft Pull Request.
Hey - I've found 3 issues, and left some high-level feedback:
- In `main`, removing the generic `spyCentral.SetMessageRecording(!localConfig.SkipCentralOutput)` call and moving it into `createFakeCentralService` changes behavior for any non-fake Central connection where `spyCentral` is non-nil; double-check whether this was intentional or if real connections should still honor `SkipCentralOutput`.
- In `validateWorkload`, a negative `centralConnectionCrashCycle` is both clamped to 0 and returned as an error; if negative values are expected to be auto-corrected, consider returning a warning-level signal (or logging) instead of an error to avoid treating this as a hard validation failure.
## Individual Comments
### Comment 1
<location path="tools/local-sensor/main.go" line_range="522" />
<code_context>
return fakeConnectionFactory, centralclient.EmptyCertLoader(), spyCentral
}
+
+func startCentralCrashCycle(ctx context.Context, cycle time.Duration, sensor *commonSensor.Sensor) {
+ if cycle <= 0 || sensor == nil {
+ return
</code_context>
<issue_to_address>
**issue (bug_risk):** Consider using `RequestCentralConnectionRestart` instead of a synthetic state transition for a "connection crash cycle"
The current implementation only toggles component state via `TriggerOnlineOfflineTransition`, while the naming (`startCentralCrashCycle` / `CentralConnectionCrashCycle`) implies exercising actual connection restart/teardown behavior. If you intend to test connection retry under load, this should likely call `RequestCentralConnectionRestart`; if you only need state transitions, consider renaming to better reflect that behavior and avoid confusion for future readers.
</issue_to_address>
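To make the distinction concrete, here is a minimal, self-contained sketch of the cycle loop under discussion. The `transitioner` interface, `startCrashCycle`, `fakeSensor`, and `runCycleDemo` are hypothetical stand-ins for the real Sensor types, and the comment marks where a connection-restart call would replace the state toggle:

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// transitioner is a hypothetical stand-in for the subset of the real
// Sensor API that the cycle needs.
type transitioner interface {
	TriggerOnlineOfflineTransition()
}

// startCrashCycle toggles sensor state every `cycle` until ctx is cancelled.
// A true "connection crash" cycle would instead invoke a connection-restart
// method (e.g. RequestCentralConnectionRestart) at the marked line.
func startCrashCycle(ctx context.Context, cycle time.Duration, s transitioner) {
	if cycle <= 0 || s == nil {
		return
	}
	go func() {
		t := time.NewTicker(cycle)
		defer t.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-t.C:
				s.TriggerOnlineOfflineTransition() // state-only toggle, no teardown
			}
		}
	}()
}

// fakeSensor counts transitions so the cycle can be observed.
type fakeSensor struct{ n atomic.Int64 }

func (f *fakeSensor) TriggerOnlineOfflineTransition() { f.n.Add(1) }

// runCycleDemo drives the cycle briefly and reports how many transitions fired.
func runCycleDemo() int64 {
	f := &fakeSensor{}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	startCrashCycle(ctx, 5*time.Millisecond, f)
	time.Sleep(60 * time.Millisecond)
	return f.n.Load()
}

func main() {
	fmt.Println("transitions fired:", runCycleDemo())
}
```

Whichever method is called inside the ticker loop, keeping the loop cancellable via the context (as above) avoids leaking the goroutine when local-sensor shuts down.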
### Comment 2
<location path="sensor/kubernetes/fake/fake.go" line_range="254" />
<code_context>
return w.client
}
+// CentralConnectionCrashCycle returns how often local-sensor should crash fake Central.
+func (w *WorkloadManager) CentralConnectionCrashCycle() time.Duration {
+ if w == nil || w.workload == nil {
</code_context>
<issue_to_address>
**issue:** Comment says "crash fake Central" but lifecycle implementation only toggles sensor state
The docstring for this getter implies the fake Central server/connection is being crashed and restarted, but `startCentralCrashCycle` only calls `Sensor.TriggerOnlineOfflineTransition`, which just flips the sensor’s online/offline state. Please either update the comment to describe the current behavior or adjust the implementation so it actually controls fake Central crashes, so the two stay aligned.
</issue_to_address>
### Comment 3
<location path="sensor/kubernetes/fake/fake.go" line_range="318-320" />
<code_context>
}
func validateWorkload(workload *Workload) error {
+ if workload.CentralConnectionCrashCycle < 0 {
+ workload.CentralConnectionCrashCycle = 0
+ return errors.New("negative centralConnectionCrashCycle, clamped to 0")
+ }
if workload.NetworkWorkload.OpenPortReuseProbability < 0.0 || workload.NetworkWorkload.OpenPortReuseProbability > 1.0 {
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Reconsider returning an error when the crash cycle is negative but successfully clamped
Here we correct a negative value to 0, but still return an error. That means callers may see failures or noisy logs even though the input has been made safe. If this is intended as a recoverable misconfiguration (like the open-port reuse probability case), consider either logging a warning and returning nil, or returning a non-fatal/wrapped error that callers can deliberately ignore. Clarifying whether this should be fatal vs advisory will make `validateWorkload`’s behavior more predictable for callers.
Suggested implementation:
```golang
func validateWorkload(workload *Workload) error {
if workload.CentralConnectionCrashCycle < 0 {
// Clamp to 0 but treat as recoverable misconfiguration: log advisory, do not fail validation.
workload.CentralConnectionCrashCycle = 0
log.Warnf("negative centralConnectionCrashCycle in workload; clamped to 0")
}
if workload.NetworkWorkload.OpenPortReuseProbability < 0.0 || workload.NetworkWorkload.OpenPortReuseProbability > 1.0 {
```
1. Ensure the logger used here (`log.Warnf`) matches the logging package already used in this file/package (e.g. `logrus`, `logging.LoggerForModule`, or Go's stdlib `log`). If this file does not yet have a logger, import and initialize it according to the project's conventions.
2. If `errors` was only used for this now-removed return value, remove the `errors` import to avoid an unused import error.
</issue_to_address>
Images are ready for the commit at 13ed82b. To use with deploy scripts, first …
Codecov Report: ❌ Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master   #19611      +/-   ##
==========================================
+ Coverage   49.25%   49.31%   +0.06%
==========================================
  Files        2735     2737       +2
  Lines      206138   206467     +329
==========================================
+ Hits       101539   101825     +286
- Misses      97051    97096      +45
+ Partials     7548     7546       -2
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Force-pushed e7e5ac8 to 13ed82b.
Description
Add a `centralConnectionCrashCycle` duration field to fake workload profiles that periodically triggers synthetic Sensor offline->online state transitions. This exercises the same state-handling paths that real Central connection drops would, without requiring actual network disruptions.
This was a required change to reproduce a real customer issue: ROX-33474.
Key changes:
- `createFakeCentralService` extracted from `setupCentralWithFakeConnection` for clarity and reuse.
- `TriggerOnlineOfflineTransition` on Sensor to drive state changes from outside the gRPC stream path.
- `RequestCentralConnectionRestart` method.
- `high-node-count.yaml` workload profile that uses the feature.

User-facing documentation
Testing and quality
Automated testing
How I validated my change
Scale test tooling change only. Ran local-sensor with the `high-node-count.yaml` profile and observed periodic offline/online state transitions in logs.