Skip to content

ROX-31236: Remove ROX_CAPTURE_INTERMEDIATE_EVENTS flag#17240

Merged
vikin91 merged 12 commits intomasterfrom
piotr/ROX-31236-remove-capture-events-flag
Oct 16, 2025
Merged

ROX-31236: Remove ROX_CAPTURE_INTERMEDIATE_EVENTS flag#17240
vikin91 merged 12 commits intomasterfrom
piotr/ROX-31236-remove-capture-events-flag

Conversation

@vikin91
Copy link
Contributor

@vikin91 vikin91 commented Oct 10, 2025

Human here

This has been created fully by AI.

  • I have checked the code changes and the PR description.
  • I have manually removed test code where only AI added t.Skip. This was followed by running an unused-code check.

Description

This PR removes the ROX_CAPTURE_INTERMEDIATE_EVENTS feature flag which has been enabled by default since Release 4.3 (February 2024) - approximately 1.7 years in production. Since this removal depends on the connection retry mechanism from #17236, both removals are included in this PR.

📋 Detailed Implementation Plan: REMOVAL_PLAN_CAPTURE_INTERMEDIATE_EVENTS.md

Why it's safe to remove:

  • Enabled since: Release 4.3 (February 2024) - 1.7 years ago
  • Stability: Single clean toggle, never reverted
  • Feature maturity: Offline v3 proven stable, no data loss issues
  • Risk assessment: Medium-High (data integrity) but 1.7 years proven

Removal Sequence Justification

This PR combines the first and third PRs of 4 in the feature flag removal sequence.

Position in sequence: FIRST + THIRD (Foundation + Layer 2)

Why this order:

Dependencies:

Merge Strategy: #17236 should be merged first for clean git history. After #17236 merges, this PR will only show the ROX_CAPTURE_INTERMEDIATE_EVENTS changes.

Reference: See attached REMOVAL_PLAN_CAPTURE_INTERMEDIATE_EVENTS.md for complete analysis.

Files modified:

  • pkg/features/list.go - removed both flag definitions
  • sensor/common/sensor/sensor.go - simplified startup (from ROX-31236: Remove ROX_PREVENT_SENSOR_RESTART_ON_DISCONNECT flag #17236)
  • sensor/common/networkflow/manager/ - 5 files updated (always buffer, always enrich)
  • sensor/common/processsignal/pipeline.go - simplified context management
  • sensor/common/detector/ - 2 files updated (always use buffered queues)
  • sensor/tests/connection/ - removed flag setup from tests

Suggested reviewers: (AI got that wrong)

User-facing documentation

  • CHANGELOG.md is updated OR update is not needed
  • documentation PR is created and is linked above OR is not needed

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

  • Verified all .Enabled() checks removed from production code
  • Confirmed offline behavior tests pass
  • Tested event buffering during simulated disconnect
  • Verified no events dropped during offline periods
  • Checked memory usage remains acceptable
  • Confirmed reconnection recovery works correctly

@openshift-ci
Copy link

openshift-ci bot commented Oct 10, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Remove any remaining dead code paths and unused helper methods, such as the now-unused resetContext and resetLastSentState, to keep the codebase clean.
  • Ensure the simplified non-blocking sendToCentral logic handles backpressure correctly and logs dropped events as intended under high-throughput conditions.
  • Verify that the refactored Notify methods across components preserve the correct offline-to-online transition behavior without the old feature flag checks.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Remove any remaining dead code paths and unused helper methods, such as the now-unused resetContext and resetLastSentState, to keep the codebase clean.
- Ensure the simplified non-blocking sendToCentral logic handles backpressure correctly and logs dropped events as intended under high-throughput conditions.
- Verify that the refactored Notify methods across components preserve the correct offline-to-online transition behavior without the old feature flag checks.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@vikin91 vikin91 closed this Oct 10, 2025
@vikin91 vikin91 reopened this Oct 10, 2025
@vikin91 vikin91 changed the base branch from piotr/ROX-31236-remove-prevent-sensor-restart-flag to master October 10, 2025 12:22
@vikin91 vikin91 force-pushed the piotr/ROX-31236-remove-capture-events-flag branch 3 times, most recently from a347526 to deb0695 Compare October 10, 2025 12:26
@rhacs-bot
Copy link
Contributor

rhacs-bot commented Oct 10, 2025

Images are ready for the commit at 6fcac11.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.10.x-29-g6fcac119fc.

@codecov
Copy link

codecov bot commented Oct 10, 2025

Codecov Report

❌ Patch coverage is 63.63636% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.84%. Comparing base (b2daa24) to head (6fcac11).
⚠️ Report is 13 commits behind head on master.

Files with missing lines Patch % Lines
sensor/common/networkflow/manager/manager_impl.go 58.82% 7 Missing ⚠️
sensor/common/detector/detector.go 0.00% 3 Missing ⚠️
sensor/common/processsignal/pipeline.go 84.61% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #17240      +/-   ##
==========================================
- Coverage   48.86%   48.84%   -0.03%     
==========================================
  Files        2720     2720              
  Lines      203364   203276      -88     
==========================================
- Hits        99373    99284      -89     
- Misses      96167    96174       +7     
+ Partials     7824     7818       -6     
Flag Coverage Δ
go-unit-tests 48.84% <63.63%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@vikin91 vikin91 force-pushed the piotr/ROX-31236-remove-capture-events-flag branch from fa41283 to 4db7f93 Compare October 10, 2025 13:44
@vikin91 vikin91 changed the title ROX-31236: Remove ROX_CAPTURE_INTERMEDIATE_EVENTS flag ROX-31236: Remove ROX_CAPTURE_INTERMEDIATE_EVENTS flag Oct 13, 2025
@vikin91 vikin91 force-pushed the piotr/ROX-31236-remove-capture-events-flag branch from 4db7f93 to 9ebf8d6 Compare October 13, 2025 15:28
@vikin91 vikin91 marked this pull request as ready for review October 14, 2025 12:33
@vikin91 vikin91 requested a review from a team as a code owner October 14, 2025 12:33
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • The refactored sendToCentral now uses a non-blocking select in all cases, whereas previously the unbuffered path would block until consumption; please verify this change does not lead to unintended message drops under normal and high-load scenarios.
  • The simplified Pipeline.Notify no longer handles context recreation or cancellation for offline/online transitions; ensure that this refactor doesn’t introduce context leaks or alter the intended signal timing in edge cases.
  • After removing the SensorCapturesIntermediateEvents guard in Queue.Start, the isRunning signal is no longer emitted on Start, so please ensure the queue’s initial state still allows processing without deadlock.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The refactored sendToCentral now uses a non-blocking select in all cases, whereas previously the unbuffered path would block until consumption; please verify this change does not lead to unintended message drops under normal and high-load scenarios.
- The simplified Pipeline.Notify no longer handles context recreation or cancellation for offline/online transitions; ensure that this refactor doesn’t introduce context leaks or alter the intended signal timing in edge cases.
- After removing the SensorCapturesIntermediateEvents guard in Queue.Start, the isRunning signal is no longer emitted on Start, so please ensure the queue’s initial state still allows processing without deadlock.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

This removes the ROX_CAPTURE_INTERMEDIATE_EVENTS feature flag that has been
enabled by default since Release 4.3 (February 2024).

The flag enabled offline v3 behavior where Sensor buffers events during
Central disconnects. This is now the standard operational mode.

Changes:
- Removed feature flag definition from pkg/features/list.go
- Network flow manager: Always use buffered channels and continue enrichment during offline mode
- Process signal pipeline: Always use long-lived contexts that persist across disconnects
- Detector: Always use buffered queues and handle pause/resume on connection events
- Removed EnrichmentReasonEpFeatureDisabled (no longer returned)
- Updated integration tests to remove flag setup
- Removed unused features package imports from 9 files

All event buffering and offline operation functionality remains fully operational.

This PR is stacked on #17236 (ROX_PREVENT_SENSOR_RESTART_ON_DISCONNECT removal).

Partially generated with AI assistance.
- Updated comments to reflect feature is always enabled
- Removed obsolete test cases for disabled feature
- Skipped tests specific to v1/v2 behavior (feature disabled)
- Removed EnrichmentReasonEpFeatureDisabled constant
@vikin91 vikin91 force-pushed the piotr/ROX-31236-remove-capture-events-flag branch from 8e26560 to 5c6bd5b Compare October 14, 2025 14:41
@vikin91
Copy link
Contributor Author

vikin91 commented Oct 15, 2025

  • The refactored sendToCentral now uses a non-blocking select in all cases, whereas previously the unbuffered path would block until consumption; please verify this change does not lead to unintended message drops under normal and high-load scenarios.

We are aware that the channel will start dropping messages once it is full. It's been running like this for months.

  • The simplified Pipeline.Notify no longer handles context recreation or cancellation for offline/online transitions; ensure that this refactor doesn’t introduce context leaks or alter the intended signal timing in edge cases.

We use a long-lived detection context to not stop detection in offline mode. The context for sending to Central while offline will still be canceled so that messages are not sent.

  • After removing the SensorCapturesIntermediateEvents guard in Queue.Start, the isRunning signal is no longer emitted on Start, so please ensure the queue’s initial state still allows processing without deadlock.

Fixed in 2431bb2 by starting the queue in running mode.

aand reverted in 6fcac11 due to failing tests. Apparently the queue is expected to start in paused state. Let's keep it like this.

@openshift-ci
Copy link

openshift-ci bot commented Oct 15, 2025

@vikin91: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/ocp-4-18-nongroovy-e2e-tests 6fcac11 link false /test ocp-4-18-nongroovy-e2e-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@vikin91 vikin91 merged commit 9209afe into master Oct 16, 2025
99 of 100 checks passed
@vikin91 vikin91 deleted the piotr/ROX-31236-remove-capture-events-flag branch October 16, 2025 11:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants