ROX-29771: Ensure Sensor is up before e2e tests #17502

Merged
vikin91 merged 3 commits into master from piotr/piotr/ROX-29771-test-pods-2
Oct 28, 2025

@vikin91 vikin91 commented Oct 24, 2025

Description

Add Sensor health checks to TestPod and TestContainerInstances to prevent test failures caused by insufficient recovery time after Sensor restarts.

Problem

Tests that depend on process event collection (TestPod, TestContainerInstances) can fail when they start immediately after a previous test restarts Sensor. In the analyzed failure:

  • TestDelegatedScanning removed the LOGLEVEL environment variable during cleanup at 15:09:22.540
  • This triggered a Sensor pod restart
  • TestPod started 0.07 seconds later, before the Collector→Sensor→Central event pipeline had fully recovered
  • Result: zero process events were collected over 158 seconds, causing the test to fail

Solution

Added a waitForSensorHealthy() function that:

  1. Waits for the Sensor deployment to be ready in Kubernetes (all replicas running)
  2. Waits for Central to report a HEALTHY connection status with Sensor
  3. Ensures the event collection pipeline is functional before tests proceed

The function is called at the beginning of both TestPod and TestContainerInstances, following the same pattern used in TestDelegatedScanning.waitForHealthyCentralSensorConn().

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

  • Ran the ocp-4-XX-nongroovy-e2e-tests CI job multiple times

Expected CI behavior:

  • Tests should wait for Sensor health before starting (new log: "Waiting for Sensor to be healthy before starting test")
  • Event collection should succeed after health checks pass
  • Tests may take slightly longer due to health check overhead (~2-10 seconds depending on Sensor state)

If tests still fail in CI, the health check timeouts may need adjustment, or additional diagnostics may be required to understand how long Sensor takes to recover.

AI-assisted development

  • AI-generated: Initial code structure for waitForSensorHealthy() function and health check calls
  • Human-written/verified:
    • Root cause analysis from build logs
    • Import statements and code organization
    • Comments explaining the fix
    • Integration with existing test infrastructure
    • Verification that code compiles and follows existing patterns


vikin91 commented Oct 24, 2025

This change is part of the following stack:

Change managed by git-spice.

openshift-ci bot commented Oct 24, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@vikin91 vikin91 added the auto-retest and ai-assisted labels Oct 24, 2025

rhacs-bot commented Oct 24, 2025

Images are ready for the commit at a44e392.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.10.x-151-ga44e3921a3.

codecov bot commented Oct 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.67%. Comparing base (4ef6e8e) to head (a44e392).
⚠️ Report is 21 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #17502   +/-   ##
=======================================
  Coverage   48.67%   48.67%           
=======================================
  Files        2723     2723           
  Lines      202878   202878           
=======================================
  Hits        98752    98752           
  Misses      96359    96359           
  Partials     7767     7767           
Flag Coverage Δ
go-unit-tests 48.67% <ø> (ø)

vikin91 commented Oct 27, 2025

/test gke-nongroovy-e2e-tests gke-upgrade-tests

vikin91 commented Oct 27, 2025

/retest-times 10 ocp-4-12-nongroovy-e2e-tests

@vikin91 vikin91 marked this pull request as ready for review October 27, 2025 10:50

rhacs-bot commented

/test ocp-4-12-nongroovy-e2e-tests

vikin91 commented Oct 27, 2025

/test ocp-4-19-nongroovy-e2e-tests ocp-4-18-nongroovy-e2e-tests ocp-4-12-nongroovy-e2e-tests gke-nongroovy-e2e-tests

rhacs-bot commented

/test ocp-4-12-nongroovy-e2e-tests

4 similar comments

@vikin91 vikin91 requested a review from janisz October 28, 2025 06:17

@janisz janisz left a comment

rhacs-bot commented

/test ocp-4-12-nongroovy-e2e-tests

vikin91 commented Oct 28, 2025

Many thanks for the review @janisz! I will resolve the discussion and merge it, as I would like to rebase #17354 and #17374 as soon as possible and confirm that it works when using the prefetched images.

@vikin91 vikin91 merged commit 39ca19f into master Oct 28, 2025
102 of 103 checks passed
@vikin91 vikin91 deleted the piotr/piotr/ROX-29771-test-pods-2 branch October 28, 2025 08:37

Labels

  • ai-assisted
  • auto-retest: PRs with this label will be automatically retested if prow checks fails

3 participants