fix(test): strengthen retry guards in NetworkBaselineTest for external IPs#20046

Draft
guzalv wants to merge 3 commits into master from gualvare/fix-network-baseline-test-flake

guzalv commented Apr 16, 2026

Description

The test "Verify network baseline functionality with multiple external entities" in NetworkBaselineTest was flaky due to a race condition in its retry guards.

Both evaluateWithRetry blocks used totalAnomalous + totalBaseline != 0 as the readiness condition — satisfied as soon as any peer appears. DNS flows (port 53) to EXTERNAL_IP1/EXTERNAL_IP2 arrive before the HTTP flow (port 80) to EXTERNAL_IP3, causing the retry to exit early. The subsequent find {} call for EXTERNAL_IP3 returned null, and .getPeer() on it caused a NullPointerException.

Fix: replace the weak count check with explicit per-IP presence assertions in both retry blocks, so the retry only exits when all expected IPs are in their respective lists (baseline or anomalous).
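
The race can be modeled outside the test suite. The following is a minimal sketch in Python, not the actual Groovy test code: `evaluate_with_retry`, `make_poll`, `find_peer`, the `Peer` type, the placeholder IP strings, and the arrival order in `SNAPSHOTS` are all hypothetical stand-ins chosen for illustration, and the baseline and anomalous lists are collapsed into a single peer list for brevity.

```python
from dataclasses import dataclass

# Placeholder values; the real test uses its own EXTERNAL_IP constants.
EXTERNAL_IP1, EXTERNAL_IP2, EXTERNAL_IP3 = "ip-1", "ip-2", "ip-3"
EXPECTED_IPS = {EXTERNAL_IP1, EXTERNAL_IP2, EXTERNAL_IP3}


@dataclass(frozen=True)
class Peer:
    ip: str
    port: int


# Assumed arrival order: nothing, then the two DNS flows, then the HTTP flow.
SNAPSHOTS = [
    [],
    [Peer(EXTERNAL_IP1, 53), Peer(EXTERNAL_IP2, 53)],
    [Peer(EXTERNAL_IP1, 53), Peer(EXTERNAL_IP2, 53), Peer(EXTERNAL_IP3, 80)],
]


def make_poll(snapshots):
    """Return a poll function that yields each snapshot once, then repeats the last."""
    remaining = list(snapshots)

    def poll():
        return remaining.pop(0) if len(remaining) > 1 else remaining[0]

    return poll


def evaluate_with_retry(attempts, poll, ready):
    """Poll up to `attempts` times and return the first snapshot that satisfies `ready`."""
    for _ in range(attempts):
        peers = poll()
        if ready(peers):
            return peers
    raise TimeoutError("readiness condition not met within the retry budget")


def find_peer(peers, ip):
    """Return the first peer with the given IP, or None when absent."""
    return next((p for p in peers if p.ip == ip), None)


def weak_guard(peers):
    # Old condition: satisfied as soon as any peer appears.
    return len(peers) != 0


def strong_guard(peers):
    # New condition: satisfied only when every expected IP is present.
    return EXPECTED_IPS <= {p.ip for p in peers}


weak = evaluate_with_retry(5, make_poll(SNAPSHOTS), weak_guard)
strong = evaluate_with_retry(5, make_poll(SNAPSHOTS), strong_guard)
```

Under these assumptions, `find_peer(weak, EXTERNAL_IP3)` returns `None`, so dereferencing it fails in the same way `.getPeer()` did on a null result, while `find_peer(strong, EXTERNAL_IP3)` returns the port-80 peer because the retry did not exit until all three IPs were present.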

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • modified existing tests

How I validated my change

  • Confirmed the NPE by reproducing the failure from CI run 2044701944798253056.
  • Verified the fix logic is correct: the retry now waits for all 3 IPs before proceeding, eliminating the race.

guzalv added 2 commits April 10, 2026 16:08
Add a new column after "Profile(version)" that shows whether the profile
is a regular "Profile" or a "Tailored Profile". The value is derived from
the profile's OperatorKind enum stored in the database.

UNSPECIFIED kind (should not occur in practice) renders as
"Data Not Available" rather than defaulting to "Profile".
…l IPs

The retry condition in "Verify network baseline functionality with multiple
external entities" only checked that any peer was present (totalAnomalous +
totalBaseline != 0). DNS flows (port 53) to IP1/IP2 arrive before the HTTP
flow (port 80) to IP3, satisfying the guard early. The test then immediately
asserts all 3 IPs are present, hitting an NPE when IP3 hasn't been baselined yet.

Fix: replace the weak count check with per-IP assertions that all expected IPs
are present in their respective lists (baseline or anomalous) before proceeding.
Applied to both retry blocks in the test case.

Partially generated by AI (claude-sonnet-4-6).

openshift-ci bot commented Apr 16, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


codecov bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 88.88889% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.64%. Comparing base (38c2fdc) to head (f97ace5).
⚠️ Report is 45 commits behind head on master.

Files with missing lines Patch % Lines
...or/v2/report/manager/results/results_aggregator.go 88.23% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #20046      +/-   ##
==========================================
+ Coverage   49.56%   49.64%   +0.07%     
==========================================
  Files        2764     2765       +1     
  Lines      208351   208817     +466     
==========================================
+ Hits       103269   103666     +397     
- Misses      97430    97489      +59     
- Partials     7652     7662      +10     
Flag Coverage Δ
go-unit-tests 49.64% <88.88%> (+0.07%) ⬆️



github-actions bot commented Apr 16, 2026

🚀 Build Images Ready

Images are ready for commit f97ace5. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.11.x-624-gf97ace5472

The collector no longer exposes the /state/runtime-config HTTP endpoint
on port 8080. CollectorUtil.waitForConfigToHaveState() was port-forwarding
to each collector pod and querying this endpoint to verify ConfigMap
propagation, but always got 404, causing the 90-second retry loop to
exhaust before failing with "IntrospectionQuery failed with Not Found".

Remove waitForConfigToHaveState entirely and drop the now-unused imports
(withRetry, protobuf, sensor.Collector). The ConfigMap write is preserved —
the natural delay from pod startup and edge detection gives collectors
enough time to pick up the new config.

Partially generated by AI (claude-sonnet-4-6).