ROX-19064: Scanner V4 CI Wait for Vulns to Load#19836
Conversation
Skipping CI for Draft Pull Request.
3c47842 to 3d0cf01
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## master #19836 +/- ##
==========================================
- Coverage 49.60% 49.56% -0.04%
==========================================
Files 2763 2764 +1
Lines 208339 208357 +18
==========================================
- Hits 103341 103269 -72
- Misses 97331 97436 +105
+ Partials 7667 7652 -15
🚀 Build Images Ready: images are ready for commit 8beb887. To use with deploy scripts: export MAIN_IMAGE_TAG=4.11.x-562-g8beb887623

🚀 Build Images Ready: images are ready for commit 3c47842. To use with deploy scripts: export MAIN_IMAGE_TAG=4.11.x-561-g3c47842981
/test all
# (i.e. database connectivity). Call this separately in jobs that verify scan
# results, after deploy_stackrox has returned.
wait_for_scanner_v4_vuln_load() {
    local max_seconds="${SCANNER_V4_VULN_LOAD_TIMEOUT:-2400}"
Is it a 40-minute wait in CI? Can we add a tag for Scanner V4 tests and run everything else instead of waiting?
Alternatively, load a smaller vulnerability list so it will be ready in a second. We have a list of images in the prefetcher config, so let's load only what's really needed.
Please see #19835 - Scanner V4 isn't enabled yet; I'm working on reducing this as much as possible while managing scope.
Can we add a tag for scanner v4 tests and run everything else instead of waiting?
Possibly. Many tests (UI, compliance, deployment, policy, etc.) rely on the ability to scan images, so this may add complexity without buying much. In my testing, overall CI runs completed in a similar timeframe as today with the other optimizations in review; I'll continue to look for optimizations. FWIW, CI today already waits for Scanner V2 vulns to load (via pod readiness), so this isn't a new concept.
# This is distinct from wait_for_scanner_V4, which only waits for pod readiness
# (i.e. database connectivity). Call this separately in jobs that verify scan
# results, after deploy_stackrox has returned.
wait_for_scanner_v4_vuln_load() {
Going over the reasons not to use readiness for this:
The polling approach was favored because it does not require making changes in CI for each different install type (manifest, helm, operator, etc.)
The change required is an environment variable. That type of customization already exists for different install types, so why is a custom var in CI not desired for this particular case? For example, weigh adding the custom var against the bash script you're proposing.
the cause of timeouts would be 'less obvious' when jobs fail - with polling the failure reason is directly in the build logs (amongst other things).
Pods not getting ready in time for other reasons are still part of the failure path; we would have to investigate them regardless. Is it the case that timeouts due to readiness tied to vulnerability updates are completely obscure? If so, wouldn't improving the outcome (status code + body message) of the readiness probe be better than re-implementing in a bash script the same job Kubernetes already does?
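For context, the readiness-based alternative discussed here would roughly mean gating the matcher pod's Ready state on the vuln load; a hypothetical sketch follows (the path, port, and thresholds are placeholders, not taken from the actual matcher deployment):

```yaml
# Hypothetical readinessProbe on the Scanner V4 matcher: the pod only
# reports Ready once vulnerability data is loaded, so standard waits
# (kubectl rollout status / wait_for_scanner_V4) would cover the load too.
readinessProbe:
  httpGet:
    path: /health/readiness   # placeholder path
    port: 9443                # placeholder port
  periodSeconds: 30
  failureThreshold: 80        # 80 x 30s = 2400s, matching the CI timeout
```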
Here is the alternative: #19930
Please give that an approve if preferred. I see benefits of both approaches (I'll admit the bash 'magic' in this PR isn't my favorite).
Some thoughts on this approach:
- Quicker to debug, especially while iterating on optimizing load times
- Can see the timings right on the Prow landing page. Loading failures are highlighted, and it's clear when a failure is due to taking too long vs. other causes of the pod not being ready (such as DB connection failures)
- This logic doesn't need to change if installers change (albeit trivial)
- It tests the same mechanism UI/users use to determine if vulns are loaded / up to date.
- It keeps matcher behavior consistent with production installs. The interactions between Central and Matcher can be tested while in a loading state (if desired).
/test gke-nongroovy-e2e-tests
Description
Alternative: #19930 (only one is needed)
Adds the rails for CI jobs to wait for vuln loads to finish before starting tests.
This PR polls the Central API to determine if vulns are loaded (same API that is used by System Health).
Another option considered was to use the 'readiness' setting in the Scanner V4 matcher so that the pod does not reach a ready state until vulns are loaded. The polling approach was favored because it does not require making changes in CI for each different install type (manifest, helm, operator, etc.), and because with readiness the cause of timeouts would be less obvious when jobs fail; with polling, the failure reason is directly in the build logs (amongst other things).
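As a rough illustration of the polling approach (hypothetical helper names and wiring; the actual Central endpoint used by System Health is not reproduced here), the wait can be sketched as a generic timeout loop:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch of a poll-until-ready loop. The real CI script polls the Central
# API that System Health uses; here the check is any command passed in
# (e.g. a curl against Central) that exits 0 once vulns are loaded.
wait_for_condition() {
    local max_seconds="$1" interval="$2"
    shift 2
    local start elapsed
    start="$(date +%s)"
    while true; do
        if "$@"; then
            echo "ready after $(( $(date +%s) - start ))s"
            return 0
        fi
        elapsed=$(( $(date +%s) - start ))
        if (( elapsed >= max_seconds )); then
            echo "timed out after ${max_seconds}s waiting for: $*" >&2
            return 1
        fi
        sleep "$interval"
    done
}

# Example wiring, mirroring the timeout variable shown in the diff:
# wait_for_condition "${SCANNER_V4_VULN_LOAD_TIMEOUT:-2400}" 30 check_vulns_loaded
```

The timeout lives in one place and the check command can be swapped without touching the loop, which is what makes the approach install-type agnostic.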
Prior to polling, the available storage classes for the cluster are listed to assist troubleshooting if loads are slow (e.g. to verify whether the DB PVC is backed by an SSD).
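For reference, that diagnostic could be as simple as the following (a sketch; the PVC name is a placeholder, not taken from this PR):

```shell
# List storage classes, and the class backing the Scanner V4 DB PVC
# (hypothetical PVC name), to check whether loads run on SSD-backed storage.
kubectl get storageclass
kubectl -n stackrox get pvc scanner-v4-db -o jsonpath='{.spec.storageClassName}'
```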
User-facing documentation
Testing and quality
Automated testing
The changes themselves are tests
How I validated my change
Against StackRox Scanner these changes will be tested by CI as part of this PR
Against Scanner V4 these changes were validated in #19236 and will be validated again in a future PR when Scanner V4 is officially turned on in CI.