ROX-19064: Scanner V4 CI Wait for Vulns to Load #19836

Open: dcaravel wants to merge 1 commit into master from dc/scan4-wait-vulns

Contributor

@dcaravel dcaravel commented Apr 6, 2026

Description

Adds the rails for CI jobs to wait for vuln loads to finish before starting tests.

This PR polls the Central API to determine if vulns are loaded (same API that is used by System Health).

Another option considered was to use the 'readiness' setting in the Scanner V4 matcher so that the pod does not become ready until vulns are loaded. The polling approach was favored because it does not require CI changes for each install type (manifest, helm, operator, etc.), and because with the readiness approach the cause of timeouts would be 'less obvious' when jobs fail. With polling, the failure reason is directly in the build logs (amongst other things).

Prior to polling, the storage classes available on the cluster are listed to assist troubleshooting if loads are slow (to verify whether the DB PVC is using an SSD). Additionally, each poll dumps the current CPU/memory consumption of the pods (top pods) to assist in troubleshooting/measuring as needed.
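
A minimal sketch of the polling approach described above, not the actual implementation in this PR. `poll_until` and its arguments are hypothetical names. The readiness check is passed in as a command, so any probe (for example, a `curl` call against the Central API) could be plugged in.

```shell
# Hypothetical sketch: run a check command repeatedly until it succeeds
# or the timeout elapses.
# Usage: poll_until <timeout_seconds> <interval_seconds> <check command...>
poll_until() {
    local max_seconds="$1" interval="$2"
    shift 2
    local start elapsed
    start="$(date +%s)"
    while true; do
        elapsed=$(( $(date +%s) - start ))
        # The check runs in the current shell; exit status 0 means "ready".
        if "$@"; then
            echo "INFO: check passed (${elapsed}s elapsed)"
            return 0
        fi
        if [ "$elapsed" -ge "$max_seconds" ]; then
            echo "ERROR: check did not pass within ${max_seconds}s" >&2
            return 1
        fi
        echo "INFO: check not ready (${elapsed}s/${max_seconds}s)"
        sleep "$interval"
    done
}
```

Because the elapsed time and limit are printed on every attempt, a job that times out leaves the reason directly in the build log.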

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

The changes themselves are tests

How I validated my change

Against StackRox Scanner, these changes will be tested by CI as part of this PR.

Against Scanner V4, these changes were validated in #19236 and will be validated again in a future PR when Scanner V4 is officially turned on in CI.

Available storage classes on this cluster:
Name:                  premium-rwo
IsDefaultClass:        No
Annotations:           components.gke.io/component-name=pdcsi,components.gke.io/component-version=0.22.49,components.gke.io/layer=addon
Provisioner:           pd.csi.storage.gke.io
Parameters:            type=pd-ssd
AllowVolumeExpansion:  True
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     WaitForFirstConsumer
Events:                <none>


Name:                  standard
IsDefaultClass:        No
Annotations:           components.gke.io/component-name=pdcsi,components.gke.io/component-version=0.22.49,components.gke.io/layer=addon,storageclass.kubernetes.io/is-default-class=false
Provisioner:           kubernetes.io/gce-pd
Parameters:            type=pd-standard
AllowVolumeExpansion:  True
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>


Name:                  standard-rwo
IsDefaultClass:        Yes
Annotations:           components.gke.io/component-name=pdcsi,components.gke.io/component-version=0.22.49,components.gke.io/layer=addon,storageclass.kubernetes.io/is-default-class=true
Provisioner:           pd.csi.storage.gke.io
Parameters:            type=pd-balanced
AllowVolumeExpansion:  True
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     WaitForFirstConsumer
Events:                <none>
...

INFO: Sun Apr  5 23:11:29 UTC 2026: Scanner V4 vuln load check: HTTP 500 (0s/2400s): failed to obtain vulnerability definitions information: no timestamp available
INFO: Sun Apr  5 23:11:29 UTC 2026: --- Pod resources at 0s ---
admission-control-864d4449d8-h58qg    2m     27Mi     
admission-control-864d4449d8-t94ff    2m     27Mi     
central-7c5bd76857-f994x              48m    248Mi    
central-db-d5c7d5698-fbzfr            14m    350Mi    
collector-hjz6b                       134m   218Mi    
collector-mbdrl                       74m    179Mi    
collector-pz4bp                       146m   226Mi    
collector-rgdh5                       105m   255Mi    
config-controller-8dd8bd7bc-9dpgg     1m     20Mi     
scanner-77bbf4c5f6-kll4r              269m   199Mi    
scanner-db-fd57f6f95-bbwng            317m   152Mi    
scanner-v4-db-df6d79c-546vh           807m   1067Mi   
scanner-v4-indexer-5c96cf4b47-rp26m   957m   373Mi    
scanner-v4-matcher-97f4c9676-t4c7r    337m   92Mi     
sensor-84bc649649-plrnr               45m    245Mi 

INFO: Sun Apr  5 23:34:26 UTC 2026: Scanner V4 vuln load check: HTTP 500 (1377s/2400s): failed to obtain vulnerability definitions information: no timestamp available
INFO: Sun Apr  5 23:34:26 UTC 2026: --- Pod resources at 1377s ---
admission-control-864d4449d8-h58qg    2m     30Mi     
admission-control-864d4449d8-pj8vm    1m     32Mi     
admission-control-864d4449d8-t94ff    1m     30Mi     
central-7c5bd76857-f994x              6m     246Mi    
central-db-d5c7d5698-fbzfr            3m     438Mi    
collector-9xq5d                       50m    228Mi    
collector-hjz6b                       101m   244Mi    
collector-mbdrl                       90m    219Mi    
collector-pz4bp                       130m   248Mi    
collector-rgdh5                       83m    240Mi    
config-controller-8dd8bd7bc-9dpgg     1m     20Mi     
scanner-77bbf4c5f6-kll4r              365m   573Mi    
scanner-db-fd57f6f95-bbwng            530m   1090Mi   
scanner-v4-db-df6d79c-546vh           499m   1508Mi   
scanner-v4-indexer-5c96cf4b47-rp26m   1m     154Mi    
scanner-v4-matcher-97f4c9676-t4c7r    463m   113Mi    
sensor-84bc649649-plrnr               13m    318Mi    
INFO: Sun Apr  5 23:34:57 UTC 2026: Scanner V4 vulnerability loading complete (1408s elapsed).
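
For reference, a hedged sketch of how diagnostics like those in the log above could be gathered. The helper names and the `stackrox` namespace default are assumptions, not taken from this PR. `kubectl describe storageclass` and `kubectl top pods --no-headers` are standard kubectl commands, and `kubectl top` only works when resource metrics are available in the cluster.

```shell
# Hypothetical helpers; names and the default namespace are assumptions.
log_info() {
    echo "INFO: $(date -u): $*"
}

# Listed once before polling, to check whether the DB PVC sits on SSD storage.
list_storage_classes() {
    echo "Available storage classes on this cluster:"
    kubectl describe storageclass || echo "(unable to list storage classes)"
}

# Called on every poll; a failure here is logged and never fails the job.
dump_pod_resources() {
    local elapsed="$1" namespace="${2:-stackrox}"
    log_info "--- Pod resources at ${elapsed}s ---"
    kubectl -n "$namespace" top pods --no-headers \
        || log_info "kubectl top pods unavailable"
}
```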


openshift-ci bot commented Apr 6, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

dcaravel force-pushed the dc/scan4-wait-vulns branch from 3c47842 to 3d0cf01 on April 6, 2026 03:30


codecov bot commented Apr 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.60%. Comparing base (2d5d7a2) to head (3d0cf01).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #19836   +/-   ##
=======================================
  Coverage   49.60%   49.60%           
=======================================
  Files        2763     2763           
  Lines      208339   208339           
=======================================
  Hits       103341   103341           
  Misses      97331    97331           
  Partials     7667     7667           
Flag Coverage Δ
go-unit-tests 49.60% <ø> (ø)


Contributor

github-actions bot commented Apr 6, 2026

🚀 Build Images Ready

Images are ready for commit 3d0cf01. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.11.x-561-g3d0cf018f5

Contributor

github-actions bot commented Apr 6, 2026

🚀 Build Images Ready

Images are ready for commit 3c47842. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.11.x-561-g3c47842981

Contributor Author

dcaravel commented Apr 6, 2026

/test all

@dcaravel dcaravel marked this pull request as ready for review April 6, 2026 13:48
@dcaravel dcaravel requested a review from janisz as a code owner April 6, 2026 13:48
@dcaravel dcaravel requested a review from a team April 6, 2026 14:00
# (i.e. database connectivity). Call this separately in jobs that verify scan
# results, after deploy_stackrox has returned.
wait_for_scanner_v4_vuln_load() {
local max_seconds="${SCANNER_V4_VULN_LOAD_TIMEOUT:-2400}"
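
As an illustration of the default in the line above (a sketch with hypothetical helper names, not code from this PR): the `${VAR:-default}` expansion falls back to 2400 seconds, i.e. 40 minutes, whenever `SCANNER_V4_VULN_LOAD_TIMEOUT` is unset or empty.

```shell
# Hypothetical helper: resolves the timeout with the same expansion as the
# snippet above.
resolve_vuln_load_timeout() {
    # ${VAR:-default} uses the default when VAR is unset OR set to "".
    echo "${SCANNER_V4_VULN_LOAD_TIMEOUT:-2400}"
}

# Integer conversion for log messages; 2400 seconds is 40 minutes.
seconds_to_minutes() {
    echo $(( $1 / 60 ))
}
```

A job that expects a faster load could export a smaller `SCANNER_V4_VULN_LOAD_TIMEOUT` before the wait is invoked.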
Contributor

Is this a 40-minute wait in CI? Can we add a tag for scanner v4 tests and run everything else instead of waiting?

Contributor

Alternatively, load a smaller vulnerability list so it is ready within seconds. We have a list of images in the prefetcher config, so let's load only what's really needed.

Contributor Author

dcaravel commented Apr 6, 2026

Please see #19835. Scanner V4 isn't enabled yet, and I'm working on reducing this as much as possible while managing scope.

> Can we add a tag for scanner v4 tests and run everything else instead of waiting?

Possibly. Many tests (UI, compliance, deployment, policy, etc.) rely on the ability to scan images, so this may add complexity without buying much. In my testing, with the other optimizations in review, overall CI runs completed in a similar timeframe as today. I will continue to look for optimizations. FWIW, CI today waits for Scanner V2 vulns to load (via pod readiness), so this isn't a 'new' concept.


2 participants