-
Notifications
You must be signed in to change notification settings - Fork 175
ROX-19064: Scanner V4 CI Wait for Vulns to Load #19836
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1167,6 +1167,60 @@ wait_for_scanner_V4() { | |
| wait_for_ready_deployment "$namespace" "scanner-v4-matcher" "$max_seconds" | ||
| } | ||
|
|
||
| # wait_for_scanner_v4_vuln_load waits until the Scanner V4 matcher has completed | ||
| # its initial vulnerability load, verified via Central's integration health API. | ||
| # This is distinct from wait_for_scanner_V4, which only waits for pod readiness | ||
| # (i.e. database connectivity). Call this separately in jobs that verify scan | ||
| # results, after deploy_stackrox has returned. | ||
| wait_for_scanner_v4_vuln_load() { | ||
| local max_seconds="${SCANNER_V4_VULN_LOAD_TIMEOUT:-2400}" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it 40 minutes wait in CI? Can we add a tag for scanner v4 tests and run everything else instead of waiting?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alternatively load smaller vulnerability list so it will be ready in a second. We have list of images in prefetcher config so let's load only what's really needed.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please see #19835 - Scanner V4 isn't enabled yet - working on reducing this as much as possible while managing scope.
Possibly - many tests (UI, compliance, deployment, policy, etc.) rely on the ability to scan images, so this may add complexity and not buy much - in my testing the overall CI runs were completing in a similar timeframe as today with the other optimizations in review - will continue to find optimizations. FWIW CI today waits for Scanner V2 vulns to load (via pod readiness) so this isn't a 'new' concept. |
||
| info "Waiting for Scanner V4 to finish loading vulnerabilities..." | ||
|
|
||
| info "--- Available storage classes on this cluster ---" | ||
| kubectl describe storageclasses 2>/dev/null || true | ||
| info "SCANNER_V4_DB_STORAGE_CLASS=${SCANNER_V4_DB_STORAGE_CLASS:-<unset>}" | ||
|
|
||
| require_environment "API_ENDPOINT" | ||
| require_environment "ROX_ADMIN_PASSWORD" | ||
|
|
||
| local start_time elapsed | ||
| start_time="$(date '+%s')" | ||
| while true; do | ||
| # -w '\n%{http_code}' appends a newline and the HTTP status code after the | ||
| # response body, letting us capture both in a single variable. | ||
| local response http_code body | ||
| response=$(curl -sk \ | ||
| --config <(curl_cfg user "admin:${ROX_ADMIN_PASSWORD}") \ | ||
| -w '\n%{http_code}' \ | ||
| "https://${API_ENDPOINT}/v1/integrationhealth/vulndefinitions?component=SCANNER_V4") | ||
| # ##*$'\n' strips up to and including the last newline (leaving just the 3-digit code). | ||
| http_code="${response##*$'\n'}" | ||
| # %$'\n'* strips the trailing newline+code (leaving just the body). | ||
| body="${response%$'\n'*}" | ||
|
|
||
| elapsed=$(( $(date '+%s') - start_time )) | ||
|
|
||
| if [[ "${http_code}" != "200" ]]; then | ||
| # Try to extract the message field from a JSON error body; fall back to | ||
| # the raw body if it's not JSON or the field is absent. | ||
| local err_detail | ||
| err_detail=$(jq -r '.message // empty' <<< "${body}" 2>/dev/null) | ||
| info "Scanner V4 vuln load check: HTTP ${http_code} (${elapsed}s/${max_seconds}s): ${err_detail:-${body}}" | ||
| elif echo "${body}" | jq -e '.lastUpdatedTimestamp != null' >/dev/null 2>&1; then | ||
| info "Scanner V4 vulnerability loading complete (${elapsed}s elapsed)." | ||
| return | ||
| else | ||
| info "Scanner V4 vuln load check: HTTP 200, lastUpdatedTimestamp not set yet (${elapsed}s/${max_seconds}s)" | ||
| fi | ||
|
|
||
| if (( elapsed > max_seconds )); then | ||
| die "wait_for_scanner_v4_vuln_load() timed out after ${max_seconds}s." | ||
| fi | ||
|
|
||
| sleep 30 | ||
| done | ||
| } | ||
|
|
||
| # shellcheck disable=SC2120 | ||
| wait_for_api() { | ||
| local central_namespace=${1:-stackrox} | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going ove the reasons as to not use readiness for this:
The change required is an environment variable. That type of customization already exists for different install types, why for this particular case a custom vars in CI is not desired? For example, weight adding the custom var with the bash script that you're proposing.
Pods not getting ready in time for other reasons are still part of the failure path. We would still have to investigate them regardless. Is it the case that timeouts due to readiness tied to vulnerability updates are completely obscure? If that the case, improving the outcome (status code + body message) of the readiness probe would be better compared to re-implementing the same job kubernetes does in the bash script?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the alternative: #19930
Please give that an approve if preferred, I see benefits of both approaches (will admit the bash 'magic' in this PR isn't my favorite)
Some thoughts on this approach: