fix(ci): emailsender test (kind bug) by davdhacs · Pull Request #19050 · stackrox/stackrox

davdhacs · 2026-02-16T23:39:42Z

Description

Update to emailsender workflow (cleanup, fix/workaround for kind bug moved to acs-fleet-manager).

Fixed in stackrox/acs-fleet-manager#2577 kind load docker-image fails when importing multi-arch images because it uses ctr import --all-platforms but Docker only pulls content for the local platform. This started failing when the rox-ci-image image docker client was upgraded.

~~See kubernetes-sigs/kind#3795~~

Changes:

Replace n1hility/cancel-previous-runs@v3 (broken credentials) with native concurrency group
Add cache: false to setup-go (repo is checked out to stackrox/ subdir, setup-go can't find go.sum at workspace root)

Testing and quality

[] CI results are inspected

How I validated my change

DOCKER_DEFAULT_PLATFORM=linux/amd64 alone fails: https://github.com/stackrox/stackrox/actions/runs/22102838003
Previous failures with kind load --all-platforms error: https://github.com/stackrox/stackrox/actions/runs/22087026367, https://github.com/stackrox/stackrox/actions/runs/22086333792

rhacs-bot · 2026-02-17T00:13:06Z

Images are ready for the commit at d42a41d.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-144-gd42a41da0a.

codecov · 2026-02-17T00:15:48Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.51%. Comparing base (a8dd21c) to head (d42a41d).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #19050      +/-   ##
==========================================
- Coverage   49.51%   49.51%   -0.01%     
==========================================
  Files        2670     2670              
  Lines      201549   201549              
==========================================
- Hits        99803    99802       -1     
  Misses      94293    94293              
- Partials     7453     7454       +1

Flag	Coverage Δ
go-unit-tests	`49.51% <ø> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

sourcery-ai

Hey - I've left some high level feedback:

Instead of disabling the Go modules cache entirely, consider using cache-dependency-path: stackrox/go.sum with actions/setup-go so you still benefit from caching while accommodating the stackrox/ subdirectory layout.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Instead of disabling the Go modules cache entirely, consider using `cache-dependency-path: stackrox/go.sum` with `actions/setup-go` so you still benefit from caching while accommodating the `stackrox/` subdirectory layout.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

kind load docker-image fails on multi-arch OCI images because docker save produces an incomplete archive (index references both platforms but only one platform's blobs are present), and kind's ctr import --all-platforms fails on the missing blobs. Pre-load main and central-db directly into the Kind node via ctr pull --platform linux/amd64, so kind load finds them already present and skips the broken docker save path. Also replace broken n1hility/cancel-previous-runs with native concurrency group, and disable setup-go cache (repo in subdir). See kubernetes-sigs/kind#3795 Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Avoid intermediate QUAY_CREDS variable; pass secrets directly to --user like sensor integration tests do in #19016. Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sourcery-ai

Hey - I've found 2 issues, and left some high level feedback:

The until docker exec ... loop in the Pre-load multi-arch images into Kind step can run indefinitely on persistent failures; consider adding a maximum number of retries or an overall timeout to avoid hanging the workflow.
The pre-load step hardcodes --platform linux/amd64; if this workflow might ever run on non-amd64 runners, consider deriving the platform from the runner or a matrix variable to keep it portable.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The `until docker exec ...` loop in the `Pre-load multi-arch images into Kind` step can run indefinitely on persistent failures; consider adding a maximum number of retries or an overall timeout to avoid hanging the workflow.
- The pre-load step hardcodes `--platform linux/amd64`; if this workflow might ever run on non-amd64 runners, consider deriving the platform from the runner or a matrix variable to keep it portable.

## Individual Comments

### Comment 1
<location> `.github/workflows/emailsender-central-compatibility.yaml:22-24` </location>
<code_context>
       - 'central/auth/m2m/**'
       - '.github/workflows/emailsender-central-compatibility.yaml'

+concurrency:
+  group: ${{ github.workflow }}
+  cancel-in-progress: true
+
 jobs:
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Concurrency group should likely include the ref to avoid cross-branch cancellations

`group: ${{ github.workflow }}` will cancel in-progress runs for this workflow across all branches. If you only want to cancel previous runs for the same branch/PR (like the old `cancel-previous-runs` action), scope the group with the ref, e.g. `group: ${{ github.workflow }}-${{ github.ref }}` or `group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}` so cancellations stay per-branch instead of global.

```suggestion
concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}
  cancel-in-progress: true
```
</issue_to_address>

### Comment 2
<location> `.github/workflows/emailsender-central-compatibility.yaml:77-79` </location>
<code_context>
+          # See https://github.com/kubernetes-sigs/kind/issues/3795
+          # Same approach as #19016 for sensor integration tests.
+          TAG=$(make --no-print-directory -C stackrox tag)
+          for img in quay.io/rhacs-eng/main:$TAG quay.io/rhacs-eng/central-db:$TAG; do
+            echo "Waiting for $img ..."
+            until docker exec kind-control-plane ctr -n k8s.io images pull \
+              --user "${{ secrets.QUAY_RHACS_ENG_RO_USERNAME }}:${{ secrets.QUAY_RHACS_ENG_RO_PASSWORD }}" \
+              --platform linux/amd64 \
</code_context>

<issue_to_address>
**issue (bug_risk):** Add a maximum retry or timeout around the image pull loop to avoid hanging the job indefinitely

This `until ...; do sleep 30; done` construct will loop indefinitely if the image never becomes available (e.g., bad tag or credentials), effectively hanging the workflow until the global job timeout. Please bound this with either a max retry counter or a time-based cutoff (for example, tracking a `retries` variable and exiting with non-zero once it exceeds a threshold) so failures surface earlier and more predictably.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

.github/workflows/emailsender-central-compatibility.yaml

The pre-load via ctr pull works but kind load docker-image still runs (called by acs-fleet-manager entrypoint) and fails due to digest mismatch between host Docker and node containerd. Wrap kind binary so load failures are non-fatal warnings. Locally-built single-arch images (emailsender) still load normally. Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Bound the until loop to 60 retries (30 minutes) so failures surface earlier than the global job timeout. Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

davdhacs · 2026-02-18T05:13:42Z

@porridge I tested/researched more, and it looks like the bug is really in kind itself. For the emailsender test, I threw together this workaround but I would be okay with alternative fixes for it.

porridge

This is a rather elaborate workaround, too "magic" for my taste, and I think the fix should go into the other repo instead.
It seems something like

would be a lot simpler...

.github/workflows/emailsender-central-compatibility.yaml

johannes94 · 2026-02-18T10:20:17Z

stackrox/acs-fleet-manager#2577

I tried kubernetes-sigs/kind#3795 (comment) and it appears to work for us in the acs-fleet-manager repo. Once that is merged, the test in this repo should automatickly pick it up.

We could still merge the other changes:

Replace n1hility/cancel-previous-runs@v3 (broken credentials) with native concurrency group

Add cache: false to setup-go (repo is checked out to stackrox/ subdir, setup-go can't find go.sum at workspace root)

davdhacs · 2026-02-18T14:27:33Z

@johannes94 that's great! Thank you. Here is the cleanup/update only.

sourcery-ai

Hey - I've found 1 issue

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `.github/workflows/emailsender-central-compatibility.yaml:22-24` </location>
<code_context>
       - 'central/auth/m2m/**'
       - '.github/workflows/emailsender-central-compatibility.yaml'

+concurrency:
+  group: ${{ github.workflow }}
+  cancel-in-progress: true
+
 jobs:
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Concurrency group keyed only by workflow name may cause runs from different branches/PRs to cancel each other.

Using only `${{ github.workflow }}` here means all runs of this workflow across all branches/PRs share one concurrency group. With `cancel-in-progress: true`, a run on one branch/PR can cancel a run on another, which can disrupt parallel PR validation. Consider including branch/ref in the group, e.g. `${{ github.workflow }}-${{ github.ref }}` or `${{ github.workflow }}-${{ github.head_ref || github.ref }}` so only runs for the same branch/PR cancel each other.

```suggestion
concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}
  cancel-in-progress: true
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

.github/workflows/emailsender-central-compatibility.yaml

openshift-ci · 2026-02-18T16:30:21Z

@davdhacs: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/ocp-4-12-nongroovy-e2e-tests	`d42a41d`	link	false	`/test ocp-4-12-nongroovy-e2e-tests`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

johannes94

Looks like the merge of stackrox/acs-fleet-manager#2577 fixed the test.

The remaining changes LGTM.

davdhacs requested a review from a team as a code owner February 17, 2026 01:34

github-actions bot added area/ci ai-review labels Feb 17, 2026

sourcery-ai bot reviewed Feb 17, 2026

View reviewed changes

davdhacs force-pushed the davdhacs/email-sender-latest branch 3 times, most recently from 4c669b6 to 4d85bc0 Compare February 17, 2026 05:25

empty commit

86ab11a

davdhacs force-pushed the davdhacs/email-sender-latest branch 3 times, most recently from 8a0de8f to bdb1934 Compare February 17, 2026 16:29

davdhacs changed the title ~~chore(ci): verify emailsender test works after rox-ci-image update~~ fix(ci): emailsender test (docker client upgrade broke kind test) Feb 17, 2026

davdhacs requested review from johannes94 and porridge February 17, 2026 16:31

davdhacs force-pushed the davdhacs/email-sender-latest branch from bdb1934 to 708a174 Compare February 17, 2026 17:38

davdhacs marked this pull request as draft February 17, 2026 17:47

openshift-ci bot added the do-not-merge/work-in-progress label Feb 17, 2026

davdhacs force-pushed the davdhacs/email-sender-latest branch from 708a174 to 950fd9a Compare February 17, 2026 20:59

fix(ci): inline secrets in ctr pull, matching #19016 pattern

efd38a5

Avoid intermediate QUAY_CREDS variable; pass secrets directly to --user like sensor integration tests do in #19016. Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

davdhacs mentioned this pull request Feb 17, 2026

fix: pull platform-specific images in sensor integration tests #19016

Merged

davdhacs requested a review from janisz February 17, 2026 21:17

davdhacs marked this pull request as ready for review February 17, 2026 21:17

openshift-ci bot removed the do-not-merge/work-in-progress label Feb 17, 2026

sourcery-ai bot reviewed Feb 17, 2026

View reviewed changes

.github/workflows/emailsender-central-compatibility.yaml Show resolved Hide resolved

.github/workflows/emailsender-central-compatibility.yaml Outdated Show resolved Hide resolved

davdhacs changed the title ~~fix(ci): emailsender test (docker client upgrade broke kind test)~~ fix(ci): emailsender test (kind bug) Feb 17, 2026

davdhacs marked this pull request as draft February 17, 2026 22:51

openshift-ci bot added the do-not-merge/work-in-progress label Feb 17, 2026

fix(ci): add retry limit to image pre-load polling

1c06bdc

Bound the until loop to 60 retries (30 minutes) so failures surface earlier than the global job timeout. Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

porridge reviewed Feb 18, 2026

View reviewed changes

.github/workflows/emailsender-central-compatibility.yaml Show resolved Hide resolved

.github/workflows/emailsender-central-compatibility.yaml Outdated Show resolved Hide resolved

johannes94 mentioned this pull request Feb 18, 2026

fix: workaround for emailsender kind image load stackrox/acs-fleet-manager#2577

Merged

9 tasks

davdhacs added 2 commits February 18, 2026 07:22

kind fix in acs-fleet-manager

f0a705f

Merge branch 'master' into davdhacs/email-sender-latest

d42a41d

davdhacs marked this pull request as ready for review February 18, 2026 14:26

openshift-ci bot removed the do-not-merge/work-in-progress label Feb 18, 2026

sourcery-ai bot reviewed Feb 18, 2026

View reviewed changes

.github/workflows/emailsender-central-compatibility.yaml Show resolved Hide resolved

johannes94 approved these changes Feb 18, 2026

View reviewed changes

davdhacs merged commit adc404b into master Feb 18, 2026
97 of 100 checks passed

davdhacs deleted the davdhacs/email-sender-latest branch February 18, 2026 20:00

Conversation

davdhacs commented Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Testing and quality

How I validated my change

Uh oh!

rhacs-bot commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

davdhacs commented Feb 18, 2026

Uh oh!

porridge left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

johannes94 commented Feb 18, 2026

Uh oh!

davdhacs commented Feb 18, 2026

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

openshift-ci bot commented Feb 18, 2026

Uh oh!

johannes94 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

davdhacs commented Feb 16, 2026 •

edited

Loading

rhacs-bot commented Feb 17, 2026 •

edited

Loading

codecov bot commented Feb 17, 2026 •

edited

Loading