Skip to content

fix(ci): emailsender test (kind bug)#19050

Merged
davdhacs merged 7 commits intomasterfrom
davdhacs/email-sender-latest
Feb 18, 2026
Merged

fix(ci): emailsender test (kind bug)#19050
davdhacs merged 7 commits intomasterfrom
davdhacs/email-sender-latest

Conversation

@davdhacs
Copy link
Contributor

@davdhacs davdhacs commented Feb 16, 2026

Description

Update to emailsender workflow (cleanup, fix/workaround for kind bug moved to acs-fleet-manager).

Fixed in stackrox/acs-fleet-manager#2577 kind load docker-image fails when importing multi-arch images because it uses ctr import --all-platforms but Docker only pulls content for the local platform. This started failing when the rox-ci-image image docker client was upgraded.

See kubernetes-sigs/kind#3795

Changes:

  • Replace n1hility/cancel-previous-runs@v3 (broken credentials) with native concurrency group
  • Add cache: false to setup-go (repo is checked out to stackrox/ subdir, setup-go can't find go.sum at workspace root)

Testing and quality

  • [] CI results are inspected

How I validated my change

@rhacs-bot
Copy link
Contributor

rhacs-bot commented Feb 17, 2026

Images are ready for the commit at d42a41d.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-144-gd42a41da0a.

@codecov
Copy link

codecov bot commented Feb 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.51%. Comparing base (a8dd21c) to head (d42a41d).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #19050      +/-   ##
==========================================
- Coverage   49.51%   49.51%   -0.01%     
==========================================
  Files        2670     2670              
  Lines      201549   201549              
==========================================
- Hits        99803    99802       -1     
  Misses      94293    94293              
- Partials     7453     7454       +1     
Flag Coverage Δ
go-unit-tests 49.51% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • Instead of disabling the Go modules cache entirely, consider using cache-dependency-path: stackrox/go.sum with actions/setup-go so you still benefit from caching while accommodating the stackrox/ subdirectory layout.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Instead of disabling the Go modules cache entirely, consider using `cache-dependency-path: stackrox/go.sum` with `actions/setup-go` so you still benefit from caching while accommodating the `stackrox/` subdirectory layout.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@davdhacs davdhacs force-pushed the davdhacs/email-sender-latest branch 3 times, most recently from 4c669b6 to 4d85bc0 Compare February 17, 2026 05:25
@davdhacs davdhacs force-pushed the davdhacs/email-sender-latest branch 3 times, most recently from 8a0de8f to bdb1934 Compare February 17, 2026 16:29
@davdhacs davdhacs changed the title chore(ci): verify emailsender test works after rox-ci-image update fix(ci): emailsender test (docker client upgrade broke kind test) Feb 17, 2026
@davdhacs davdhacs force-pushed the davdhacs/email-sender-latest branch from bdb1934 to 708a174 Compare February 17, 2026 17:38
@davdhacs davdhacs marked this pull request as draft February 17, 2026 17:47
kind load docker-image fails on multi-arch OCI images because
docker save produces an incomplete archive (index references both
platforms but only one platform's blobs are present), and kind's
ctr import --all-platforms fails on the missing blobs.

Pre-load main and central-db directly into the Kind node via
ctr pull --platform linux/amd64, so kind load finds them already
present and skips the broken docker save path.

Also replace broken n1hility/cancel-previous-runs with native
concurrency group, and disable setup-go cache (repo in subdir).

See kubernetes-sigs/kind#3795

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davdhacs davdhacs force-pushed the davdhacs/email-sender-latest branch from 708a174 to 950fd9a Compare February 17, 2026 20:59
Avoid intermediate QUAY_CREDS variable; pass secrets directly
to --user like sensor integration tests do in #19016.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The until docker exec ... loop in the Pre-load multi-arch images into Kind step can run indefinitely on persistent failures; consider adding a maximum number of retries or an overall timeout to avoid hanging the workflow.
  • The pre-load step hardcodes --platform linux/amd64; if this workflow might ever run on non-amd64 runners, consider deriving the platform from the runner or a matrix variable to keep it portable.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `until docker exec ...` loop in the `Pre-load multi-arch images into Kind` step can run indefinitely on persistent failures; consider adding a maximum number of retries or an overall timeout to avoid hanging the workflow.
- The pre-load step hardcodes `--platform linux/amd64`; if this workflow might ever run on non-amd64 runners, consider deriving the platform from the runner or a matrix variable to keep it portable.

## Individual Comments

### Comment 1
<location> `.github/workflows/emailsender-central-compatibility.yaml:22-24` </location>
<code_context>
       - 'central/auth/m2m/**'
       - '.github/workflows/emailsender-central-compatibility.yaml'

+concurrency:
+  group: ${{ github.workflow }}
+  cancel-in-progress: true
+
 jobs:
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Concurrency group should likely include the ref to avoid cross-branch cancellations

`group: ${{ github.workflow }}` will cancel in-progress runs for this workflow across all branches. If you only want to cancel previous runs for the same branch/PR (like the old `cancel-previous-runs` action), scope the group with the ref, e.g. `group: ${{ github.workflow }}-${{ github.ref }}` or `group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}` so cancellations stay per-branch instead of global.

```suggestion
concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}
  cancel-in-progress: true
```
</issue_to_address>

### Comment 2
<location> `.github/workflows/emailsender-central-compatibility.yaml:77-79` </location>
<code_context>
+          # See https://github.com/kubernetes-sigs/kind/issues/3795
+          # Same approach as #19016 for sensor integration tests.
+          TAG=$(make --no-print-directory -C stackrox tag)
+          for img in quay.io/rhacs-eng/main:$TAG quay.io/rhacs-eng/central-db:$TAG; do
+            echo "Waiting for $img ..."
+            until docker exec kind-control-plane ctr -n k8s.io images pull \
+              --user "${{ secrets.QUAY_RHACS_ENG_RO_USERNAME }}:${{ secrets.QUAY_RHACS_ENG_RO_PASSWORD }}" \
+              --platform linux/amd64 \
</code_context>

<issue_to_address>
**issue (bug_risk):** Add a maximum retry or timeout around the image pull loop to avoid hanging the job indefinitely

This `until ...; do sleep 30; done` construct will loop indefinitely if the image never becomes available (e.g., bad tag or credentials), effectively hanging the workflow until the global job timeout. Please bound this with either a max retry counter or a time-based cutoff (for example, tracking a `retries` variable and exiting with non-zero once it exceeds a threshold) so failures surface earlier and more predictably.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@davdhacs davdhacs changed the title fix(ci): emailsender test (docker client upgrade broke kind test) fix(ci): emailsender test (kind bug) Feb 17, 2026
The pre-load via ctr pull works but kind load docker-image still
runs (called by acs-fleet-manager entrypoint) and fails due to
digest mismatch between host Docker and node containerd.

Wrap kind binary so load failures are non-fatal warnings.
Locally-built single-arch images (emailsender) still load normally.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davdhacs davdhacs marked this pull request as draft February 17, 2026 22:51
Bound the until loop to 60 retries (30 minutes) so failures
surface earlier than the global job timeout.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davdhacs
Copy link
Contributor Author

@porridge I tested/researched more, and it looks like the bug is really in kind itself. For the emailsender test, I threw together this workaround but I would be okay with alternative fixes for it.

Copy link
Contributor

@porridge porridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rather elaborate workaround, too "magic" for my taste, and I think the fix should go into the other repo instead.
It seems something like

would be a lot simpler...

@johannes94
Copy link
Contributor

stackrox/acs-fleet-manager#2577

I tried kubernetes-sigs/kind#3795 (comment) and it appears to work for us in the acs-fleet-manager repo. Once that is merged, the test in this repo should automatickly pick it up.

We could still merge the other changes:

  • Replace n1hility/cancel-previous-runs@v3 (broken credentials) with native concurrency group
  • Add cache: false to setup-go (repo is checked out to stackrox/ subdir, setup-go can't find go.sum at workspace root)

@davdhacs davdhacs marked this pull request as ready for review February 18, 2026 14:26
@davdhacs
Copy link
Contributor Author

@johannes94 that's great! Thank you. Here is the cleanup/update only.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `.github/workflows/emailsender-central-compatibility.yaml:22-24` </location>
<code_context>
       - 'central/auth/m2m/**'
       - '.github/workflows/emailsender-central-compatibility.yaml'

+concurrency:
+  group: ${{ github.workflow }}
+  cancel-in-progress: true
+
 jobs:
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Concurrency group keyed only by workflow name may cause runs from different branches/PRs to cancel each other.

Using only `${{ github.workflow }}` here means all runs of this workflow across all branches/PRs share one concurrency group. With `cancel-in-progress: true`, a run on one branch/PR can cancel a run on another, which can disrupt parallel PR validation. Consider including branch/ref in the group, e.g. `${{ github.workflow }}-${{ github.ref }}` or `${{ github.workflow }}-${{ github.head_ref || github.ref }}` so only runs for the same branch/PR cancel each other.

```suggestion
concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}
  cancel-in-progress: true
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@openshift-ci
Copy link

openshift-ci bot commented Feb 18, 2026

@davdhacs: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/ocp-4-12-nongroovy-e2e-tests d42a41d link false /test ocp-4-12-nongroovy-e2e-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Contributor

@johannes94 johannes94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the merge of stackrox/acs-fleet-manager#2577 fixed the test.

The remaining changes LGTM.

@davdhacs davdhacs merged commit adc404b into master Feb 18, 2026
97 of 100 checks passed
@davdhacs davdhacs deleted the davdhacs/email-sender-latest branch February 18, 2026 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants