ROX-32976: Replace Docker Hub images with quay.io to avoid rate limits #18867

Merged
janisz merged 6 commits into master from fix/sensor-docker-hub-rate-limit, Feb 10, 2026

@janisz janisz commented Feb 5, 2026

Problem:
Sensor integration test Test_SensorIntermediateRuntimeEvents (ROX-32976) fails in CI because of Docker Hub anonymous rate limits (100 pulls per 6 hours per IP). The tests deploy pods with nginx:1.14.x, alpine/curl, and busybox from Docker Hub without authentication, which results in ImagePullBackOff errors. The pods never become Ready, so GetContainerIdsFromDeployment() times out.

Solution:

  • Replace all Docker Hub images with existing quay.io/rhacs-eng mirrors
  • Add a quay.io login step to the GitHub Actions workflow, ahead of the Kind cluster step
  • Use nginx-1.21.1 (available on quay.io) instead of nginx:1.14.x
  • Replace alpine/curl with alpine:3.16.0 and use wget (already available)
  • Update runtime test to expect "wget" process instead of "curl"

All images used are already mirrored and available on quay.io, eliminating dependency on Docker Hub and avoiding rate limit issues entirely.
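
One way to catch unauthenticated Docker Hub pulls like the ones described above is to check whether an image reference names a registry at all. The sketch below is illustrative only and is not part of this PR. The function names are hypothetical, and it applies a simplified form of Docker's reference rule: a first path component is treated as a registry only if it contains `.` or `:` or equals `localhost`; anything else resolves to Docker Hub.

```python
# Hypothetical helper, not taken from the stackrox repository.
# Hosts treated as Docker Hub (assumption: these two aliases are enough here).
DOCKER_HUB_HOSTS = {"docker.io", "index.docker.io"}


def registry_host(image: str) -> str:
    """Return the registry host an image reference resolves to (simplified rule)."""
    first, sep, _rest = image.partition("/")
    # Only a first component that looks like a host counts as a registry.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    # Bare names such as "busybox" or "alpine/curl" default to Docker Hub.
    return "docker.io"


def uses_docker_hub(image: str) -> bool:
    """True if pulling this image would hit Docker Hub."""
    return registry_host(image) in DOCKER_HUB_HOSTS


if __name__ == "__main__":
    for ref in ("nginx:1.14.2", "alpine/curl", "busybox",
                "quay.io/rhacs-eng/qa:alpine-3.16.0"):
        print(ref, "-> Docker Hub" if uses_docker_hub(ref) else "-> other registry")
```

With the images named in this PR, the three original references resolve to Docker Hub and the quay.io/rhacs-eng mirror does not.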

User request: Analyze CI failure ROX-32976 and find root cause and permanent solution for Docker Hub rate limiting in sensor integration tests.

Code changes developed with AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • The runtime policy name and expected process have been switched from curl to wget in the Go test, but sensor/tests/data/runtime-policies.json is listed in the diff without visible changes—please confirm any policy references (names, process filters) are updated there to stay consistent with the new behavior.
  • You now hardcode quay.io/rhacs-eng/qa-multi-arch:nginx-1.21.1 and quay.io/rhacs-eng/qa:alpine-3.16.0 in many YAMLs; consider centralizing these image definitions (e.g., via a shared helper, Kustomize patch, or common values file) to make future image changes less error-prone.
  • The policy and test identifiers were renamed from test-pi-curl to test-pi-wget; if the intent is to test a generic outbound HTTP client, consider a tool-agnostic name (e.g., test-pi-http-client) to avoid future renames when switching binaries again.

@rhacs-bot
rhacs-bot commented Feb 5, 2026

Images are ready for the commit at 0cbe19f.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-63-g0cbe19fac2.

Kind clusters don't inherit Docker credentials by default, so images need
to be explicitly loaded into the cluster after pulling them.

This adds a step to pull the test images from quay.io (using authenticated
Docker session) and load them into the Kind cluster using 'kind load docker-image'.

This ensures pods can access the images without needing to pull from the
registry directly.

Add image prefetcher infrastructure for sensor integration tests to support
both GitHub Actions (Kind) and OpenShift CI (real clusters).

Changes:
- Create sensor/tests/images-to-prefetch.txt as single source of truth for
  test image list (follows pattern from E2E tests)
- Update scripts/ci/lib.sh with sensor-integration cases in:
  - populate_prefetcher_image_list()
  - _image_prefetcher_prebuilt_start()
  - _image_prefetcher_prebuilt_await()
- Update tests/e2e/sensor.sh to call image prefetch functions for OSCI
- Update GitHub Actions workflow to read images from prefetch list instead
  of hardcoding them

For GitHub Actions with Kind:
- Reads sensor/tests/images-to-prefetch.txt
- Pulls images from quay.io (authenticated)
- Loads images into Kind cluster via 'kind load docker-image'

For OpenShift CI:
- Uses DaemonSet-based prefetcher to pull images to all nodes
- Sets IMAGE_PULL_POLICY_FOR_QUAY_IO=Never to use prefetched images

This ensures tests work in both environments without Docker Hub rate limits.
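
The Kind path described above (read the list, pull, load) can be sketched as follows. This is a minimal illustration, not the workflow's actual implementation: the function names are hypothetical, and the file format is an assumption (one image per line, with blank lines and `#` comments ignored). The sketch only builds the command lines and leaves running them to the caller.

```python
# Hypothetical sketch of the "read list -> docker pull -> kind load" flow.
# Assumption: the prefetch list holds one image reference per line.

def parse_image_list(text: str) -> list[str]:
    """Return image references in order, skipping blanks, comments, and repeats."""
    images: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line not in images:
            images.append(line)
    return images


def kind_load_commands(images: list[str], cluster: str = "kind") -> list[list[str]]:
    """Build the docker/kind invocations for each image (nothing is run here)."""
    commands: list[list[str]] = []
    for image in images:
        # Pull on the runner, where the authenticated Docker session lives.
        commands.append(["docker", "pull", image])
        # Copy the pulled image into the nodes of the named Kind cluster.
        commands.append(["kind", "load", "docker-image", image, "--name", cluster])
    return commands


if __name__ == "__main__":
    sample = """
    # sensor integration test images
    quay.io/rhacs-eng/qa-multi-arch:nginx-1.21.1
    quay.io/rhacs-eng/qa:alpine-3.16.0
    """
    for cmd in kind_load_commands(parse_image_list(sample)):
        print(" ".join(cmd))
```

Each argument list could be passed to `subprocess.run(cmd, check=True)` so that a failed pull stops the job early.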
@codecov

codecov bot commented Feb 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.47%. Comparing base (9cbfb66) to head (0cbe19f).
⚠️ Report is 13 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #18867   +/-   ##
=======================================
  Coverage   49.47%   49.47%           
=======================================
  Files        2661     2661           
  Lines      200734   200734           
=======================================
+ Hits        99307    99313    +6     
+ Misses      94011    94007    -4     
+ Partials     7416     7414    -2     
| Flag | Coverage Δ |
|---|---|
| go-unit-tests | 49.47% <ø> (+<0.01%) ⬆️ |

The helm/kind-action creates a cluster named 'chart-testing' by default,
but 'kind load docker-image' was looking for the default cluster name 'kind'.

Add --name chart-testing to the kind load command to fix the error:
'ERROR: no nodes found for cluster "kind"'

The pod hierarchy tests were checking for the old Docker Hub image names
(nginx:1.14.2 and nginx:1.14.1) but the YAMLs now use quay.io images.

Update test assertions in pod_test.go to expect:
- quay.io/rhacs-eng/qa-multi-arch:nginx-1.21.1

This fixes the test failures:
- Test_ContainerSpecOnDeployment
- Test_ParentlessPodsAreTreatedAsDeployments
@janisz janisz marked this pull request as ready for review February 6, 2026 15:53

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • The new sensor/tests/images-to-prefetch.txt is now the implicit source of truth for quay.io images used in sensor integration tests, but nothing enforces that all images referenced in the sensor test YAMLs are listed there; consider adding a small validation script/check in CI that scans the sensor test manifests for quay.io images and ensures they are present in this prefetch list to avoid future drift and unexpected ImagePullBackOffs.
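
A check along the lines suggested here could look like the sketch below. It is a hypothetical illustration rather than anything in the repository: the function names are invented, manifests are scanned with a simple regular expression instead of a YAML parser, and the prefetch list is assumed to hold one image per line with `#` comments allowed.

```python
import re

# Matches "image: <ref>" and "- image: <ref>" lines, with optional quotes.
IMAGE_LINE = re.compile(r"^\s*(?:-\s*)?image:\s*['\"]?([^\s'\"#]+)", re.MULTILINE)


def quay_images_in_manifest(manifest_text: str) -> set[str]:
    """Collect the quay.io image references found in one manifest."""
    return {ref for ref in IMAGE_LINE.findall(manifest_text)
            if ref.startswith("quay.io/")}


def missing_from_prefetch_list(manifests: list[str], prefetch_text: str) -> list[str]:
    """Return quay.io images used in manifests but absent from the prefetch list."""
    listed = {line.strip() for line in prefetch_text.splitlines()
              if line.strip() and not line.strip().startswith("#")}
    missing: set[str] = set()
    for text in manifests:
        missing |= quay_images_in_manifest(text) - listed
    return sorted(missing)


if __name__ == "__main__":
    manifest = """
    containers:
      - name: web
        image: quay.io/rhacs-eng/qa-multi-arch:nginx-1.21.1
      - name: client
        image: quay.io/rhacs-eng/qa:alpine-3.16.0
    """
    prefetch = "quay.io/rhacs-eng/qa-multi-arch:nginx-1.21.1\n"
    # A CI step could fail the build whenever this list is non-empty.
    print(missing_from_prefetch_list([manifest], prefetch))
```

A regex keeps the sketch dependency-free; a real check might prefer a YAML parser so that images in unusual positions, such as multi-document files or templated values, are not missed.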

@janisz janisz added the auto-retest label Feb 9, 2026

janisz commented Feb 9, 2026

/retest

@lvalerom lvalerom left a comment

LGTM

@janisz janisz merged commit a7848e2 into master Feb 10, 2026
108 checks passed
@janisz janisz deleted the fix/sensor-docker-hub-rate-limit branch February 10, 2026 13:10
stehessel added a commit that referenced this pull request Feb 13, 2026
janisz added a commit that referenced this pull request Feb 13, 2026
User request: Fix multi-arch image loading issue in PR #18867 where Kind
fails to load images with error "content digest not found" for arm64 layers.

Problem: When docker pull fetches multi-arch images on amd64 runners, it
downloads the manifest index but only pulls amd64 layers. Kind's import with
--all-platforms then fails because arm64 layers weren't downloaded.

Solution: Add --platform linux/amd64 flag to docker pull to fetch only the
platform-specific variant without the multi-arch manifest index. This ensures
consistency between pulled layers and what Kind attempts to import.

Code changes developed with AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
janisz added a commit that referenced this pull request Feb 13, 2026
User request: Fix multi-arch image loading issue in PR #18867 where Kind
fails to load images with error "content digest not found" for arm64 layers.

Problem: Even when docker pull fetches a multi-arch image with the --platform
flag, docker save by tag still includes the manifest index. Kind then tries
to import with --all-platforms and fails, because only the amd64 layers were
pulled and the arm64 layers referenced in the manifest are missing.

Solution: Use docker save with the image ID instead of the tag. Image IDs
reference the platform-specific image directly, excluding the manifest index.
Then load via kind load image-archive and retag inside the Kind node.

This approach:
1. Pulls platform-specific image (linux/amd64)
2. Saves using image ID (no manifest index)
3. Loads archive into Kind
4. Retags with original name inside Kind node

Code changes developed with AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
janisz added a commit that referenced this pull request Feb 13, 2026
User request: Fix multi-arch image loading issue in PR #18867 where Kind
fails to load images with error "content digest not found" for arm64 layers.

Problem: kind load docker-image uses docker save and ctr images import
--all-platforms internally. Even when docker pull fetches a multi-arch image
with the --platform flag, docker save still includes manifest index references.
The ctr import then fails because the arm64 layers referenced in the manifest
aren't present.

Solution: Bypass "kind load docker-image" entirely by pulling images directly
inside the Kind node using ctr. This avoids the docker save/import dance and
the --all-platforms issue.

Approach:
1. Use docker exec to run ctr inside the Kind node
2. Pull with --platform linux/amd64 to get only the needed variant
3. Pass quay.io credentials via --user flag to ctr
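
The three steps above can be sketched as a single command line. This is an illustration under stated assumptions, not the workflow's actual script: the function name is hypothetical, the node name `chart-testing-control-plane` assumes Kind's usual `<cluster>-control-plane` naming, and `k8s.io` is assumed to be the containerd namespace the kubelet reads from. The sketch only builds the argument list.

```python
# Hypothetical sketch: build a "docker exec <node> ctr images pull" command.
from typing import Optional


def ctr_pull_command(image: str,
                     node: str = "chart-testing-control-plane",  # assumed node name
                     user: Optional[str] = None,
                     password: Optional[str] = None,
                     platform: str = "linux/amd64") -> list[str]:
    """Return the argv that pulls one platform variant inside a Kind node."""
    cmd = [
        "docker", "exec", node,          # step 1: run inside the Kind node
        "ctr", "--namespace=k8s.io",     # assumed namespace used by Kubernetes
        "images", "pull",
        "--platform", platform,          # step 2: fetch only the needed variant
    ]
    if user is not None:
        # step 3: registry credentials in ctr's user[:password] form
        creds = user if password is None else f"{user}:{password}"
        cmd += ["--user", creds]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    argv = ctr_pull_command("quay.io/rhacs-eng/qa:alpine-3.16.0",
                            user="robot", password="***")
    print(" ".join(argv))
```

Credentials passed as arguments can be visible in process listings and logs, so a real workflow would likely rely on masked CI secrets.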

This is a recommended workaround from kubernetes-sigs/kind issues #3795, #3845,
and #4066, which document this as a known issue with Docker 29 and containerd.

References:
- kubernetes-sigs/kind#4066
- kubernetes-sigs/kind#3845
- kubernetes-sigs/kind#3795

Code changes developed with AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
janisz added a commit that referenced this pull request Mar 5, 2026
Adds image prefetching support for compatibility tests to avoid Docker Hub
rate limits, following the same pattern as PR #18867 for sensor integration
tests.

Changes:
- scripts/ci/lib.sh: Add *compatibility-tests case to prefetcher start, await,
  and image list population functions
- tests/e2e/run-compatibility.sh: Call prefetcher before deployment setup

The existing tests/images-to-prefetch.txt already contains the nginx image
used by compatibility tests (quay.io/rhacs-eng/qa-multi-arch:nginx-1-17-1
in tests/tls_challenge_test.go).

User request: look at this PR #18867
we need to add prefetcher to gke-nongroovy-compatibility-tests

Code changes developed with AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
janisz added a commit that referenced this pull request Mar 5, 2026
Adds image prefetching support for compatibility tests to avoid Docker Hub
rate limits, following the same pattern as PR #18867 for sensor integration
tests.

Changes:
- scripts/ci/lib.sh: Add *compatibility-tests case to prefetcher start, await,
  and image list population functions
- tests/e2e/run-compatibility.sh: Call prefetcher before deployment setup
- tests/yamls/multi-container-pod.yaml: Change imagePullPolicy from
  IfNotPresent to Never to use prefetched images

The existing tests/images-to-prefetch.txt already contains the nginx and
ubuntu images used by compatibility tests. The YAML file change is necessary
because pods created directly from YAML don't go through the
createDeploymentViaAPI code that respects IMAGE_PULL_POLICY_FOR_QUAY_IO.
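
The distinction drawn above can be illustrated with a small sketch of how an environment override such as IMAGE_PULL_POLICY_FOR_QUAY_IO might be applied. The real createDeploymentViaAPI logic is not shown in this PR, so the function below is purely hypothetical: its name, the `quay.io/` prefix test, and the `IfNotPresent` default are all assumptions.

```python
import os
from typing import Mapping, Optional


def pull_policy_for(image: str,
                    env: Optional[Mapping[str, str]] = None,
                    default: str = "IfNotPresent") -> str:
    """Pick an imagePullPolicy, honoring the quay.io override when it is set."""
    env = os.environ if env is None else env
    override = env.get("IMAGE_PULL_POLICY_FOR_QUAY_IO")
    # Only quay.io images are assumed to have been prefetched onto the nodes.
    if override and image.startswith("quay.io/"):
        return override
    return default


if __name__ == "__main__":
    ci_env = {"IMAGE_PULL_POLICY_FOR_QUAY_IO": "Never"}
    print(pull_policy_for("quay.io/rhacs-eng/qa-multi-arch:nginx-1-17-1", ci_env))
    print(pull_policy_for("quay.io/rhacs-eng/qa-multi-arch:nginx-1-17-1", {}))
```

A pod created straight from a static YAML file never passes through a function like this, which is why its imagePullPolicy has to be changed in the manifest itself.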

User request: look at this PR #18867
we need to add prefetcher to gke-nongroovy-compatibility-tests

Code changes developed with AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Labels

ai-review, area/ci, area/sensor, auto-retest
