ROX-32976: Replace Docker Hub images with quay.io to avoid rate limits #18867
Merged
Problem:
Sensor integration test Test_SensorIntermediateRuntimeEvents (ROX-32976) fails in CI due to Docker Hub anonymous rate limits (100 pulls/6hr per IP). Tests deploy pods with nginx:1.14.x, alpine/curl, and busybox from Docker Hub without authentication, causing ImagePullBackOff errors. Pods never become Ready, causing GetContainerIdsFromDeployment() to time out.

Solution:
- Replace all Docker Hub images with existing quay.io/rhacs-eng mirrors
- Add quay.io login step in GitHub Actions workflow before Kind cluster
- Use nginx-1.21.1 (available on quay.io) instead of nginx:1.14.x
- Replace alpine/curl with alpine:3.16.0 and use wget (already available)
- Update runtime test to expect "wget" process instead of "curl"

All images used are already mirrored and available on quay.io, eliminating dependency on Docker Hub and avoiding rate limit issues entirely.

User request: Analyze CI failure ROX-32976 and find root cause and permanent solution for Docker Hub rate limiting in sensor integration tests.

Code changes developed with AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
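For readers wondering why references such as nginx:1.14.x or alpine/curl hit Docker Hub at all: a reference with no registry host in front resolves to Docker Hub by default. The sketch below illustrates that resolution rule; the function names `registry_of` and `uses_docker_hub` are hypothetical and not part of this PR.

```python
def registry_of(image: str) -> str:
    """Return the registry host an image reference resolves to (illustrative helper)."""
    first, sep, _ = image.partition("/")
    # The first path component is treated as a registry host only if it looks
    # like one: it contains a dot or a port separator, or is "localhost".
    if sep and ("." in first or ":" in first or first == "localhost"):
        # Docker Hub is also reachable under the legacy index.docker.io name.
        return "docker.io" if first == "index.docker.io" else first
    # No host given: the reference defaults to Docker Hub.
    return "docker.io"


def uses_docker_hub(image: str) -> bool:
    """True if pulling this reference would go to Docker Hub."""
    return registry_of(image) == "docker.io"
```

With this rule, `nginx:1.14.2`, `alpine/curl`, and `busybox` all count as Docker Hub images, while `quay.io/rhacs-eng/qa:alpine-3.16.0` does not.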
Contributor
Hey - I've left some high level feedback:
- The runtime policy name and expected process have been switched from curl to wget in the Go test, but `sensor/tests/data/runtime-policies.json` is listed in the diff without visible changes. Please confirm any policy references (names, process filters) are updated there to stay consistent with the new behavior.
- You now hardcode `quay.io/rhacs-eng/qa-multi-arch:nginx-1.21.1` and `quay.io/rhacs-eng/qa:alpine-3.16.0` in many YAMLs; consider centralizing these image definitions (e.g., via a shared helper, Kustomize patch, or common values file) to make future image changes less error-prone.
- The policy and test identifiers were renamed from `test-pi-curl` to `test-pi-wget`; if the intent is to test a generic outbound HTTP client, consider a tool-agnostic name (e.g., `test-pi-http-client`) to avoid future renames when switching binaries again.
Contributor
Images are ready for the commit at 0cbe19f. To use with deploy scripts, first
Kind clusters don't inherit Docker credentials by default, so images need to be explicitly loaded into the cluster after pulling them. This adds a step to pull the test images from quay.io (using authenticated Docker session) and load them into the Kind cluster using 'kind load docker-image'. This ensures pods can access the images without needing to pull from the registry directly.
Add image prefetcher infrastructure for sensor integration tests to support both GitHub Actions (Kind) and OpenShift CI (real clusters).

Changes:
- Create sensor/tests/images-to-prefetch.txt as single source of truth for test image list (follows pattern from E2E tests)
- Update scripts/ci/lib.sh with sensor-integration cases in:
  - populate_prefetcher_image_list()
  - _image_prefetcher_prebuilt_start()
  - _image_prefetcher_prebuilt_await()
- Update tests/e2e/sensor.sh to call image prefetch functions for OSCI
- Update GitHub Actions workflow to read images from prefetch list instead of hardcoding them

For GitHub Actions with Kind:
- Reads sensor/tests/images-to-prefetch.txt
- Pulls images from quay.io (authenticated)
- Loads images into Kind cluster via 'kind load docker-image'

For OpenShift CI:
- Uses DaemonSet-based prefetcher to pull images to all nodes
- Sets IMAGE_PULL_POLICY_FOR_QUAY_IO=Never to use prefetched images

This ensures tests work in both environments without Docker Hub rate limits.
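As a rough illustration of the "read images from prefetch list" step, the sketch below parses the contents of a list such as sensor/tests/images-to-prefetch.txt. The file format is an assumption here (one image reference per line, blank lines and `#` comments ignored), and `parse_prefetch_list` is a hypothetical name, not a function from this PR.

```python
def parse_prefetch_list(text: str) -> list[str]:
    """Return the image references in a prefetch list, in order, without duplicates.

    Assumed format: one image reference per line; anything after '#' is a comment.
    """
    images: list[str] = []
    for line in text.splitlines():
        # Drop an optional trailing comment, then surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if entry and entry not in images:
            images.append(entry)
    return images
```

A workflow step could feed the resulting list to whatever pull or load command the environment needs, so the image names live in one place.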
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #18867 +/- ##
=======================================
Coverage 49.47% 49.47%
=======================================
Files 2661 2661
Lines 200734 200734
=======================================
+ Hits 99307 99313 +6
+ Misses 94011 94007 -4
+ Partials 7416 7414 -2
The helm/kind-action creates a cluster named 'chart-testing' by default, but 'kind load docker-image' was looking for the default cluster name 'kind'. Add --name chart-testing to the kind load command to fix the error: 'ERROR: no nodes found for cluster "kind"'
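The cluster-name fix can be sketched as a small command builder. This is an illustration only: `kind_load_commands` is a hypothetical helper, and the actual workflow step is a shell snippet that is not reproduced on this page. The `--name` flag of `kind load docker-image` selects the target cluster and falls back to `kind` when omitted, which is what produced the error quoted above.

```python
def kind_load_commands(images, cluster="chart-testing"):
    """Build the docker/kind invocations that pull each image and load it into a Kind cluster."""
    commands = []
    for image in images:
        # Pull on the runner first, using the runner's authenticated Docker session.
        commands.append(["docker", "pull", image])
        # Name the cluster explicitly; without --name, kind targets the cluster called "kind".
        commands.append(["kind", "load", "docker-image", image, "--name", cluster])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` on a machine that has both Docker and kind installed.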
The pod hierarchy tests were checking for the old Docker Hub image names (nginx:1.14.2 and nginx:1.14.1) but the YAMLs now use quay.io images.

Update test assertions in pod_test.go to expect:
- quay.io/rhacs-eng/qa-multi-arch:nginx-1.21.1

This fixes the test failures:
- Test_ContainerSpecOnDeployment
- Test_ParentlessPodsAreTreatedAsDeployments
Contributor
Hey - I've left some high level feedback:
- The new sensor/tests/images-to-prefetch.txt is now the implicit source of truth for quay.io images used in sensor integration tests, but nothing enforces that all images referenced in the sensor test YAMLs are listed there; consider adding a small validation script/check in CI that scans the sensor test manifests for quay.io images and ensures they are present in this prefetch list to avoid future drift and unexpected ImagePullBackOffs.
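One possible shape for the suggested drift check, as a minimal sketch: extract `image:` values from manifest text with a regular expression and report quay.io references that are absent from the prefetch list. The helper names are hypothetical, and the regex is a simplification that assumes each image is written on a single `image:` line; a real check might prefer a YAML parser.

```python
import re

# Matches lines such as "image: foo" or "- image: 'foo'" and captures the reference.
IMAGE_LINE = re.compile(r"""^\s*(?:-\s*)?image:\s*["']?([^\s"'#]+)""", re.MULTILINE)


def images_in_manifest(manifest: str) -> set[str]:
    """Return every image reference found on an 'image:' line of the manifest text."""
    return set(IMAGE_LINE.findall(manifest))


def missing_from_prefetch(manifests: list[str], prefetch: set[str]) -> set[str]:
    """Return quay.io images used by the manifests but not present in the prefetch list."""
    referenced: set[str] = set()
    for manifest in manifests:
        referenced |= images_in_manifest(manifest)
    return {img for img in referenced if img.startswith("quay.io/") and img not in prefetch}
```

A CI step could fail the build whenever `missing_from_prefetch` returns a non-empty set.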
Contributor
Author
/retest
parametalol approved these changes Feb 9, 2026
stehessel added a commit that referenced this pull request Feb 13, 2026
janisz added a commit that referenced this pull request Feb 13, 2026

User request: Fix multi-arch image loading issue in PR #18867 where Kind fails to load images with error "content digest not found" for arm64 layers.

Problem:
When docker pull fetches multi-arch images on amd64 runners, it downloads the manifest index but only pulls amd64 layers. Kind's import with --all-platforms then fails because arm64 layers weren't downloaded.

Solution:
Add --platform linux/amd64 flag to docker pull to fetch only the platform-specific variant without the multi-arch manifest index. This ensures consistency between pulled layers and what Kind attempts to import.

Code changes developed with AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
janisz added a commit that referenced this pull request Feb 13, 2026

User request: Fix multi-arch image loading issue in PR #18867 where Kind fails to load images with error "content digest not found" for arm64 layers.

Problem:
When docker pull fetches multi-arch images, even with --platform flag, docker save with a tag includes the manifest index. Kind then tries to import with --all-platforms but fails because only amd64 layers were pulled, not arm64 layers referenced in the manifest.

Solution:
Use docker save with the image ID instead of the tag. Image IDs reference the platform-specific image directly, excluding the manifest index. Then load via kind load image-archive and retag inside the Kind node.

This approach:
1. Pulls platform-specific image (linux/amd64)
2. Saves using image ID (no manifest index)
3. Loads archive into Kind
4. Retags with original name inside Kind node

Code changes developed with AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
janisz added a commit that referenced this pull request Feb 13, 2026

User request: Fix multi-arch image loading issue in PR #18867 where Kind fails to load images with error "content digest not found" for arm64 layers.

Problem:
kind load docker-image uses docker save and ctr images import --all-platforms internally. When docker pull fetches multi-arch images, even with --platform flag, docker save includes manifest index references. The ctr import then fails because arm64 layers referenced in the manifest aren't present.

Solution:
Bypass "kind load docker-image" entirely by pulling images directly inside the Kind node using ctr. This avoids the docker save/import dance and the --all-platforms issue.

Approach:
1. Use docker exec to run ctr inside the Kind node
2. Pull with --platform linux/amd64 to get only the needed variant
3. Pass quay.io credentials via --user flag to ctr

This is a recommended workaround from kubernetes-sigs/kind issues #3795, #3845, and #4066, which document this as a known issue with Docker 29 and containerd.

References:
- kubernetes-sigs/kind#4066
- kubernetes-sigs/kind#3845
- kubernetes-sigs/kind#3795

Code changes developed with AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
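The three-step approach in this commit message can be sketched as a command builder. Several details are assumptions rather than facts from the commit: the node name follows kind's `<cluster>-control-plane` convention for a single-node cluster, the `k8s.io` containerd namespace is the one the kubelet reads from, and `ctr_pull_command` is a hypothetical helper name.

```python
def ctr_pull_command(image, user, password, cluster="chart-testing", platform="linux/amd64"):
    """Build a 'docker exec ... ctr images pull' invocation that pulls inside the Kind node."""
    # Assumption: single-node cluster, so the only node is "<cluster>-control-plane".
    node = f"{cluster}-control-plane"
    return [
        "docker", "exec", node,
        # Assumption: images must land in the k8s.io namespace to be visible to pods.
        "ctr", "--namespace", "k8s.io",
        "images", "pull",
        "--platform", platform,          # fetch only the variant the node can run
        "--user", f"{user}:{password}",  # registry credentials in user:password form
        image,
    ]
```

Because the credentials end up in the process arguments, a real workflow would want to keep them masked in logs.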
janisz added a commit that referenced this pull request Mar 5, 2026

Adds image prefetching support for compatibility tests to avoid Docker Hub rate limits, following the same pattern as PR #18867 for sensor integration tests.

Changes:
- scripts/ci/lib.sh: Add *compatibility-tests case to prefetcher start, await, and image list population functions
- tests/e2e/run-compatibility.sh: Call prefetcher before deployment setup

The existing tests/images-to-prefetch.txt already contains the nginx image used by compatibility tests (quay.io/rhacs-eng/qa-multi-arch:nginx-1-17-1 in tests/tls_challenge_test.go).

User request: look at this PR #18867 we need to add prefetcher to gke-nongroovy-compatibility-tests

Code changes developed with AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
janisz added a commit that referenced this pull request Mar 5, 2026

Adds image prefetching support for compatibility tests to avoid Docker Hub rate limits, following the same pattern as PR #18867 for sensor integration tests.

Changes:
- scripts/ci/lib.sh: Add *compatibility-tests case to prefetcher start, await, and image list population functions
- tests/e2e/run-compatibility.sh: Call prefetcher before deployment setup
- tests/yamls/multi-container-pod.yaml: Change imagePullPolicy from IfNotPresent to Never to use prefetched images

The existing tests/images-to-prefetch.txt already contains the nginx and ubuntu images used by compatibility tests. The YAML file change is necessary because pods created directly from YAML don't go through the createDeploymentViaAPI code that respects IMAGE_PULL_POLICY_FOR_QUAY_IO.

User request: look at this PR #18867 we need to add prefetcher to gke-nongroovy-compatibility-tests

Code changes developed with AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>