experiment: busybox + parallel pre-build + docker layer caching #19831
Draft
davdhacs wants to merge 26 commits into ROX-33958/resue-components from
Enable GHA buildx cache for main, roxctl, and operator image builds. Docker layers (base image pulls, package installs) are cached across CI runs, avoiding a redundant microdnf upgrade/install on every build. Cache is opt-in via the DOCKER_BUILDX_CACHE env var to avoid affecting local builds, and is scoped per image and architecture to prevent collisions.
Expected savings: ~60-80s off the 115s "Build main images" step on warm runs (package install layers cached).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
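A minimal sketch of the opt-in, per-image and per-architecture scoping described in this commit. The function name, the `true` opt-in value, the scope naming, and `mode=max` are assumptions for illustration; only the `DOCKER_BUILDX_CACHE` variable and the `type=gha` backend come from the commit message.

```shell
# Print buildx cache flags for one image/arch, or nothing when caching is off.
# Assumption: DOCKER_BUILDX_CACHE=true is the opt-in value; scope naming is illustrative.
buildx_cache_args() {
    image="$1"
    arch="$2"
    if [ "${DOCKER_BUILDX_CACHE:-}" != "true" ]; then
        return 0   # local builds: emit no cache flags
    fi
    scope="${image}-${arch}"   # one scope per image and architecture
    printf '%s %s\n' \
        "--cache-from=type=gha,scope=${scope}" \
        "--cache-to=type=gha,mode=max,scope=${scope}"
}
```

The printed flags could be appended to a `docker buildx build` invocation; with the variable unset, the function prints nothing and the build runs uncached.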
Move OS package installation (microdnf upgrade, postgres RPMs, util-linux) into a separate 'base-with-packages' stage. The final stage uses FROM base-with-packages and only COPYs binaries.
Before: COPY binaries → RUN microdnf (rebuilds packages every commit)
After: base-with-packages stage (cached) → COPY binaries (fast)
With Docker buildx GHA cache, the package install layer (~60s) is cached across CI runs. Only the binary COPY steps rebuild per commit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The RUN step calls save-dir-contents, which lives in static-bin/. Copy only the helper scripts needed for package installation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace make docker-build-main-image with docker/build-push-action@v6, which handles GHA buildx cache natively. The action manages cache tokens and builder configuration that our docker-build.sh wrapper was missing.
The base-with-packages Dockerfile stage (package installs) should now cache across CI runs, skipping microdnf upgrade/install when only binaries change.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each COPY --link creates an independent overlay layer. Changing one binary (e.g. bin/central) doesn't invalidate the other COPY layers (ui, static-data, etc). Combined with GHA buildx cache, this means only the changed binaries need to be re-copied on warm builds.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use push: true instead of load: true on docker/build-push-action. This pushes layers directly to the registry from buildkit, avoiding the slow --load export to local docker (~90s overhead even with all layers cached).
The main image is now built and pushed in one step. roxctl and central-db still use the separate push step.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The container env vars removed by Tomecz's PR are no longer available. Use secrets.* directly in docker/login-action.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The push-main-manifests step expects per-arch tags (e.g. main:tag-amd64) to create multi-arch manifest lists.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docker/login-action with registry: quay.io/org doesn't work for quay.io. Use the existing registry_rw_login helper, which handles quay.io authentication correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r stackrox-io
docker/login-action can only authenticate to one quay.io org at a time. Push main to rhacs-eng via build-push-action (fast, cached layers). Use the existing push_main_image_set for stackrox-io and the other images (it handles multi-org login correctly by re-authenticating per org).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
build-push-action pushes directly to the registry without loading locally, but the existing push_main_image_set expects local images. Instead:
- Push roxctl/central-db from local docker (built by make)
- Copy main from rhacs-eng to stackrox-io via skopeo
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
push:true creates manifest lists instead of plain images, breaking the multi-arch manifest creation step. Revert to load:true, which loads into local docker and uses the existing push_main_image_set pipeline unchanged.
GHA layer cache still works with load:true (17 cached layers confirmed). Build step: 105s warm vs 110s baseline (5% faster). The --load export overhead limits the savings, but the Dockerfile restructuring and COPY --link provide the foundation for future improvements.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use push:true + provenance:false to push the main image directly from buildx to the registry. provenance:false produces a plain image manifest (not a manifest list), compatible with the downstream push-main-manifests job that creates multi-arch manifest lists.
Login to both quay.io orgs before the build step so buildx can push to both registries. roxctl and central-db still use docker push (built locally by make).
Expected: Build+push main image ~55s (vs 105s with load:true).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…g copy
buildx's docker-container driver doesn't share host docker credentials. Use docker/login-action (which injects creds into the buildx builder) for the rhacs-eng push. Copy main to stackrox-io via skopeo (lightweight, blobs shared on quay.io).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docker login can only hold one quay.io credential at a time. Use skopeo --src-creds and --dest-creds to authenticate to both orgs simultaneously for the rhacs-eng → stackrox-io copy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
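A hedged sketch of the two-credential copy this commit describes. `--src-creds` and `--dest-creds` are real `skopeo copy` options; the function name, the `SRC_*`/`DEST_*` variable names, the image paths, and the `SKOPEO` override (used only so the command can be dry-run) are assumptions.

```shell
# Copy the main image from rhacs-eng to stackrox-io with separate credentials.
# Assumptions: SRC_USER/SRC_PASS/DEST_USER/DEST_PASS are hypothetical variable
# names; SKOPEO can be set to "echo" for a dry run.
copy_main_between_orgs() {
    tag="$1"
    "${SKOPEO:-skopeo}" copy \
        --src-creds "${SRC_USER}:${SRC_PASS}" \
        --dest-creds "${DEST_USER}:${DEST_PASS}" \
        "docker://quay.io/rhacs-eng/main:${tag}" \
        "docker://quay.io/stackrox-io/main:${tag}"
}
```

Because each side of the copy carries its own credentials, no shared `docker login` state is needed for either org.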
Use type=registry cache backed by GHCR instead of type=gha.
Benefits:
- No 10GB size limit (GHA cache is shared across all workflows)
- Buildx pulls only needed layers (content-addressable), not full blob
- Faster restore for multi-stage builds with many cached layers
Cache images stored at ghcr.io/stackrox/stackrox/cache/main-{arch}.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Switch from docker-container to docker driver for buildx. The docker driver uses Docker Engine's built-in buildkit, so there is no separate container to boot.
No cache for this run (baseline measurement). Will add inline cache once the driver change is validated.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Docker driver (68s cold, no cache) is already faster than
docker-container + GHA cache (87s). Add inline cache on top:
- cache-to: type=inline embeds cache metadata in the pushed image
- cache-from pulls the previous build (cache-{arch} tag) as source
- With COPY --link, only changed layers need rebuilding
Push a stable cache-{arch} tag on every build so the next build
has a cache reference.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
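The inline-cache setup above, together with the earlier push:true + provenance:false change, might look roughly like this as a plain `docker buildx build` call. This is a sketch, not the workflow's actual step: the function name, the build context `.`, the tag layout, and the `DOCKER` override (for dry runs) are assumptions; the flags themselves are standard buildx options.

```shell
# Build and push one arch-specific image, reusing the previous build as cache.
# Assumptions: tag layout and build context are illustrative; set DOCKER=echo
# to print the command instead of running it.
build_main_cmd() {
    image="$1"
    tag="$2"
    arch="$3"
    "${DOCKER:-docker}" buildx build \
        --push \
        --provenance=false \
        --cache-to type=inline \
        --cache-from "type=registry,ref=${image}:cache-${arch}" \
        --tag "${image}:${tag}-${arch}" \
        --tag "${image}:cache-${arch}" \
        .
}
```

Pushing the stable `cache-{arch}` tag alongside the versioned tag is what gives the next build a reference to pull cache metadata from.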
Same optimizations as the combined-job experiment but keeping the existing parallel pre-build job structure:
- Busybox binary (1 binary instead of 8)
- Stable ldflags (BUILD_TAG=0.0.0)
- Docker driver + COPY --link + inline cache
- push:true + provenance:false
- CLI builds only host-arch roxctl
Compare wall-clock with combined-job approach.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
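Stable ldflags trade real version information for cache hits, which the review feedback on this PR suggests guarding behind a flag. One possible shape for that guard is sketched below; the function name and the `USE_STABLE_BUILD_INFO` variable are hypothetical, and how the dynamic tag is computed is left to the caller.

```shell
# Return the fixed tag only when the (hypothetical) opt-in flag is set;
# otherwise pass through the dynamically computed tag supplied by the caller.
resolve_build_tag() {
    dynamic_tag="$1"
    if [ "${USE_STABLE_BUILD_INFO:-false}" = "true" ]; then
        echo "0.0.0"          # stable value keeps ldflags identical across commits
    else
        echo "$dynamic_tag"   # normal builds keep their real version
    fi
}
```

With the flag unset, normal builds are unaffected; only the experiment's workflow would need to set it.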
Skipping CI for Draft Pull Request.
Hey - I've left some high level feedback:
- Hard-coding `BUILD_TAG=0.0.0` and `SHORTCOMMIT="0000000"` in the pre-build jobs changes behavior beyond pure performance tuning; consider guarding this behind a conditional (e.g. env flag or separate workflow) so normal builds still use the dynamically computed values.
- In the `Push remaining images` step, `registry_rw_login` is called inside the loop for each image and registry; you can move the logins outside the loop to avoid repeated logins and slightly speed up the push phase.
- The new explicit tag/push logic for `main`, `roxctl`, and `central-db` in the workflow diverges from the existing `push_main_image_set` helper; consider centralizing this again (or at least reusing shared helpers) to keep tag/push behavior consistent across workflows.
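The review point about repeated `registry_rw_login` calls could be addressed along these lines. This is a sketch with a stub in place of the real helper, whose behavior is not shown in this PR page; the function name `push_remaining_images`, the registry paths, and the echoed push commands are assumptions.

```shell
# Stub standing in for the repository's registry_rw_login helper (assumption:
# it takes the registry/org as its only argument). It counts calls for the demo.
LOGIN_COUNT=0
registry_rw_login() {
    LOGIN_COUNT=$((LOGIN_COUNT + 1))
    echo "login $1"
}

# Log in once per registry, then push every image for that registry.
push_remaining_images() {
    tag="$1"
    for registry in quay.io/rhacs-eng quay.io/stackrox-io; do
        registry_rw_login "$registry"
        for image in roxctl central-db; do
            echo "docker push ${registry}/${image}:${tag}"   # dry run only
        done
    done
}
```

Keeping the registry as the outer loop matters here: since docker login holds only one quay.io credential at a time, each org's pushes need to finish before logging in to the next one.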
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## ROX-33958/resue-components #19831 +/- ##
===========================================================
Coverage 49.60% 49.60%
===========================================================
Files 2763 2763
Lines 208254 208254
===========================================================
+ Hits 103309 103313 +4
+ Misses 97278 97274 -4
Partials 7667 7667
Description
Experiment — busybox + docker caching with the existing parallel pre-build job structure. Compare with #19830 (combined single-job approach).
Same optimizations (stable ldflags, COPY --link, docker driver, inline cache, push:true, provenance:false), but this keeps the separate pre-build-go-binaries and pre-build-cli jobs running in parallel.
Expected: wall-clock time longer than the combined approach (it waits ~2m for pre-build-cli), but the build+push step should be ~30s warm.
🤖 Generated with Claude Code