experiment: busybox + parallel pre-build + docker layer caching#19831
Draft
Conversation
Enable GHA buildx cache for main, roxctl, and operator image builds. Docker layers (base image pulls, package installs) are cached across CI runs, avoiding redundant microdnf upgrade/install on every build. Cache is opt-in via DOCKER_BUILDX_CACHE env var to avoid affecting local builds. Scoped per image and architecture to prevent collisions. Expected savings: ~60-80s off the 115s "Build main images" step on warm runs (package install layers cached). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move OS package installation (microdnf upgrade, postgres RPMs, util-linux) into a separate 'base-with-packages' stage. The final stage uses FROM base-with-packages and only COPYs binaries.

Before: COPY binaries → RUN microdnf (rebuilds packages every commit)
After: base-with-packages stage (cached) → COPY binaries (fast)

With Docker buildx GHA cache, the package install layer (~60s) is cached across CI runs. Only the binary COPY steps rebuild per commit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
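The stage split described above can be sketched as follows (base image, package list, and paths are illustrative, not the actual Dockerfile):

```dockerfile
# Stage 1: OS packages only. This changes rarely, so its layers stay cached.
FROM registry.access.redhat.com/ubi9-minimal AS base-with-packages
RUN microdnf upgrade -y && \
    microdnf install -y util-linux && \
    microdnf clean all

# Stage 2: binaries only. This is the part that changes on every commit,
# and it no longer invalidates the package-install layers above.
FROM base-with-packages
COPY bin/central /stackrox/central
```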
The RUN step calls save-dir-contents which is in static-bin/. Copy just the helper scripts needed for package installation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace make docker-build-main-image with docker/build-push-action@v6 which handles GHA buildx cache natively. The action manages cache tokens and builder configuration that our docker-build.sh wrapper was missing. The base-with-packages Dockerfile stage (package installs) should now cache across CI runs, skipping microdnf upgrade/install when only binaries change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each COPY --link creates an independent overlay layer. Changing one binary (e.g. bin/central) doesn't invalidate other COPY layers (ui, static-data, etc). Combined with GHA buildx cache, this means only the changed binaries need to be re-copied on warm builds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
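A minimal sketch of the independent-layer behavior (paths illustrative):

```dockerfile
# Each --link layer is built independently: rebuilding bin/central
# does not invalidate the ui/ or static-data/ layers.
COPY --link bin/central /stackrox/central
COPY --link ui/ /ui/
COPY --link static-data/ /stackrox/static-data/
```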
Use push: true instead of load: true on docker/build-push-action. This pushes layers directly to the registry from buildkit, avoiding the slow --load export to local docker (~90s overhead even with all layers cached). The main image is now built and pushed in one step. roxctl and central-db still use the separate push step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Container env vars removed by Tomecz's PR aren't available as env vars anymore. Use secrets.* directly in docker/login-action. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The push-main-manifests step expects per-arch tags (e.g. main:tag-amd64) to create multi-arch manifest lists. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docker/login-action with registry: quay.io/org doesn't work for quay.io. Use the existing registry_rw_login helper which handles quay.io authentication correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r stackrox-io docker/login-action can only authenticate to one quay.io org at a time. Push main to rhacs-eng via build-push-action (fast, cached layers). Use existing push_main_image_set for stackrox-io and other images (handles multi-org login correctly by re-authenticating per org). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
build-push-action pushes directly to the registry without loading locally. The existing push_main_image_set expects local images. Instead:
- Push roxctl/central-db from local docker (built by make)
- Copy main from rhacs-eng to stackrox-io via skopeo

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
push:true creates manifest lists instead of plain images, breaking the multi-arch manifest creation step. Revert to load:true which loads into local docker and uses the existing push_main_image_set pipeline unchanged. GHA layer cache still works with load:true (17 cached layers confirmed). Build step: 105s warm vs 110s baseline (5% faster). The --load export overhead limits the savings but the Dockerfile restructuring and COPY --link provide the foundation for future improvements. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidate 8 separate binaries into a single binary using a BusyBox-style dispatch pattern to reduce image size by ~54-64% (from ~1.1GB to ~400-500MB).

**Changes:**
- Refactor each component to use the app package pattern: migrator/app, compliance/cmd/compliance/app, sensor/admission-control/app, sensor/kubernetes/app, sensor/upgrader/app, config-controller/app, compliance/virtualmachines/roxagent/app
- Add build tags (//go:build !centralall) to component main.go files
- Update central/main.go with the BusyBox dispatcher and app package imports
- Modify the Makefile to build only the central binary with the centralall tag
- Update the Dockerfile to create symlinks instead of copying separate binaries

**Implementation:** Each component now has:
1. app/app.go: contains a Run() function with the main logic
2. main.go: a thin wrapper that calls app.Run() (excluded with the centralall tag)

The central/main.go dispatcher checks os.Args[0] and routes to the appropriate app.Run().

**Testing:** All refactored components validated with gopls; no diagnostics. Individual components still build independently without the centralall tag.

**Benefits:**
- 54-64% image size reduction
- Better code organization (app logic separate from entry point)
- Improved testability (app.Run() can be tested directly)
- No code duplication
- Minimal changes to existing code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Remove unnecessary build tags from BusyBox consolidation

The //go:build !centralall tags were not needed because Go's package system naturally handles the separation:
- Building ./central only compiles the central package plus its dependencies (the app packages)
- Component main.go files are in separate packages and won't be included
- Simpler implementation without conditional compilation

This makes the code cleaner and easier to understand.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Update Konflux Dockerfile for BusyBox consolidation

Apply the same BusyBox-style consolidation to the Konflux build:
- Copy only the central binary instead of 8 separate binaries
- Create symlinks for all component entry points
- Matches changes made to image/rhel/Dockerfile

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The wrapper functions (migratorMain, complianceMain, etc.) added no value - they just called app.Run(). Simplified by calling app.Run() directly from the switch statement. Removes 27 lines of redundant code. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
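A minimal sketch of that argv[0] dispatch, assuming the component names listed in the commits above (the real dispatcher calls each component's app.Run() instead of returning a string):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// component resolves the invoked binary name (argv[0]) to a component,
// BusyBox-style. Unknown names fall back to central mode; the real
// dispatcher also logs a warning in that default case.
func component(argv0 string) string {
	switch name := filepath.Base(argv0); name {
	case "migrator", "compliance", "admission-control",
		"kubernetes-sensor", "sensor-upgrader", "config-controller", "roxagent":
		return name
	default:
		return "central"
	}
}

func main() {
	// The symlink name decides which embedded component runs.
	fmt.Println(component(os.Args[0]))
}
```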
Keep the old build targets so we use the same toolchain for building both central (consolidated) and individual components. The consolidated binary works via BusyBox-style dispatch, but individual binaries are still useful for development and testing.

Both build patterns now work:
- Build central: imports all app packages, dispatches based on argv[0]
- Build individual components: each component's main.go calls app.Run()

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Main() doesn't need to be exported since it's only called from the dispatcher in the same package. Renamed to centralRun() to follow Go convention of unexported functions. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Components use a two-tier symlink system:
1. Wrapper scripts in /stackrox/ (from static-bin/) redirect to
2. Binary symlinks in /stackrox/bin/ that point to central
Removed:
- /stackrox/{admission-control,kubernetes-sensor,config-controller}
(wrappers already exist in static-bin/, only need /stackrox/bin/ targets)
- /stackrox/bin/roxagent (no wrapper exists, unused in K8s deployments)
This preserves the wrapper architecture while eliminating duplicate paths.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
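The two-tier layout can be sketched in shell (paths relative to a scratch directory; the stand-in `central` script just echoes how it was invoked, standing in for the argv[0] dispatch):

```shell
mkdir -p stackrox/bin
# Stand-in for the consolidated binary: print the name it was invoked as.
printf '#!/bin/sh\necho "central invoked as $0"\n' > stackrox/bin/central
chmod +x stackrox/bin/central
# Tier 2: per-component symlinks in /stackrox/bin pointing at central.
for name in migrator kubernetes-sensor admission-control; do
  ln -sf central "stackrox/bin/$name"
done
stackrox/bin/migrator   # prints: central invoked as stackrox/bin/migrator
```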
Since central now imports all component app packages (migrator, compliance, admission-control, kubernetes-sensor, sensor-upgrader, config-controller, roxagent), building central pulls in all their code.

Removed from main-build-nodeps:
- compliance/cmd/compliance
- config-controller
- migrator
- sensor/admission-control
- sensor/kubernetes
- sensor/upgrader
- compliance/virtualmachines/roxagent

These are now accessed via symlinks to the consolidated central binary. The operator remains separate as it's not part of the main image.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Restore /stackrox/bin/roxagent symlink that was present in original image. While roxagent has no wrapper script (used in VM deployments, not K8s), the symlink should exist for compatibility with existing deployment tooling. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Move memlimit.SetMemoryLimit() from init() to Run() in component apps:
- compliance/cmd/compliance/app/app.go
- sensor/admission-control/app/app.go
- sensor/kubernetes/app/app.go

When central imports these app packages, their init() functions were running even when central mode was active, calling SetMemoryLimit() multiple times. Moving this to Run() ensures it only executes when the specific component is active.

Add a warning log for unknown binary names in the central dispatcher: the default case in central/main.go now logs when an unexpected argv[0] falls back to central mode, helping surface misconfigurations.

Keep config-controller's init() unchanged:
- Kubernetes scheme registration is safe to run at import time
- The registered schemes are only used when controller-runtime starts
- Moving to Run() would require complex idempotency guards

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
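The init-to-Run move can be sketched as below; `setMemoryLimit` and the call counter are hypothetical stand-ins for memlimit.SetMemoryLimit(), used only to make the side effect observable:

```go
package main

import "fmt"

// memoryLimitCalls counts invocations of the stand-in side effect.
var memoryLimitCalls int

func setMemoryLimit() { memoryLimitCalls++ }

// Before: each component's app package called setMemoryLimit() from init(),
// so central, which imports every app package, triggered it once per import.
// After: the call lives in Run(), so it fires only for the dispatched component.
func Run() {
	setMemoryLimit()
	// ... component main logic ...
}

func main() {
	Run()
	fmt.Println("calls:", memoryLimitCalls)
}
```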
Use push:true + provenance:false to push main image directly from buildx to the registry. provenance:false produces a plain image manifest (not a manifest list), compatible with the downstream push-main-manifests job that creates multi-arch manifest lists. Login to both quay.io orgs before the build step so buildx can push to both registries. roxctl and central-db still use docker push (built locally by make). Expected: Build+push main image ~55s (vs 105s with load:true). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
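A sketch of the docker/build-push-action step this describes (step name, context path, and tag format are illustrative):

```yaml
- name: Build and push main image
  uses: docker/build-push-action@v6
  with:
    context: image
    push: true          # push straight from buildkit, skipping the --load export
    provenance: false   # plain image manifest, not a manifest list
    tags: quay.io/rhacs-eng/main:${{ env.TAG }}-${{ matrix.arch }}
```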
Remove copying of config-controller binary since we only build central now (which contains all consolidated component code). The config-controller is accessed via symlink to central, not as a separate binary. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Allow central to import app packages from other components:
- compliance/cmd/compliance/app
- compliance/virtualmachines/roxagent/app
- config-controller/app
- migrator/app
- sensor/admission-control/app
- sensor/kubernetes/app
- sensor/upgrader/app

These imports are necessary for the BusyBox-style dispatcher in central/main.go, which routes to the appropriate component based on argv[0].

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…g copy buildx's docker-container driver doesn't share host docker credentials. Use docker/login-action (which injects creds into the buildx builder) for rhacs-eng push. Copy main to stackrox-io via skopeo (lightweight, blobs shared on quay.io). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Create a roxctl/app package and integrate it into the central binary:
- Created roxctl/app/app.go with a Run() function
- Updated roxctl/main.go to call app.Run()
- Moved getCommandPath() to roxctl/main_test.go for test usage
- Added a roxctl case to the central/main.go dispatcher
- Updated the import validator to allow central to import roxctl/app
- Changed the /stackrox/roxctl symlink to point to central instead of assets

Benefits:
- /stackrox/roxctl now uses the consolidated binary
- Provides the option to remove roxctl binaries from /assets later for space savings
- roxctl binaries in /assets remain available for user downloads (for now)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
docker login can only hold one quay.io credential at a time. Use skopeo --src-creds and --dest-creds to authenticate to both orgs simultaneously for the rhacs-eng → stackrox-io copy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
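A sketch of that copy (variable names are illustrative; `--src-creds` and `--dest-creds` are skopeo's per-side credential flags, so neither login displaces the other):

```sh
# Copy main from rhacs-eng to stackrox-io with separate creds per side.
# Blobs already present on quay.io are shared, so the copy is lightweight.
skopeo copy \
  --src-creds "$RHACS_ENG_USER:$RHACS_ENG_PASSWORD" \
  --dest-creds "$STACKROX_IO_USER:$STACKROX_IO_PASSWORD" \
  "docker://quay.io/rhacs-eng/main:$TAG" \
  "docker://quay.io/stackrox-io/main:$TAG"
```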
Use type=registry cache backed by GHCR instead of type=gha.
Benefits:
- No 10GB size limit (the GHA cache's 10GB quota is shared across all workflows in the repo)
- Buildx pulls only the layers it needs (content-addressable), not one full blob
- Faster restore for multi-stage builds with many cached layers
Cache images stored at ghcr.io/stackrox/stackrox/cache/main-{arch}.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
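A sketch of the cache configuration on the build step (the `mode=max` setting, which also caches intermediate stages like base-with-packages, is an assumption):

```yaml
    cache-from: type=registry,ref=ghcr.io/stackrox/stackrox/cache/main-${{ matrix.arch }}
    cache-to: type=registry,ref=ghcr.io/stackrox/stackrox/cache/main-${{ matrix.arch }},mode=max
```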
Switch from docker-container to docker driver for buildx. The docker driver uses Docker Engine's built-in buildkit — no separate container to boot. No cache for this run (baseline measurement). Will add inline cache once driver change is validated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Docker driver (68s cold, no cache) is already faster than
docker-container + GHA cache (87s). Add inline cache on top:
- cache-to: type=inline embeds cache metadata in the pushed image
- cache-from pulls the previous build (cache-{arch} tag) as source
- With COPY --link, only changed layers need rebuilding
Push a stable cache-{arch} tag on every build so the next build
has a cache reference.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
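The inline-cache setup sketched as build-push-action inputs (tag names illustrative; type=inline embeds the cache metadata in the image itself, so pushing the stable cache-{arch} tag is what gives the next run its cache-from reference):

```yaml
    push: true
    cache-to: type=inline
    cache-from: type=registry,ref=quay.io/rhacs-eng/main:cache-${{ matrix.arch }}
    tags: |
      quay.io/rhacs-eng/main:${{ env.TAG }}-${{ matrix.arch }}
      quay.io/rhacs-eng/main:cache-${{ matrix.arch }}
```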
Same optimizations as the combined-job experiment but keeping the existing parallel pre-build job structure:
- Busybox binary (1 binary instead of 8)
- Stable ldflags (BUILD_TAG=0.0.0)
- Docker driver + COPY --link + inline cache
- push:true + provenance:false
- CLI builds only host-arch roxctl

Compare wall-clock with the combined-job approach.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Skipping CI for Draft Pull Request.
Contributor
Hey - I've left some high level feedback:
- Hard-coding `BUILD_TAG=0.0.0` and `SHORTCOMMIT="0000000"` in the pre-build jobs changes behavior beyond pure performance tuning; consider guarding this behind a conditional (e.g. an env flag or a separate workflow) so normal builds still use the dynamically computed values.
- In the `Push remaining images` step, `registry_rw_login` is called inside the loop for each image and registry; you can move the logins outside the loop to avoid repeated logins and slightly speed up the push phase.
- The new explicit tag/push logic for `main`, `roxctl`, and `central-db` in the workflow diverges from the existing `push_main_image_set` helper; consider centralizing this again (or at least reusing shared helpers) to keep tag/push behavior consistent across workflows.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@        Coverage Diff — ROX-33958/resue-components #19831        @@
===========================================================
  Coverage     49.60%    49.60%
===========================================================
  Files          2763      2763
  Lines        208254    208254
===========================================================
+ Hits         103309    103313    +4
+ Misses        97278     97274    -4
  Partials       7667      7667
===========================================================
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Force-pushed: 0905d1f to fa4e0dc
Description
Experiment — busybox + docker caching with the existing parallel pre-build job structure. Compare with #19830 (combined single-job approach).
Same optimizations: stable ldflags, COPY --link, docker driver, inline cache, push:true, provenance:false. But keeps separate pre-build-go-binaries and pre-build-cli jobs running in parallel.
Expected: wall-clock longer than combined (waits for pre-build-cli ~2m), but build+push step should be ~30s warm.
🤖 Generated with Claude Code