
experiment: busybox + parallel pre-build + docker layer caching #19831

Draft
davdhacs wants to merge 37 commits into master from davdhacs/busybox-parallel-build

Conversation

@davdhacs
Contributor

@davdhacs davdhacs commented Apr 4, 2026

Description

Experiment — busybox + docker caching with the existing parallel pre-build job structure. Compare with #19830 (combined single-job approach).

Same optimizations: stable ldflags, COPY --link, docker driver, inline cache, push:true, provenance:false. But keeps separate pre-build-go-binaries and pre-build-cli jobs running in parallel.

Expected: wall-clock longer than combined (waits for pre-build-cli ~2m), but build+push step should be ~30s warm.

🤖 Generated with Claude Code

davdhacs and others added 30 commits April 2, 2026 16:47
Enable GHA buildx cache for main, roxctl, and operator image builds.
Docker layers (base image pulls, package installs) are cached across
CI runs, avoiding redundant microdnf upgrade/install on every build.

Cache is opt-in via DOCKER_BUILDX_CACHE env var to avoid affecting
local builds. Scoped per image and architecture to prevent collisions.

Expected savings: ~60-80s off the 115s "Build main images" step on
warm runs (package install layers cached).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move OS package installation (microdnf upgrade, postgres RPMs,
util-linux) into a separate 'base-with-packages' stage. The final
stage uses FROM base-with-packages and only COPYs binaries.

Before: COPY binaries → RUN microdnf (rebuilds packages every commit)
After:  base-with-packages stage (cached) → COPY binaries (fast)

With Docker buildx GHA cache, the package install layer (~60s) is
cached across CI runs. Only the binary COPY steps rebuild per commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The RUN step calls save-dir-contents which is in static-bin/.
Copy just the helper scripts needed for package installation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace make docker-build-main-image with docker/build-push-action@v6
which handles GHA buildx cache natively. The action manages cache
tokens and builder configuration that our docker-build.sh wrapper
was missing.

The base-with-packages Dockerfile stage (package installs) should
now cache across CI runs, skipping microdnf upgrade/install when
only binaries change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each COPY --link creates an independent overlay layer. Changing one
binary (e.g. bin/central) doesn't invalidate other COPY layers
(ui, static-data, etc). Combined with GHA buildx cache, this means
only the changed binaries need to be re-copied on warm builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use push: true instead of load: true on docker/build-push-action.
This pushes layers directly to the registry from buildkit, avoiding
the slow --load export to local docker (~90s overhead even with all
layers cached).

The main image is now built and pushed in one step. roxctl and
central-db still use the separate push step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The container env vars removed by Tomecz's PR are no longer available
to this job. Use secrets.* directly in docker/login-action.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The push-main-manifests step expects per-arch tags (e.g. main:tag-amd64)
to create multi-arch manifest lists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docker/login-action with registry: quay.io/org doesn't work for
quay.io. Use the existing registry_rw_login helper which handles
quay.io authentication correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r stackrox-io

docker/login-action can only authenticate to one quay.io org at a time.
Push main to rhacs-eng via build-push-action (fast, cached layers).
Use existing push_main_image_set for stackrox-io and other images
(handles multi-org login correctly by re-authenticating per org).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
build-push-action pushes directly to registry without loading locally.
The existing push_main_image_set expects local images. Instead:
- Push roxctl/central-db from local docker (built by make)
- Copy main from rhacs-eng to stackrox-io via skopeo

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
push:true creates manifest lists instead of plain images, breaking
the multi-arch manifest creation step. Revert to load:true which
loads into local docker and uses the existing push_main_image_set
pipeline unchanged.

GHA layer cache still works with load:true (17 cached layers confirmed).
Build step: 105s warm vs 110s baseline (5% faster). The --load export
overhead limits the savings but the Dockerfile restructuring and
COPY --link provide the foundation for future improvements.
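
As a quick check of the figure above, here is a minimal Go sketch of the arithmetic. The helper name savingsPercent is hypothetical; the only inputs are the 110s baseline and 105s warm timings quoted in this commit message. The exact value is about 4.5%, which the message rounds to 5%.

```go
package main

import "fmt"

// savingsPercent reports how much shorter `after` is than `before`,
// as a percentage of `before`. The name is illustrative only.
func savingsPercent(before, after float64) float64 {
	return (before - after) / before * 100
}

func main() {
	// Timings quoted in the commit message: 110s baseline, 105s warm.
	fmt.Printf("%.1f%%\n", savingsPercent(110, 105))
}
```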

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidate 8 separate binaries into a single binary using BusyBox-style
dispatch pattern to reduce image size by ~54-64% (from ~1.1GB to ~400-500MB).

**Changes:**
- Refactor each component to use app package pattern:
  - migrator/app, compliance/cmd/compliance/app
  - sensor/admission-control/app, sensor/kubernetes/app
  - sensor/upgrader/app, config-controller/app
  - compliance/virtualmachines/roxagent/app
- Add build tags (//go:build !centralall) to component main.go files
- Update central/main.go with BusyBox dispatcher and app package imports
- Modify Makefile to build only central binary with centralall tag
- Update Dockerfile to create symlinks instead of copying separate binaries

**Implementation:**
Each component now has:
1. app/app.go - Contains Run() function with main logic
2. main.go - Thin wrapper that calls app.Run() (excluded with centralall tag)

central/main.go dispatcher checks os.Args[0] and routes to appropriate app.Run().
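
The dispatch described above can be sketched roughly as follows. This is not the code from central/main.go: the runners map, the stub functions, and the resolve helper are hypothetical stand-ins for the real app.Run() entry points, and the fallback to central for unknown names follows the behavior described in a later commit of this PR.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// runners maps an invocation name to an entry point. In the PR these would
// be the app.Run functions of each component; stubs stand in for them here.
var runners = map[string]func(){
	"central":           func() { fmt.Println("central mode") },
	"migrator":          func() { fmt.Println("migrator mode") },
	"compliance":        func() { fmt.Println("compliance mode") },
	"admission-control": func() { fmt.Println("admission-control mode") },
}

// resolve returns the runner name for argv[0]. The symlink name is what the
// process sees in argv[0], so only the base name matters. Unknown names fall
// back to "central" and report known=false so the caller can log a warning.
func resolve(argv0 string) (name string, known bool) {
	base := filepath.Base(argv0)
	if _, ok := runners[base]; ok {
		return base, true
	}
	return "central", false
}

func main() {
	name, known := resolve(os.Args[0])
	if !known {
		fmt.Fprintf(os.Stderr, "unknown binary name %q, falling back to central\n",
			filepath.Base(os.Args[0]))
	}
	runners[name]()
}
```

With a symlink such as /stackrox/bin/migrator pointing at the single binary, the process starts with argv[0] set to the symlink path, so the same executable selects the migrator entry point.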

**Testing:**
All refactored components validated with gopls - no diagnostics.
Individual components still build independently without centralall tag.

**Benefits:**
- 54-64% image size reduction
- Better code organization (app logic separate from entry point)
- Improved testability (app.Run() can be tested directly)
- No code duplication
- Minimal changes to existing code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Remove unnecessary build tags from BusyBox consolidation

The //go:build !centralall tags were not needed because Go's package system
naturally handles the separation:
- Building ./central only compiles central package + its dependencies (app packages)
- Component main.go files are in separate packages and won't be included
- Simpler implementation without conditional compilation

This makes the code cleaner and easier to understand.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Update Konflux Dockerfile for BusyBox consolidation

Apply the same BusyBox-style consolidation to the Konflux build:
- Copy only the central binary instead of 8 separate binaries
- Create symlinks for all component entry points
- Matches changes made to image/rhel/Dockerfile

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The wrapper functions (migratorMain, complianceMain, etc.) added no value -
they just called app.Run(). Simplified by calling app.Run() directly from
the switch statement.

Removes 27 lines of redundant code.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Keep the old build targets so we use the same toolchain for building both
central (consolidated) and individual components. The consolidated binary
works via BusyBox-style dispatch, but individual binaries are still useful
for development and testing.

Both build patterns now work:
- Build central: imports all app packages, dispatches based on argv[0]
- Build individual components: each component's main.go calls app.Run()

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Main() doesn't need to be exported since it's only called from the
dispatcher in the same package. Renamed to centralRun() to follow
Go convention of unexported functions.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Components use a two-tier symlink system:
1. Wrapper scripts in /stackrox/ (from static-bin/) redirect to
2. Binary symlinks in /stackrox/bin/ that point to central

Removed:
- /stackrox/{admission-control,kubernetes-sensor,config-controller}
  (wrappers already exist in static-bin/, only need /stackrox/bin/ targets)
- /stackrox/bin/roxagent (no wrapper exists, unused in K8s deployments)

This preserves the wrapper architecture while eliminating duplicate paths.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Since central now imports all component app packages (migrator, compliance,
admission-control, kubernetes-sensor, sensor-upgrader, config-controller,
roxagent), building central pulls in all their code.

Removed from main-build-nodeps:
- compliance/cmd/compliance
- config-controller
- migrator
- sensor/admission-control
- sensor/kubernetes
- sensor/upgrader
- compliance/virtualmachines/roxagent

These are now accessed via symlinks to the consolidated central binary.
Operator remains separate as it's not part of the main image.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Restore /stackrox/bin/roxagent symlink that was present in original image.
While roxagent has no wrapper script (used in VM deployments, not K8s),
the symlink should exist for compatibility with existing deployment tooling.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Move memlimit.SetMemoryLimit() from init() to Run() in component apps:
- compliance/cmd/compliance/app/app.go
- sensor/admission-control/app/app.go
- sensor/kubernetes/app/app.go

When central imports these app packages, their init() functions were running
even when central mode was active, calling SetMemoryLimit() multiple times.
Moving this to Run() ensures it only executes when the specific component
is active.
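
The effect of this move can be modeled in a few lines of Go. This is a simulation under stated assumptions, not the real packages: the component struct, the simulate function, and the counter are hypothetical, with onInit standing in for a package-level init() (which runs for every imported package) and run standing in for app.Run() (which runs only for the dispatched component).

```go
package main

import "fmt"

// component models one app package imported by central.
type component struct {
	name   string
	onInit func() // stands in for init(): runs on import
	run    func() // stands in for app.Run(): runs only when dispatched
}

// simulate counts calls to a stand-in for memlimit.SetMemoryLimit when the
// call lives in init() (limitInInit=true) or in Run() (limitInInit=false).
func simulate(limitInInit bool, active string) int {
	calls := 0
	setLimit := func() { calls++ }
	noop := func() {}

	names := []string{"compliance", "admission-control", "kubernetes-sensor"}
	comps := make([]component, 0, len(names))
	for _, n := range names {
		c := component{name: n, onInit: noop, run: noop}
		if limitInInit {
			c.onInit = setLimit
		} else {
			c.run = setLimit
		}
		comps = append(comps, c)
	}

	// Importing the packages runs every init(), regardless of mode.
	for _, c := range comps {
		c.onInit()
	}
	// Only the dispatched component's Run() executes.
	for _, c := range comps {
		if c.name == active {
			c.run()
		}
	}
	return calls
}

func main() {
	fmt.Println("limit set in init():", simulate(true, "compliance"))
	fmt.Println("limit set in Run(): ", simulate(false, "compliance"))
}
```

In this model the init() variant sets the limit three times even though only one component runs, while the Run() variant sets it once, and not at all when the active mode is central.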

Add warning log for unknown binary names in central dispatcher:
- central/main.go default case now logs when an unexpected argv[0] falls
  back to central mode, helping surface misconfigurations

Keep config-controller init() unchanged:
- Kubernetes scheme registration is safe to run at import time
- The registered schemes are only used when controller-runtime starts
- Moving to Run() would require complex idempotency guards

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Use push:true + provenance:false to push main image directly from
buildx to the registry. provenance:false produces a plain image
manifest (not a manifest list), compatible with the downstream
push-main-manifests job that creates multi-arch manifest lists.

Login to both quay.io orgs before the build step so buildx can push
to both registries. roxctl and central-db still use docker push
(built locally by make).

Expected: Build+push main image ~55s (vs 105s with load:true).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove copying of config-controller binary since we only build central now
(which contains all consolidated component code). The config-controller
is accessed via symlink to central, not as a separate binary.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Allow central to import app packages from other components:
- compliance/cmd/compliance/app
- compliance/virtualmachines/roxagent/app
- config-controller/app
- migrator/app
- sensor/admission-control/app
- sensor/kubernetes/app
- sensor/upgrader/app

These imports are necessary for the BusyBox-style dispatcher in
central/main.go, which routes to the appropriate component based on argv[0].

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…g copy

buildx's docker-container driver doesn't share host docker credentials.
Use docker/login-action (which injects creds into the buildx builder)
for rhacs-eng push. Copy main to stackrox-io via skopeo (lightweight,
blobs shared on quay.io).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Create roxctl/app package and integrate into central binary:
- Created roxctl/app/app.go with Run() function
- Updated roxctl/main.go to call app.Run()
- Moved getCommandPath() to roxctl/main_test.go for test usage
- Added roxctl case to central/main.go dispatcher
- Updated import validator to allow central to import roxctl/app
- Changed /stackrox/roxctl symlink to point to central instead of assets

Benefits:
- /stackrox/roxctl now uses consolidated binary
- Provides option to remove roxctl binaries from /assets later for space savings
- roxctl binaries in /assets still available for user downloads (for now)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
docker login can only hold one quay.io credential at a time. Use
skopeo --src-creds and --dest-creds to authenticate to both orgs
simultaneously for the rhacs-eng → stackrox-io copy.
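
A sketch of how such a copy invocation could be assembled, written in Go for consistency with the rest of the repository. The helper skopeoCopyArgs, the credential strings, and the image references are placeholders and not values from this PR; only the skopeo copy subcommand with its --src-creds and --dest-creds flags and the docker:// transport prefix are taken as given. The workflow itself presumably calls skopeo from a shell step.

```go
package main

import (
	"fmt"
	"strings"
)

// skopeoCopyArgs builds the argument list for a registry-to-registry copy
// with separate credentials per side. Credentials use the USERNAME:PASSWORD
// form that skopeo expects; all values passed in here are placeholders.
func skopeoCopyArgs(srcCreds, destCreds, srcRef, destRef string) []string {
	return []string{
		"copy",
		"--src-creds", srcCreds,
		"--dest-creds", destCreds,
		"docker://" + srcRef,
		"docker://" + destRef,
	}
}

func main() {
	args := skopeoCopyArgs(
		"SRC_USER:SRC_PASS",   // placeholder, would come from CI secrets
		"DEST_USER:DEST_PASS", // placeholder, would come from CI secrets
		"quay.io/rhacs-eng/main:TAG",   // assumed image path
		"quay.io/stackrox-io/main:TAG", // assumed image path
	)
	// Printed rather than executed so the sketch has no side effects.
	fmt.Println("skopeo", strings.Join(args, " "))
}
```

Because each side carries its own credentials on the command line, no shared docker login state is needed for the two quay.io orgs.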

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
davdhacs and others added 6 commits April 4, 2026 07:57
Use type=registry cache backed by GHCR instead of type=gha.
Benefits:
- No 10GB size limit (GHA cache is shared across all workflows)
- Buildx pulls only needed layers (content-addressable), not full blob
- Faster restore for multi-stage builds with many cached layers

Cache images stored at ghcr.io/stackrox/stackrox/cache/main-{arch}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Switch from docker-container to docker driver for buildx. The docker
driver uses Docker Engine's built-in buildkit — no separate container
to boot.

No cache for this run (baseline measurement). Will add inline cache
once driver change is validated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Docker driver (68s cold, no cache) is already faster than
docker-container + GHA cache (87s). Add inline cache on top:
- cache-to: type=inline embeds cache metadata in the pushed image
- cache-from pulls the previous build (cache-{arch} tag) as source
- With COPY --link, only changed layers need rebuilding

Push a stable cache-{arch} tag on every build so the next build
has a cache reference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same optimizations as the combined-job experiment but keeping the
existing parallel pre-build job structure:
- Busybox binary (1 binary instead of 8)
- Stable ldflags (BUILD_TAG=0.0.0)
- Docker driver + COPY --link + inline cache
- push:true + provenance:false
- CLI builds only host-arch roxctl

Compare wall-clock with combined-job approach.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-ci

openshift-ci bot commented Apr 4, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • Hard-coding BUILD_TAG=0.0.0 and SHORTCOMMIT="0000000" in the pre-build jobs changes behavior beyond pure performance tuning; consider guarding this behind a conditional (e.g. env flag or separate workflow) so normal builds still use the dynamically computed values.
  • In the Push remaining images step, registry_rw_login is called inside the loop for each image and registry; you can move the logins outside the loop to avoid repeated logins and slightly speed up the push phase.
  • The new explicit tag/push logic for main, roxctl, and central-db in the workflow diverges from the existing push_main_image_set helper; consider centralizing this again (or at least reusing shared helpers) to keep tag/push behavior consistent across workflows.


@codecov

codecov bot commented Apr 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.60%. Comparing base (5d9f83e) to head (04b23aa).

Additional details and impacted files
@@                     Coverage Diff                     @@
##           ROX-33958/resue-components   #19831   +/-   ##
===========================================================
  Coverage                       49.60%   49.60%           
===========================================================
  Files                            2763     2763           
  Lines                          208254   208254           
===========================================================
+ Hits                           103309   103313    +4     
+ Misses                          97278    97274    -4     
  Partials                         7667     7667           
Flag Coverage Δ
go-unit-tests 49.60% <ø> (+<0.01%) ⬆️

@janisz janisz force-pushed the ROX-33958/resue-components branch 2 times, most recently from 0905d1f to fa4e0dc Compare April 7, 2026 16:53
Base automatically changed from ROX-33958/resue-components to master April 8, 2026 10:38