perf(ci): trim stale GOCACHE entries between runs #19425

Open
davdhacs wants to merge 9 commits into master from davdhacs/gocache-trim

@davdhacs davdhacs commented Mar 14, 2026

Summary

Trim stale GOCACHE entries and separate matrix variant caches to reduce CI cache sizes by 12-57%.

Problem

Go's build cache accumulates stale entries across CI runs. Each commit adds new entries for recompiled packages, but Go's built-in trim (5-day retention, 24h check interval) rarely fires in CI because the restored trim.txt suppresses it. This is a known gap in the Go CI caching ecosystem (actions/setup-go#395).

Additionally, build matrix variants (default/prerelease/race-condition-debug) were sharing one cache key, nearly tripling cache size.

Fix

Stale entry trimming: after restoring GOCACHE, backdate all entry mtimes to year 2000. Go's markUsed() updates accessed entries to "now" during the build. A gacts/run-and-post-run post-step deletes entries still at year 2000 before cache save. trim.txt is set to "now" to prevent Go's built-in Trim() from interfering.

Cache key separation: pass matrix.name as key-suffix for pre-build-go-binaries and build-and-push-operator.
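
The mark-and-sweep flow described above can be sketched in shell roughly as follows. This is a minimal illustration rather than the action's actual script: `mark_gocache` and `sweep_gocache` are hypothetical helper names, the 2001-01-01 cutoff is an arbitrary value just past the year-2000 marker, a plain `touch` stands in for Go's markUsed(), and refreshing trim.txt with `touch` is an assumption about how "set to now" is done. It assumes GNU find for `-newermt`.

```shell
#!/bin/sh
set -eu

# Mark: backdate every cache file to 2000-01-01 00:00 (local time).
mark_gocache() {
  find "$1" -type f -exec touch -t 200001010000 {} +
  # Assumption: refresh trim.txt so it is never treated as a stale entry.
  touch "$1/trim.txt"
}

# Sweep: delete files whose mtime is still before the cutoff, i.e. files
# that nothing touched since the mark step. trim.txt is always kept.
sweep_gocache() {
  find "$1" -type f ! -name trim.txt ! -newermt '2001-01-01' -delete
}

# Demo on a throwaway directory standing in for a restored GOCACHE.
demo=$(mktemp -d)
mkdir -p "$demo/ab" "$demo/cd"
echo used > "$demo/ab/used-a"
echo stale > "$demo/cd/stale-a"

mark_gocache "$demo"
touch "$demo/ab/used-a"   # stand-in for the build reading this entry
sweep_gocache "$demo"

find "$demo" -type f | sort
```

After the demo runs, only `ab/used-a` and `trim.txt` remain; `cd/stale-a` was never touched after the mark step, so the sweep removes it.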

Results

CI-measured cache sizes (compressed, from gh cache list). Seed run 23080987154, warm run 23081726167.

| Job | Master | Trimmed | Reduction |
|---|---|---|---|
| go (unit tests) | 3034MB | 2015MB | 34% |
| go-postgres | 2143MB | 929MB | 57% |
| go-bench | 833MB | 620MB | 26% |
| sensor-integration | 703MB | 619MB | 12% |
| local-roxctl-tests | 1537MB | 1168MB | 24% |

Cache sizes are stable between runs (no growth). The trim post-step runs in 0-2s. No performance regression: warm-run job timings match previous runs. Smaller caches also cut cache download and extract time.

Partially generated by AI.

GOCACHE grows over time as each commit adds new entries for recompiled
packages while old entries persist. Go's built-in trim deletes entries
unused for 5 days, but only checks once per 24h — and since CI restores
the cache (including trim.txt) fresh each run, the trim rarely fires.

Fix: after restoring GOCACHE, backdate all entry mtimes to year 2000.
Go's markUsed() updates accessed entries to "now" during the build
(always fires since year-2000 mtime is >1h old). After the build, a
trim step deletes entries still at year 2000. The cache auto-save
post-step then saves only entries used by this build.

trim.txt must be set to "now" after backdating — otherwise Go's Trim()
sees "last trim in year 2000" and deletes all backdated entries before
the build starts.

Locally verified: 27% reduction (2026MB → 1531MB) with ~30 commits of
real source changes between cache save and restore.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

openshift-ci bot commented Mar 14, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • In the trim-go-cache composite action, consider guarding the du, find, and mktemp calls with a check that $(go env GOCACHE) is a non-empty existing directory (similar to the -d guard in cache-go-dependencies) to avoid accidentally operating on the wrong path when Go or GOCACHE is misconfigured.
  • Since trim-go-cache relies on cache-go-dependencies having run earlier in the job, it may be safer to make that dependency explicit (e.g., by checking for the presence of the year-2000 marker or trim.txt) and no-op otherwise, to avoid unexpected deletions if the action is reused in a job without the marking step.

rhacs-bot commented Mar 14, 2026

Images are ready for the commit at 8f84d09.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-329-g8f84d09b6c.

davdhacs and others added 3 commits March 14, 2026 00:00
build.yaml's pre-build-go-binaries and build-and-push-operator have
matrix variants (default, prerelease, race-condition-debug) that share
the same github.job cache key. Each variant writes its own entries into
one cache, causing it to grow to 3x the single-variant size (5.2GB on
master for pre-build-go-binaries).

Fix: pass matrix.name as key-suffix to cache-go-dependencies, giving
each variant its own cache key. Also adds key-suffix input to the cache
action and trim steps to all build.yaml Go jobs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
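
The effect of the key suffix described in the commit above can be illustrated with a small shell sketch. The key layout `go-build-<job>[-<suffix>]` and the `cache_key` helper are hypothetical; the action's real key format is not shown in this conversation.

```shell
#!/bin/sh
set -eu

# Hypothetical key layout: $1 = job name, $2 = optional key suffix.
cache_key() {
  if [ -n "${2:-}" ]; then
    printf 'go-build-%s-%s\n' "$1" "$2"
  else
    printf 'go-build-%s\n' "$1"
  fi
}

variants="default prerelease race-condition-debug"

# Without a suffix, every matrix variant resolves to the same key.
shared=$(for v in $variants; do cache_key pre-build-go-binaries; done | sort -u | wc -l)

# With matrix.name passed as the suffix, each variant gets its own key.
separate=$(for v in $variants; do cache_key pre-build-go-binaries "$v"; done | sort -u | wc -l)

echo "distinct keys without suffix: $shared"
echo "distinct keys with suffix: $separate"
```

With one shared key, all three variants write into a single cache; with the suffix, each cache holds only the entries of its own variant.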
Address SourceryAI review feedback:
- Guard against missing/empty GOCACHE directory
- Check for trim.txt before trimming to ensure the mark step ran
- Show removed MB in output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
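
A guard along the lines of the commit above might look like the sketch below. `can_trim` is a hypothetical helper name, and the three checks follow the commit message (empty path, missing directory, missing trim.txt), not the action's actual code.

```shell
#!/bin/sh
set -eu

# Returns 0 only when trimming is safe to attempt.
can_trim() {
  [ -n "${1:-}" ] || return 1        # GOCACHE unset or empty
  [ -d "$1" ] || return 1            # path missing or not a directory
  [ -f "$1/trim.txt" ] || return 1   # no trim.txt: mark step did not run
}

demo=$(mktemp -d)
can_trim ""      || echo "skip: empty path"
can_trim "$demo" || echo "skip: no trim.txt"
touch "$demo/trim.txt"
can_trim "$demo" && echo "ok: safe to trim"
```

In a workflow the argument would typically be the job's GOCACHE path, for example the output of `go env GOCACHE`.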
Use gacts/run-and-post-run to register a post-step that trims stale
GOCACHE entries before the cache save post-step runs (GHA executes
post-steps in reverse registration order).

This eliminates the need for explicit trim-go-cache steps in every
workflow that uses cache-go-dependencies — the trim is now automatic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
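
The ordering argument in the commit above (post-steps run in reverse registration order) can be modelled with a toy stack in shell. The step labels are illustrative, and the sketch assumes the cache step registers its post-step before the trim step does.

```shell
#!/bin/sh
set -eu

# Toy model: the positional parameters act as a stack of post-steps.
# Each registration pushes to the front, so the last one registered runs first.
set --                                     # empty post-step stack
set -- "cache: save GOCACHE" "$@"          # registered first, by the cache step
set -- "trim: delete stale entries" "$@"   # registered later, by run-and-post-run

first_post=$1
for step in "$@"; do
  echo "post: $step"
done
```

The trim post-step is printed first and the cache save second, which is the order needed for the save to pick up the already trimmed cache.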

codecov bot commented Mar 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.71%. Comparing base (ac93c38) to head (8f84d09).

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #19425   +/-   ##
=======================================
  Coverage   49.71%   49.71%           
=======================================
  Files        2701     2701           
  Lines      203453   203453           
=======================================
+ Hits       101143   101148    +5     
+ Misses      94784    94781    -3     
+ Partials     7526     7524    -2     
Flag Coverage Δ
go-unit-tests 49.71% <ø> (+<0.01%) ⬆️

davdhacs and others added 3 commits March 14, 2026 09:23
- Remove extra blank lines left from earlier trim step removal
- Remove no-op `run:` from gacts/run-and-post-run (only `post:` needed)
- Add `set +e` to trim post-step so cache trim errors don't fail the job

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gacts/run-and-post-run executes each line separately. The if/then/fi
block with indented lines caused YAML > to preserve newlines, splitting
the script into fragments. Replace with single-line guard clause.

Also: set +e prevents trim errors from failing the job, and the exit 0
in the guard clause is safe because gacts processes lines independently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
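
The line-splitting problem described in the commit above can be reproduced in isolation. The sketch below assumes, as the commit message states, that each script line is executed as its own shell invocation; `run_line_by_line` is a hypothetical stand-in for that behaviour and does not use gacts/run-and-post-run itself.

```shell
#!/bin/sh
set -eu

# Run each line of $1 in a separate shell and print how many lines failed.
run_line_by_line() {
  failures=0
  while IFS= read -r line; do
    sh -c "$line" </dev/null >/dev/null 2>&1 || failures=$((failures + 1))
  done <<EOF
$1
EOF
  echo "$failures"
}

multi_line='if [ ! -d /nonexistent-gocache ]; then
  echo "no cache, skipping"
fi'

single_line='[ -d /nonexistent-gocache ] || { echo "no cache, skipping"; exit 0; }'

echo "multi-line failures: $(run_line_by_line "$multi_line")"
echo "single-line failures: $(run_line_by_line "$single_line")"
```

Split this way, the `if ...; then` and `fi` fragments are each a syntax error on their own, so two of the three lines fail, while the single-line guard runs cleanly.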
Prevent the mark step from hanging indefinitely on I/O delays.
If timeout fires, || true ensures the step doesn't fail — the trim
just won't remove entries that weren't backdated (safe, keeps more).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
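
The `timeout ... || true` pattern from the commit above behaves as sketched below. `sleep 3` is a stand-in for a slow `find ... -exec touch` over a large cache, and the 1-second limit is chosen only to keep the demo short. GNU `timeout` is assumed.

```shell
#!/bin/sh
set -eu

# GNU timeout exits with status 124 when the time limit is reached.
status=0
timeout 1 sleep 3 || status=$?
echo "timeout exit status: $status"

# With "|| true" the non-zero status is swallowed, so the step carries on
# even under "set -e".
timeout 1 sleep 3 || true
reached_end=yes
echo "step continued after timeout"
```

If the limit fires partway through, entries that were not yet backdated simply keep their recent mtimes and survive the later trim.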
@davdhacs davdhacs marked this pull request as ready for review March 15, 2026 00:23
@davdhacs davdhacs requested a review from a team as a code owner March 15, 2026 00:23
@davdhacs davdhacs requested a review from janisz March 15, 2026 00:24

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • In the post-step trim logic, consider reusing the previously computed ${{ steps.cache-paths.outputs.GOCACHE }} instead of calling go env GOCACHE again to avoid divergence if GOCACHE is overridden in the environment.
  • The find ... -exec touch -t 200001010000 over the entire GOCACHE could be expensive on very large caches despite the timeout 120; you might want to restrict it (e.g., skip files already older than the cutoff or make the timeout configurable) to avoid rare but pathological runs.

## Individual Comments

### Comment 1
<location path=".github/actions/cache-go-dependencies/action.yaml" line_range="68" />
<code_context>
+          # Go's markUsed() updates mtimes of accessed entries to "now"
+          # (it always updates when mtime is >1 hour old). The post-step
+          # below trims entries still at year 2000 before cache save.
+          timeout 120 find "$gocache" -type f -exec touch -t 200001010000 {} + || true
+          # Protect trim.txt: if backdated, Go's built-in Trim() sees
+          # "last trim was in year 2000" and deletes ALL backdated entries
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `timeout` may break if this composite action is ever run on non-Linux runners.

Composite actions are often reused across workflows with different runners, and `timeout` is only guaranteed on Ubuntu (macOS has `gtimeout`, Windows lacks it). If this is Linux-only, please enforce or document that. Otherwise, consider a portable alternative (e.g., a manual `find` depth/limit strategy or guarding this block by `runner.os`).
</issue_to_address>
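
One way to address the portability concern above is a small wrapper that picks whichever timeout binary is available. This is a sketch, not part of the PR: `run_with_timeout` is a hypothetical helper, `gtimeout` is only present where GNU coreutils is installed under that name, and falling back to an unbounded run is one possible policy among several.

```shell
#!/bin/sh
set -eu

# Use timeout if present, fall back to gtimeout, otherwise run without a limit.
run_with_timeout() {
  seconds=$1
  shift
  if command -v timeout >/dev/null 2>&1; then
    timeout "$seconds" "$@"
  elif command -v gtimeout >/dev/null 2>&1; then
    gtimeout "$seconds" "$@"
  else
    "$@"
  fi
}

# Demo: backdate the files of a throwaway directory, as the mark step does.
demo=$(mktemp -d)
touch "$demo/entry"
run_with_timeout 120 find "$demo" -type f -exec touch -t 200001010000 {} + || true
```

Guarding the block on `runner.os` in the workflow, as the review suggests, is the alternative when the action is meant to be Linux-only.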

davdhacs and others added 2 commits March 14, 2026 22:35