perf(ci): trim stale GOCACHE entries between runs #19425
GOCACHE grows over time as each commit adds new entries for recompiled packages while old entries persist. Go's built-in trim deletes entries unused for 5 days, but only checks once per 24h. Since CI restores the cache (including trim.txt) fresh each run, the trim rarely fires.

Fix: after restoring GOCACHE, backdate all entry mtimes to year 2000. Go's markUsed() updates accessed entries to "now" during the build (this always fires, since a year-2000 mtime is more than 1h old). After the build, a trim step deletes entries still at year 2000. The cache auto-save post-step then saves only entries used by this build.

trim.txt must be set to "now" after backdating. Otherwise Go's Trim() sees "last trim in year 2000" and deletes all backdated entries before the build starts.

Locally verified: 27% reduction (2026MB → 1531MB) with ~30 commits of real source changes between cache save and restore.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
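The mark-and-trim cycle in this commit message can be sketched on a throwaway directory. This is a minimal illustration under stated assumptions, not the action's actual script: the directory layout, the file names `used-entry` and `stale-entry`, and the 2001 cutoff are hypothetical, a plain `touch` stands in for Go's markUsed(), and setting trim.txt to "now" is assumed to mean writing the current Unix time into it.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for a restored GOCACHE; not the real cache layout.
cache=$(mktemp -d)
mkdir -p "$cache/ab" "$cache/cd"
echo data > "$cache/ab/used-entry"
echo data > "$cache/cd/stale-entry"
echo 0 > "$cache/trim.txt"

# 1. Mark: backdate every file to 2000-01-01 00:00 (local time).
find "$cache" -type f -exec touch -t 200001010000 {} +

# 2. Protect trim.txt: rewrite it with the current Unix time (assumption),
#    which also resets its mtime to now.
date +%s > "$cache/trim.txt"

# 3. Build: simulate one cache hit. In a real build Go would refresh the mtime.
touch "$cache/ab/used-entry"

# 4. Trim: delete files whose mtime is not newer than a 2001-01-01 reference file.
cutoff=$(mktemp)
touch -t 200101010000 "$cutoff"
find "$cache" -type f ! -newer "$cutoff" -delete
rm -f "$cutoff"

# List what survived, relative to the cache root.
(cd "$cache" && find . -type f | sort)
```

Only `./ab/used-entry` and `./trim.txt` should be listed at the end; the entry that was never touched after the backdating step is removed.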
Skipping CI for Draft Pull Request.
Hey - I've left some high level feedback:
- In the `trim-go-cache` composite action, consider guarding the `du`, `find`, and `mktemp` calls with a check that `$(go env GOCACHE)` is a non-empty existing directory (similar to the `-d` guard in `cache-go-dependencies`) to avoid accidentally operating on the wrong path when Go or GOCACHE is misconfigured.
- Since `trim-go-cache` relies on `cache-go-dependencies` having run earlier in the job, it may be safer to make that dependency explicit (e.g., by checking for the presence of the year-2000 marker or `trim.txt`) and no-op otherwise, to avoid unexpected deletions if the action is reused in a job without the marking step.
Images are ready for the commit at 8f84d09. To use with deploy scripts, first
build.yaml's pre-build-go-binaries and build-and-push-operator jobs have matrix variants (default, prerelease, race-condition-debug) that share the same github.job cache key. Each variant writes its own entries into one cache, causing it to grow to 3x the single-variant size (5.2GB on master for pre-build-go-binaries).

Fix: pass matrix.name as key-suffix to cache-go-dependencies, giving each variant its own cache key. Also adds a key-suffix input to the cache action, and trim steps to all build.yaml Go jobs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
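The effect of a per-variant suffix can be illustrated with a small key builder. This is only a sketch: the `go-build-` prefix and the `cache_key` helper are hypothetical, since the action's real key layout is not shown in this conversation.

```shell
#!/bin/sh
# cache_key JOB [SUFFIX]: build a cache key, appending the suffix when given.
# The "go-build-" prefix is a made-up placeholder, not the action's real format.
cache_key() {
  job=$1
  suffix=${2:-}
  if [ -n "$suffix" ]; then
    printf 'go-build-%s-%s\n' "$job" "$suffix"
  else
    printf 'go-build-%s\n' "$job"
  fi
}

# Each matrix variant now maps to its own key instead of sharing one.
for variant in default prerelease race-condition-debug; do
  cache_key pre-build-go-binaries "$variant"
done
```

With distinct keys, each variant restores and saves only its own entries, so no single cache accumulates the output of all three builds.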
Address SourceryAI review feedback:
- Guard against missing/empty GOCACHE directory
- Check for trim.txt before trimming to ensure the mark step ran
- Show removed MB in output

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
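The two guards listed above might look roughly like the following. It is a hedged sketch, not the code from this PR: the `should_trim` helper is a hypothetical name, and the presence of trim.txt is used as the only signal that the mark step ran.

```shell
#!/bin/sh
# should_trim DIR: succeed only if DIR is non-empty, is an existing directory,
# and contains trim.txt (taken here as the sign that the mark step ran).
should_trim() {
  dir=$1
  [ -n "$dir" ] || return 1
  [ -d "$dir" ] || return 1
  [ -f "$dir/trim.txt" ] || return 1
  return 0
}

# Usage sketch: an empty result (for example, when `go` is missing) skips the trim.
gocache=$(go env GOCACHE 2>/dev/null || true)
if should_trim "$gocache"; then
  echo "would trim $gocache"
else
  echo "skipping trim"
fi
```

Failing closed keeps the worst case at "nothing is trimmed", which only means a larger cache for that run.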
Use gacts/run-and-post-run to register a post-step that trims stale GOCACHE entries before the cache save post-step runs (GHA executes post-steps in reverse registration order). This eliminates the need for explicit trim-go-cache steps in every workflow that uses cache-go-dependencies: the trim is now automatic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #19425 +/- ##
=======================================
Coverage 49.71% 49.71%
=======================================
Files 2701 2701
Lines 203453 203453
=======================================
+ Hits 101143 101148 +5
+ Misses 94784 94781 -3
+ Partials 7526 7524 -2
- Remove extra blank lines left from earlier trim step removal
- Remove no-op `run:` from gacts/run-and-post-run (only `post:` needed)
- Add `set +e` to trim post-step so cache trim errors don't fail the job

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gacts/run-and-post-run executes each line separately. The indented lines of the if/then/fi block caused the YAML `>` folded scalar to preserve newlines, splitting the script into fragments. Replace it with a single-line guard clause.

Also: set +e prevents trim errors from failing the job, and the exit 0 in the guard clause is safe because gacts processes lines independently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevent the mark step from hanging indefinitely on I/O delays. If the timeout fires, || true ensures the step doesn't fail; the trim just won't remove entries that weren't backdated (safe, keeps more).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
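The timeout-plus-`|| true` pattern can be tried in isolation. A minimal sketch, assuming GNU coreutils `timeout` is on the PATH; `run_bounded` is a hypothetical helper name and `sleep` stands in for the slow `find`.

```shell
#!/bin/sh
# run_bounded SECONDS CMD...: run CMD under a time limit and never fail the caller.
run_bounded() {
  limit=$1
  shift
  timeout "$limit" "$@" || true
}

# sleep is killed after about one second, yet the helper still returns 0.
run_bounded 1 sleep 5
echo "status: $?"
```

The trade-off is that a timed-out mark step is silent: some entries simply keep their original mtimes and survive the later trim.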
Hey - I've found 1 issue, and left some high level feedback:
- In the post-step trim logic, consider reusing the previously computed `${{ steps.cache-paths.outputs.GOCACHE }}` instead of calling `go env GOCACHE` again to avoid divergence if `GOCACHE` is overridden in the environment.
- The `find ... -exec touch -t 200001010000` over the entire GOCACHE could be expensive on very large caches despite the `timeout 120`; you might want to restrict it (e.g., skip files already older than the cutoff or make the timeout configurable) to avoid rare but pathological runs.
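One way to skip files that already carry the year-2000 mark, as the second point suggests, is to filter by mtime before touching. This is a sketch only: `backdate_recent` is a hypothetical helper, and `-newermt` is assumed to be available (it is in GNU and BSD `find`, but it is not POSIX).

```shell
#!/bin/sh
# backdate_recent DIR: backdate only files modified after 2000-01-02, so files
# that already sit at the year-2000 mark are not touched again.
backdate_recent() {
  find "$1" -type f -newermt '2000-01-02 00:00:00' -exec touch -t 200001010000 {} +
}

# Demo on a throwaway directory.
demo=$(mktemp -d)
echo data > "$demo/entry"
backdate_recent "$demo"   # backdates the fresh file
backdate_recent "$demo"   # second pass matches nothing, so touch is not run
find "$demo" -type f -newermt '2000-01-02 00:00:00' | wc -l
```

The final count should be 0, since no file in the directory is newer than the cutoff after the first pass.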
## Individual Comments
### Comment 1
<location path=".github/actions/cache-go-dependencies/action.yaml" line_range="68" />
<code_context>
+ # Go's markUsed() updates mtimes of accessed entries to "now"
+ # (it always updates when mtime is >1 hour old). The post-step
+ # below trims entries still at year 2000 before cache save.
+ timeout 120 find "$gocache" -type f -exec touch -t 200001010000 {} + || true
+ # Protect trim.txt: if backdated, Go's built-in Trim() sees
+ # "last trim was in year 2000" and deletes ALL backdated entries
</code_context>
<issue_to_address>
**issue (bug_risk):** Using `timeout` may break if this composite action is ever run on non-Linux runners.
Composite actions are often reused across workflows with different runners, and `timeout` is only guaranteed on Ubuntu (macOS has `gtimeout`, Windows lacks it). If this is Linux-only, please enforce or document that. Otherwise, consider a portable alternative (e.g., a manual `find` depth/limit strategy or guarding this block by `runner.os`).
</issue_to_address>
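A portable fallback along the lines of this comment could probe for a timeout tool at run time. A minimal sketch, not what the PR does: `pick_timeout` and `run_with_limit` are hypothetical helpers, and running unbounded when no tool exists is just one possible policy.

```shell
#!/bin/sh
# pick_timeout: print the name of an available timeout command, or nothing.
pick_timeout() {
  if command -v timeout >/dev/null 2>&1; then
    echo timeout
  elif command -v gtimeout >/dev/null 2>&1; then
    echo gtimeout
  fi
}

# run_with_limit SECONDS CMD...: use a timeout tool when present, else run unbounded.
run_with_limit() {
  limit=$1
  shift
  tool=$(pick_timeout)
  if [ -n "$tool" ]; then
    "$tool" "$limit" "$@"
  else
    "$@"
  fi
}

run_with_limit 5 echo "ran"
```

Guarding the step by `runner.os`, as the comment also proposes, would avoid the probe entirely if the action is meant to be Linux-only.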
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Trim stale GOCACHE entries and separate matrix variant caches to reduce CI cache sizes by 12-57%.
Problem
Go's build cache accumulates stale entries across CI runs. Each commit adds new entries for recompiled packages, but Go's built-in trim (5-day retention, 24h check interval) rarely fires in CI because the restored `trim.txt` suppresses it. This is a known gap in the Go CI caching ecosystem (actions/setup-go#395).

Additionally, build matrix variants (default/prerelease/race-condition-debug) were sharing one cache key, nearly tripling cache size.
Fix
Stale entry trimming: after restoring GOCACHE, backdate all entry mtimes to year 2000. Go's `markUsed()` updates accessed entries to "now" during the build. A `gacts/run-and-post-run` post-step deletes entries still at year 2000 before cache save. `trim.txt` is set to "now" to prevent Go's built-in `Trim()` from interfering.

Cache key separation: pass `matrix.name` as `key-suffix` for `pre-build-go-binaries` and `build-and-push-operator`.

Results
CI-measured cache sizes (compressed, from `gh cache list`). Seed run 23080987154, warm run 23081726167.

Cache sizes are stable between runs (no growth). The trim post-step runs in 0-2s. No performance regression: warm run job timings match previous runs. The smaller caches also reduce cache download and extract time.
Partially generated by AI.