perf(ci): add VERSION file and build/test caching fixes#19424
perf(ci): add VERSION file and build/test caching fixes#19424
Conversation
Add a VERSION file containing the base version tag (4.11.x), following the same pattern as COLLECTOR_VERSION, SCANNER_VERSION, FACT_VERSION. - VERSION: contains "4.11.x" - make/env.mk: make tag falls back to VERSION when git describe fails - start-release.yml: verifies VERSION matches release, creates PR to update VERSION on master for the next minor release Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the //XDef: annotation + status.sh + -X ldflags mechanism with a generated pkg/version/internal/zversion.go file. For tests: uses the stable base tag from VERSION (e.g. "4.11.x"), keeping Go link ActionIDs stable across commits. This enables both the Go link cache (2x faster builds) and test result cache. For builds: uses the full git-describe format (e.g. "4.11.x-320-gabcdef") so release binaries have the correct version. Also adds -buildvcs=false to remove unused VCS stamping that Go 1.18+ silently enabled (no code reads debug.ReadBuildInfo). Verified locally: - Same ldflags (link cache hit): 2.4s for 8 binaries - Different ldflags (link cache miss): 5.2s (2.2x slower) Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Go test cache has two phases: 1. Test binary ID (includes link ActionID with ldflags) — fixed by commit 2 2. File mtime validation — fixed here git checkout sets all file mtimes to "now", causing Phase 2 to miss. Fix: set all tracked file mtimes to a fixed date after checkout. Also adds key-suffix input to cache-go-dependencies to disambiguate GOTAGS matrix entries. github.job is the same for all matrix entries (e.g. "go" for both GOTAGS="" and GOTAGS=release). Without key-suffix, matrix entries overwrite each other's GOCACHE. Combined measured results (PR #19404): - go unit tests: 45m → 14m (3.3x) - go-postgres: 35m → 4m (8.7x) - sensor-integration: 21m → 5m (4x) - Test cache hit rate: 92% (1942/2108) Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Skipping CI for Draft Pull Request. |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The build cache statistics added to
go_buildrely ongo build -vline counts and a fixed/tmp/go-build-compiled-countfile, which can be both inaccurate (non-1:1 with compiled units) and racy if multiple builds run in parallel; consider using a unique temp file per invocation and/or gating this logging behind a flag or env var. - The
git ls-files | xargs touchstep in the composite action mutates mtimes for every tracked file, which could interfere with incremental build tooling or other languages in the same job; consider narrowing this to Go source/test files or to the directories actually used by Go.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The build cache statistics added to `go_build` rely on `go build -v` line counts and a fixed `/tmp/go-build-compiled-count` file, which can be both inaccurate (non-1:1 with compiled units) and racy if multiple builds run in parallel; consider using a unique temp file per invocation and/or gating this logging behind a flag or env var.
- The `git ls-files | xargs touch` step in the composite action mutates mtimes for every tracked file, which could interfere with incremental build tooling or other languages in the same job; consider narrowing this to Go source/test files or to the directories actually used by Go.
## Individual Comments
### Comment 1
<location path="scripts/go-tool.sh" line_range="108" />
<code_context>
- invoke_go_build -o "$output" "${dirs[@]}"
+ local total
+ total=$(go list -deps "${dirs[@]}" 2>/dev/null | wc -l | tr -d ' ')
+ local compiled_count=/tmp/go-build-compiled-count
+ invoke_go_build -o "$output" "${dirs[@]}" 2> >(tee >(wc -l | tr -d ' ' > "$compiled_count") >&2)
+ wait
</code_context>
<issue_to_address>
**issue (bug_risk):** Use a per-process or per-invocation temp file instead of a fixed path for compiled_count
A fixed `/tmp` path means concurrent runs of this script (or multiple `go_build` calls in one job) will race on the same `compiled_count` file, causing corrupted or incorrect counts. Please use a unique temp file per invocation (e.g., via `mktemp` or including `$$`/`$BASHPID` and job name in the filename) and clean it up afterward.
</issue_to_address>
### Comment 2
<location path="scripts/go-tool.sh" line_range="112-115" />
<code_context>
+ invoke_go_build -o "$output" "${dirs[@]}" 2> >(tee >(wc -l | tr -d ' ' > "$compiled_count") >&2)
+ wait
+ local compiled
+ compiled=$(cat "$compiled_count" 2>/dev/null | tr -d ' ')
+ local cached=$((total - compiled))
+ echo >&2 "Build cache: ${cached}/${total} deps cached ($compiled compiled)"
+ if [[ "$total" -gt 0 ]] && [[ $((cached * 100 / total)) -lt 50 ]]; then
+ echo >&2 "WARNING: Build cache hit rate below 50% (${cached}/${total}). Significant code changes invalidate the cached build — expect slower compilation."
+ fi
</code_context>
<issue_to_address>
**issue (bug_risk):** Harden arithmetic against empty or non-numeric totals/compiled counts
If `go list` or the build step fails (or their output format changes), `total`/`compiled` can be empty or non-numeric. Then `cached=$((total - compiled))` and the percentage calculation will throw a bash arithmetic error and abort. Please default missing values to 0 and ensure both vars are integers (e.g. `${total:-0}`, `${compiled:-0}` with a simple regex check), or skip the cache-reporting logic when the counts aren’t valid.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| invoke_go_build -o "$output" "${dirs[@]}" | ||
| local total | ||
| total=$(go list -deps "${dirs[@]}" 2>/dev/null | wc -l | tr -d ' ') | ||
| local compiled_count=/tmp/go-build-compiled-count |
There was a problem hiding this comment.
issue (bug_risk): Use a per-process or per-invocation temp file instead of a fixed path for compiled_count
A fixed /tmp path means concurrent runs of this script (or multiple go_build calls in one job) will race on the same compiled_count file, causing corrupted or incorrect counts. Please use a unique temp file per invocation (e.g., via mktemp or including $$/$BASHPID and job name in the filename) and clean it up afterward.
| compiled=$(cat "$compiled_count" 2>/dev/null | tr -d ' ') | ||
| local cached=$((total - compiled)) | ||
| echo >&2 "Build cache: ${cached}/${total} deps cached ($compiled compiled)" | ||
| if [[ "$total" -gt 0 ]] && [[ $((cached * 100 / total)) -lt 50 ]]; then |
There was a problem hiding this comment.
issue (bug_risk): Harden arithmetic against empty or non-numeric totals/compiled counts
If go list or the build step fails (or their output format changes), total/compiled can be empty or non-numeric. Then cached=$((total - compiled)) and the percentage calculation will throw a bash arithmetic error and abort. Please default missing values to 0 and ensure both vars are integers (e.g. ${total:-0}, ${compiled:-0} with a simple regex check), or skip the cache-reporting logic when the counts aren’t valid.
|
Images are ready for the commit at ba974ea. To use with deploy scripts, first |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #19424 +/- ##
=======================================
Coverage 49.71% 49.71%
=======================================
Files 2701 2701
Lines 203453 203453
=======================================
+ Hits 101139 101140 +1
+ Misses 94786 94785 -1
Partials 7528 7528
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
actionlint flags matrix.gotags as undefined in jobs without a matrix strategy (go-bench, local-roxctl-tests, sensor-integration-tests). The key-suffix is only needed for the go and go-postgres jobs which have the gotags matrix to prevent GOCACHE key collisions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GOCACHE accumulates stale entries across commits as packages change: each commit creates new cache entries for recompiled packages, but old entries from previous commits are never cleaned up. Go's built-in trim only runs once per 24h with 5-day retention — too slow for CI. Fix: backdate all GOCACHE entries to year 2000 after restore. During the build, Go's markUsed() updates mtimes of accessed entries to "now". After the build, a trim step deletes entries still at year 2000 (stale). The cache auto-save post-step then saves only the trimmed directory. Key detail: trim.txt must be set to "now" after backdating, otherwise Go's built-in Trim() sees "last trim in year 2000" and deletes ALL backdated entries before the build even starts. Also bumps cache key from v1 to v2 for a clean slate. Locally verified: with ~50 commits of real source changes, trim removes 679MB (32%) of stale entries from a 2.1GB cache. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Enable Go build and test caching across commits by stabilizing cache ActionIDs. Reduces CI test job times by 3-9x and build compilation steps by 1.5-3.7x.
Measured results
Test jobs (unit-tests.yaml)
Seed run: 23069907040. Warm runs: 23071454130, 23073119966. Master comparison: 23063508964, 23054165953.
Test cache hit rate: 78% (1946/2509 packages). Results confirmed stable across two warm runs.
Step-level breakdown for key test jobs (showing where time is spent):
Build jobs (build.yaml)
Build run: 23073119981. Master comparison: 23063508995. Step-level timing for Go compilation steps only:
pre-build-go-binaries (pure Go compilation):
pre-build-cli (cross-compiles for multiple platforms):
Build cache stats: 4401/4403 deps cached (2 compiled) for go-binaries. Only
pkg/version/internalandpkg/versionrecompile per commit.Why caches miss today
Go's GOCACHE keys builds and tests by "ActionID" — a hash of compiler flags, source file contents, and dependency outputs. On master, every commit gets 0% cache hits for two reasons:
1. The commit SHA is baked into every ActionID. We pass
-X MainVersion=4.11.x-193-g<sha>as an ldflag, and Go includes ldflags in the link ActionID. Since the SHA changes on every commit, every link action gets a unique ID, causing a full cache miss for every binary. Go 1.18+ also silently enabled-buildvcs=autowhich stampsvcs.revisionandvcs.modifiedinto binaries — another per-commit change to ActionIDs. No code reads these VCS stamps.2. Git sets file mtimes to checkout time, and Go's test cache relies on mtime. Go's test result cache has two phases: first it checks the test binary identity (the ActionID from #1), then it validates that the source files haven't changed by checking
(path, size, mtime)— not content. Sincegit checkoutsets all file mtimes to "now", every CI run has different mtimes, so even if #1 were fixed, test results would still miss.Together, this means master rebuilds and re-runs every test on every commit, even when nothing relevant changed.
The fix
1.
chore: add VERSION file with base version tagA
VERSIONfile containing4.11.xprovides a stable base version tag.make/env.mkfalls back to it whengit describefails (e.g. shallow clones).start-release.ymlverifies VERSION matches the release and creates a PR to bump it on master for the next minor version.2.
perf(ci): replace -X ldflags with generated zversion.goInstead of passing version info via
-Xldflags (which pollute every binary's link ActionID),go-tool.shnow generates apkg/version/internal/zversion.gosource file that sets the version at init time. Because the version is now in a source file rather than in ldflags, it only affects the ActionID of the packages that contain it — not the link ActionID of every binary in the repo. Also adds-buildvcs=falseto stop the unused VCS stamping.For builds, zversion.go uses the full git-describe version so release binaries are correctly versioned. The changed file content causes
pkg/version/internalandpkg/versionto recompile, but the cascade stops there: Go's compile cache keys each package using the content hash of its dependency outputs (contentID), not their actionIDs. Whenpkg/versionrecompiles, its output is byte-identical (its source didn't change — only its dependency's init() body changed), so its contentID is unchanged. All ~4400 downstream packages see the same dependency contentID, compute the same actionID, and cache-hit. Verified locally: only 2 of 1559 packages recompile after an empty commit.For tests, the stable base tag is necessary because test caching works differently. Go's test cache checks the test binary identity, which includes the link ActionID. The linker hashes all input
.afiles by content — andpkg/version/internal.adid change. Since 1191 of 1952 packages (61%) transitively depend onpkg/version, using the full version for tests would invalidate 61% of test cache entries on every commit. Verified locally: tests miss cache after an empty commit with full version, but cache-hit with stable base tag.3.
perf(ci): enable Go test cache with fixed mtimeAfter checkout,
git ls-files -z | xargs -0 touch -t 200101010000sets all file mtimes to a fixed past date. This makes Go's test cache mtime validation pass across CI runs when file content hasn't changed. Also adds akey-suffixinput to the cache action so GOTAGS matrix entries (""vsrelease) don't overwrite each other's GOCACHE.How these pieces interact
The build cache (compilation + linking) is fixed by commits 1+2: removing
-Xldflags stabilizes link ActionIDs, and the zversion.go contentID cascade stops atpkg/versionso downstream compilation cache-hits.The test cache requires all three: commit 2 stabilizes the test binary ActionID (phase 1 — the link ActionID no longer changes per commit), and commit 3 stabilizes the file mtimes that Go checks to validate cached test results (phase 2). Both phases must hit for a cached test result to be reused.
See also: #19425 (GOCACHE stale entry trimming — independent, reduces cache size by ~27%).
Partially generated by AI.