Skip to content

perf(ci): build/test caching fixes#19424

Draft
davdhacs wants to merge 17 commits intomasterfrom
davdhacs/go-cache-optimization
Draft

perf(ci): build/test caching fixes#19424
davdhacs wants to merge 17 commits intomasterfrom
davdhacs/go-cache-optimization

Conversation

@davdhacs
Copy link
Contributor

@davdhacs davdhacs commented Mar 13, 2026

Summary

Replace -X ldflags with a generated zversion.go, and set files to a fixed mtime to allow test cache use.

Why:
On every commit in CI:

  1. Binaries are all re-linked: -X MainVersion=...g<sha> ldflags and buildvcs change (Go 1.18+ enabled stamps of vcs.revision into binaries by default), giving link and test binaries new GOCACHE ActionIDs.
  2. Tests never used cached results and re-run: git checkout sets file mtimes to "now", but Go's test cache validates by (path, size, mtime), not content.

Fix:

  1. Stable link ActionIDs:
    a. Move version info into a source file (zversion.go), removing -X ldflags.
    b. Disable buildvcs metadata, -buildvcs=false.
  2. Stable file mtime for repo files. After clone, set mtime on all files to a fixed date.

CI results

Test jobs

Warm runs: 23071454130, 23073119966. Master: 23063508964, 23054165953.

Job Master This PR Speedup
go (GOTAGS="") 38m 13.8m 2.7x
go (GOTAGS=release) 34m 13.5m 2.5x
go-postgres (GOTAGS="") 31m 3.8m 8.2x
go-postgres (GOTAGS=release) 30m 4.0m 7.5x
sensor-integration 22m 5.3m 4.2x

78% test cache hit rate (1946/2509 packages).

Build jobs (compilation step only)

Build: 23073119981. Master: 23063508995.

Job Master This PR Speedup
pre-build-go-binaries (amd64) 100s 27s 3.7x
pre-build-go-binaries (arm64) 68s 19s 3.6x
pre-build-cli 101s 60s 1.7x

4401/4403 deps cached, 2 compiled per commit.

See also: #19425 (GOCACHE stale entry trimming).

Partially generated by AI.

davdhacs and others added 3 commits March 13, 2026 13:42
Add a VERSION file containing the base version tag (4.11.x), following
the same pattern as COLLECTOR_VERSION, SCANNER_VERSION, FACT_VERSION.

- VERSION: contains "4.11.x"
- make/env.mk: make tag falls back to VERSION when git describe fails
- start-release.yml: verifies VERSION matches release, creates PR to
  update VERSION on master for the next minor release

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the //XDef: annotation + status.sh + -X ldflags mechanism with
a generated pkg/version/internal/zversion.go file.

For tests: uses the stable base tag from VERSION (e.g. "4.11.x"),
keeping Go link ActionIDs stable across commits. This enables both
the Go link cache (2x faster builds) and test result cache.

For builds: uses the full git-describe format (e.g. "4.11.x-320-gabcdef")
so release binaries have the correct version.

Also adds -buildvcs=false to remove unused VCS stamping that Go 1.18+
silently enabled (no code reads debug.ReadBuildInfo).

Verified locally:
- Same ldflags (link cache hit): 2.4s for 8 binaries
- Different ldflags (link cache miss): 5.2s (2.2x slower)

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Go test cache has two phases:
1. Test binary ID (includes link ActionID with ldflags) — fixed by commit 2
2. File mtime validation — fixed here

git checkout sets all file mtimes to "now", causing Phase 2 to miss.
Fix: set all tracked file mtimes to a fixed date after checkout.

Also adds key-suffix input to cache-go-dependencies to disambiguate
GOTAGS matrix entries. github.job is the same for all matrix entries
(e.g. "go" for both GOTAGS="" and GOTAGS=release). Without key-suffix,
matrix entries overwrite each other's GOCACHE.

Combined measured results (PR #19404):
- go unit tests: 45m → 14m (3.3x)
- go-postgres: 35m → 4m (8.7x)
- sensor-integration: 21m → 5m (4x)
- Test cache hit rate: 92% (1942/2108)

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-ci
Copy link

openshift-ci bot commented Mar 13, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The build cache statistics added to go_build rely on go build -v line counts and a fixed /tmp/go-build-compiled-count file, which can be both inaccurate (non-1:1 with compiled units) and racy if multiple builds run in parallel; consider using a unique temp file per invocation and/or gating this logging behind a flag or env var.
  • The git ls-files | xargs touch step in the composite action mutates mtimes for every tracked file, which could interfere with incremental build tooling or other languages in the same job; consider narrowing this to Go source/test files or to the directories actually used by Go.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The build cache statistics added to `go_build` rely on `go build -v` line counts and a fixed `/tmp/go-build-compiled-count` file, which can be both inaccurate (non-1:1 with compiled units) and racy if multiple builds run in parallel; consider using a unique temp file per invocation and/or gating this logging behind a flag or env var.
- The `git ls-files | xargs touch` step in the composite action mutates mtimes for every tracked file, which could interfere with incremental build tooling or other languages in the same job; consider narrowing this to Go source/test files or to the directories actually used by Go.

## Individual Comments

### Comment 1
<location path="scripts/go-tool.sh" line_range="108" />
<code_context>
-  invoke_go_build -o "$output" "${dirs[@]}"
+  local total
+  total=$(go list -deps "${dirs[@]}" 2>/dev/null | wc -l | tr -d ' ')
+  local compiled_count=/tmp/go-build-compiled-count
+  invoke_go_build -o "$output" "${dirs[@]}" 2> >(tee >(wc -l | tr -d ' ' > "$compiled_count") >&2)
+  wait
</code_context>
<issue_to_address>
**issue (bug_risk):** Use a per-process or per-invocation temp file instead of a fixed path for compiled_count

A fixed `/tmp` path means concurrent runs of this script (or multiple `go_build` calls in one job) will race on the same `compiled_count` file, causing corrupted or incorrect counts. Please use a unique temp file per invocation (e.g., via `mktemp` or including `$$`/`$BASHPID` and job name in the filename) and clean it up afterward.
</issue_to_address>

### Comment 2
<location path="scripts/go-tool.sh" line_range="112-115" />
<code_context>
+  invoke_go_build -o "$output" "${dirs[@]}" 2> >(tee >(wc -l | tr -d ' ' > "$compiled_count") >&2)
+  wait
+  local compiled
+  compiled=$(cat "$compiled_count" 2>/dev/null | tr -d ' ')
+  local cached=$((total - compiled))
+  echo >&2 "Build cache: ${cached}/${total} deps cached ($compiled compiled)"
+  if [[ "$total" -gt 0 ]] && [[ $((cached * 100 / total)) -lt 50 ]]; then
+    echo >&2 "WARNING: Build cache hit rate below 50% (${cached}/${total}). Significant code changes invalidate the cached build — expect slower compilation."
+  fi
</code_context>
<issue_to_address>
**issue (bug_risk):** Harden arithmetic against empty or non-numeric totals/compiled counts

If `go list` or the build step fails (or their output format changes), `total`/`compiled` can be empty or non-numeric. Then `cached=$((total - compiled))` and the percentage calculation will throw a bash arithmetic error and abort. Please default missing values to 0 and ensure both vars are integers (e.g. `${total:-0}`, `${compiled:-0}` with a simple regex check), or skip the cache-reporting logic when the counts aren’t valid.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@rhacs-bot
Copy link
Contributor

rhacs-bot commented Mar 13, 2026

Images are ready for the commit at def6343.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-337-gdef6343e26.

@codecov
Copy link

codecov bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.71%. Comparing base (ac93c38) to head (def6343).

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #19424   +/-   ##
=======================================
  Coverage   49.71%   49.71%           
=======================================
  Files        2701     2701           
  Lines      203453   203453           
=======================================
+ Hits       101143   101145    +2     
+ Misses      94784    94782    -2     
  Partials     7526     7526           
Flag Coverage Δ
go-unit-tests 49.71% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@davdhacs davdhacs added the ci-save-cache Enable Go cache saves on this PR branch for testing label Mar 13, 2026
davdhacs and others added 3 commits March 13, 2026 14:47
actionlint flags matrix.gotags as undefined in jobs without a matrix
strategy (go-bench, local-roxctl-tests, sensor-integration-tests).
The key-suffix is only needed for the go and go-postgres jobs which
have the gotags matrix to prevent GOCACHE key collisions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davdhacs davdhacs changed the title perf(ci): enable Go build and test caching across commits perf(ci): add VERSION file and caching fixes Mar 14, 2026
@davdhacs davdhacs changed the title perf(ci): add VERSION file and caching fixes perf(ci): add VERSION file and build/test caching fixes Mar 14, 2026
davdhacs and others added 7 commits March 13, 2026 23:02
GOCACHE accumulates stale entries across commits as packages change:
each commit creates new cache entries for recompiled packages, but old
entries from previous commits are never cleaned up. Go's built-in trim
only runs once per 24h with 5-day retention — too slow for CI.

Fix: backdate all GOCACHE entries to year 2000 after restore. During
the build, Go's markUsed() updates mtimes of accessed entries to "now".
After the build, a trim step deletes entries still at year 2000 (stale).
The cache auto-save post-step then saves only the trimmed directory.

Key detail: trim.txt must be set to "now" after backdating, otherwise
Go's built-in Trim() sees "last trim in year 2000" and deletes ALL
backdated entries before the build even starts.

Also bumps cache key from v1 to v2 for a clean slate.

Locally verified: with ~50 commits of real source changes, trim removes
679MB (32%) of stale entries from a 2.1GB cache.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Go's compile cache uses SHA256 of file content, not mtime. Go's test
cache Phase 2 checks runtime-accessed files, not compiled sources.
The mtime of zversion.go is irrelevant for both caches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davdhacs davdhacs changed the title perf(ci): add VERSION file and build/test caching fixes perf(ci): build/test caching fixes Mar 15, 2026
davdhacs and others added 3 commits March 15, 2026 00:08
Remove the VERSION file and use `git describe --tags --abbrev=0` directly
in go-tool.sh. This eliminates the second source of truth — git tags are
the single source. Reverts changes to make/env.mk, start-release.yml,
and helpers.bash that were only needed for the VERSION file fallback.

The VERSION file can be added in a follow-up if shallow clone support
is needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
If go list or the compiled count pipe fails, total/compiled could be
empty, causing bash arithmetic errors with set -e. Default to 0 and
validate both are numeric before computing cache hit rate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davdhacs davdhacs removed the ci-save-cache Enable Go cache saves on this PR branch for testing label Mar 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants