Skip to content

perf(ci): stabilize file mtimes for Go test cache#19395

Draft
davdhacs wants to merge 4 commits intomasterfrom
davdhacs/test-cache-mtime
Draft

perf(ci): stabilize file mtimes for Go test cache#19395
davdhacs wants to merge 4 commits intomasterfrom
davdhacs/test-cache-mtime

Conversation

@davdhacs
Copy link
Contributor

@davdhacs davdhacs commented Mar 12, 2026

Description

Go's test result cache has a two-phase lookup:

  1. Phase 1 — Test binary identity: Go computes a test ID from the test
    binary's build ActionID, which includes source file content hashes AND
    -X ldflags (exec.go:1404: linkflags %q). If the test binary ID
    changed since the last run, the cache misses and all tests re-run.

  2. Phase 2 — Test input validation: If the test binary ID matches, Go
    checks the files the test accessed at runtime (testdata, config files,
    etc.) by (path, size, mtime) — NOT content. If any mtime changed,
    the cache misses for that test.

Both phases must hit for test caching to work.

Currently on master, Phase 1 always misses because -X MainVersion and
-X GitShortSha change on every commit, creating a new test binary ID.
Go never reaches Phase 2 — tests re-run from scratch every time regardless
of file mtimes.

What each PR fixes

Test caching requires both PRs together. PR #19394 alone improves build
times but tests still miss at Phase 2 (mtimes differ between CI runs).
This PR alone never reaches Phase 2 (ldflags still change at Phase 1).
When both land, tests pass both phases and achieve 4-52x speedup.

The fix

Set all tracked file mtimes to a fixed date (2001-01-01 UTC) after checkout
and cache restore: git ls-files -z | xargs -0 touch -t 200101010000

Measured results

This PR alone (no test improvement — Phase 1 still misses)

Job Cold Master warm Our warm (mtime only)
go (GOTAGS="") 44.4m 35.7m 35.0m
go (GOTAGS=release) 45.1m 30.2m 31.6m
go-postgres (GOTAGS="") 35.8m 28.6m 30.7m
sensor-integration 30.0m 22.7m 23.0m
local-roxctl-tests 16.5m 8.9m 8.5m

Times are within run-to-run noise — the mtime fix alone has no effect on
tests because -X ldflags still invalidate the test binary ID at Phase 1.

Both PRs combined (PR #19201 data)

Job Master warm Combined warm Speedup
go (GOTAGS="") 35m ~8m 4.5x
go-postgres 31m ~36s 52x
sensor-integration 26m ~2.5m 11x

Test cache hit rate: 97.6% (681/698 packages cached).

Risk

If a non-.go testdata file changes content but not size between commits,
the test cache could return a stale result. This is Go's own inherent
limitation — it explicitly avoids hashing file content for test input
validation (test.go:hashOpen()).

Why not combine the PRs?

They are logically independent changes (ldflags vs mtimes) and are easier
to review separately. Either can merge first — whichever lands second
completes the pair and enables the full test cache speedup.

Partially generated by AI.

davdhacs and others added 2 commits March 12, 2026 07:53
Replace the //XDef: annotation + status.sh + -X ldflags mechanism with
a generated pkg/version/internal/zversion.go file. This stabilizes Go
build cache ActionIDs across commits because:

- Tests use only the base version tag (e.g. "4.11.x"), stable across commits
- Builds use the full git-describe format (e.g. "4.11.x-193-gabcdef")
- -buildvcs=false prevents Go from re-stamping VCS info

The XDef mechanism injected per-commit values via -X ldflags on every build,
which Go treated as different linker inputs, invalidating the entire GOCACHE.

Changes:
- scripts/go-tool.sh: generate_version_file() replaces XDef parsing
- pkg/version/internal/version_data.go: removed XDef annotations
- VERSION: new file with base version (4.11.x)
- .gitignore: added pkg/version/internal/zversion.go
- make/env.mk: TAG falls back to VERSION file when git describe fails
- operator/Dockerfile: USER 0 for writable zversion.go + copy VERSION files
- tests/roxctl/bats-tests/helpers.bash: VERSION file fallback

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Go's test cache validates inputs by (path, size, mtime) — not content.
git checkout sets all file mtimes to the current time, which differs
between CI runs, causing test results to never be cached.

Fix: set all tracked file mtimes to a fixed date (2001-01-01) after
checkout and cache restore. This makes the test cache hit across runs
when file content hasn't changed.

Measured impact (from PR #19201):
- go unit tests: 35m → 8m (4.5x)
- go-postgres: 31m → 36s (52x)
- sensor-integration: 26m → 2.5m (11x)
- Test cache hit rate: 97.6% (681/698 packages)

Risk: if a non-.go testdata file changes content but not size between
commits, the test cache could return a stale result. This is the same
limitation Go's test cache has by design (it explicitly avoids hashing
file content for performance).

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-ci
Copy link

openshift-ci bot commented Mar 12, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • Consider restricting the git ls-files filter (e.g., to *.go and known testdata paths) to avoid unnecessarily touching large/binary/irrelevant tracked files and potentially interfering with tools that rely on mtimes for non-code assets.
  • It may be safer to make the fixed timestamp explicitly UTC-aware (e.g., TZ=UTC touch -t 200101010000 or touch -d '2001-01-01T00:00:00Z') to avoid any subtle differences across runners with different default timezones.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider restricting the `git ls-files` filter (e.g., to `*.go` and known testdata paths) to avoid unnecessarily touching large/binary/irrelevant tracked files and potentially interfering with tools that rely on mtimes for non-code assets.
- It may be safer to make the fixed timestamp explicitly UTC-aware (e.g., `TZ=UTC touch -t 200101010000` or `touch -d '2001-01-01T00:00:00Z'`) to avoid any subtle differences across runners with different default timezones.

## Individual Comments

### Comment 1
<location path=".github/actions/cache-go-dependencies/action.yaml" line_range="68" />
<code_context>
+        # Risk: if a non-.go testdata file changes content but not size between
+        # commits, the test cache could return a stale result. This is extremely
+        # rare and Go's own test cache has the same inherent limitation.
+        git ls-files -z | xargs -0 touch -t 200101010000
+      shell: bash
</code_context>
<issue_to_address>
**issue (bug_risk):** The fixed timestamp may still differ between runners due to local timezone interpretation.

`touch -t 200101010000` uses the runner’s local timezone. If different runners use different timezones (e.g., self‑hosted vs GitHub‑hosted), the resulting mtimes/epoch values will differ and the Go test cache may still miss across runners. To keep mtimes stable, force UTC (e.g., `TZ=UTC git ls-files -z | xargs -0 touch -t 200101010000`) or use a UTC `-d '2001-01-01T00:00:00Z'` style argument where available.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@rhacs-bot
Copy link
Contributor

rhacs-bot commented Mar 12, 2026

Images are ready for the commit at f13f6a2.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-313-gf13f6a2013.

@codecov
Copy link

codecov bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.69%. Comparing base (52688db) to head (f13f6a2).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #19395   +/-   ##
=======================================
  Coverage   49.68%   49.69%           
=======================================
  Files        2700     2700           
  Lines      203278   203278           
=======================================
+ Hits       100999   101009   +10     
+ Misses      94753    94745    -8     
+ Partials     7526     7524    -2     
Flag Coverage Δ
go-unit-tests 49.69% <ø> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@davdhacs davdhacs force-pushed the davdhacs/test-cache-mtime branch from 05f5ad5 to 21752e5 Compare March 12, 2026 16:53
@davdhacs davdhacs changed the base branch from davdhacs/zversion-codegen to master March 12, 2026 16:53
@janisz
Copy link
Contributor

janisz commented Mar 13, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants