perf(ci): shard go and go-postgres unit test jobs#19744
perf(ci): shard go and go-postgres unit test jobs#19744
Conversation
Fix tests and scripts to work on ubuntu-latest host runners (no container) in addition to CI container environments. - rate_limited_logger.go: add /home/runner/work/ path prefix (container used /__w/) - download_zip_test.go: accept ErrPermission for /root/ paths (non-root runner returns permission denied, not not-found) - detection_test.go: delete circular test - lib.sh: escape & in bash 5.2+ replacements (verified 4.4-5.3) - helpers.bash: add yq_multidoc helper (yq 4.x adds --- separators) - roxctl-netpol-generate-*.bats: use yq_multidoc All changes are backward-compatible with container environments. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add SKIP_DEPS guard to deps target: when set, skip go mod tidy for jobs that don't need it (e.g. test-only jobs). - Split go-postgres-unit-tests into main/migrator subtargets to enable independent sharding in CI. - Remove -count=1 from integration-unit-tests to allow Go test cache hits. All changes are backward-compatible: existing callers work unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
grep -v returns exit code 1 when no lines match (e.g. yq output is only --- separators), which would look like a yq failure. sed always returns 0 regardless of whether lines were deleted. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run Go test jobs directly on ubuntu-latest instead of in the CI container image. This saves ~55s container init overhead per job and enables faster cache I/O (no overlay filesystem). Changes per job: - Remove container: block - Add actions/setup-go with GOTOOLCHAIN=auto override - Replace su postgres with docker run postgres - Add pg_isready retry loop (docker postgres needs warmup time) - go-bench: skip on PRs (if: github.event_name == 'push') Depends on #19712 (host-runner test fixes). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split the monolithic go job into 5 shards (pkg-helm, pkg-other, central-1, central-2, rest) and go-postgres into 2 shards (main, migrator) for parallel execution. Also split integration and operator tests into separate jobs (go-integration, go-operator-integration) so they run in parallel with unit tests instead of sequentially after them. Depends on #19743 (container removal) and #19742 (SKIP_DEPS + postgres subtargets). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Skipping CI for Draft Pull Request. |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The
EXCLUDE/exclude-patternregexes used to buildPACKAGES(e.g./central/[o-z],/helm/) rely on substring matches ingo listoutput and could easily over-/under-match if paths change; consider anchoring these patterns to path segments or using more explicit regexes to make the sharding more robust. - In the
go-integrationjob,go list ./... | grep -E 'registries|scanners|notifiers'will select any package whose import path contains those substrings (including potential helpers or unrelated dirs); if you intend to restrict to specific packages/directories, tighten this filter to explicit path patterns.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `EXCLUDE`/`exclude-pattern` regexes used to build `PACKAGES` (e.g. `/central/[o-z]`, `/helm/`) rely on substring matches in `go list` output and could easily over-/under-match if paths change; consider anchoring these patterns to path segments or using more explicit regexes to make the sharding more robust.
- In the `go-integration` job, `go list ./... | grep -E 'registries|scanners|notifiers'` will select any package whose import path contains those substrings (including potential helpers or unrelated dirs); if you intend to restrict to specific packages/directories, tighten this filter to explicit path patterns.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
Images are ready for the commit at 9c2e078. To use with deploy scripts, first |
📝 WalkthroughSummary by CodeRabbit
WalkthroughRefactors CI test jobs to use job sharding and actions/setup-go, adds separate integration jobs, replaces container setups with Docker PostgreSQL + readiness polling, fixes JUnit XML escaping, removes a container-detection test, tightens zip traversal error checks, trims GitHub Actions workspace paths in logging, and adds a yq multidoc helper for tests. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/unit-tests.yaml:
- Around line 185-186: The "Cache Go dependencies" step using
./.github/actions/cache-go-dependencies is missing a key-suffix to separate
caches for different matrix runs; update that step to include a key-suffix like
${{ matrix.gotags }} (or ${{ matrix.gotags }}-<other-identifiers>) so the cache
key differs per gotags matrix entry and prevents collisions between matrix
variations.
- Around line 83-87: The unit-tests job is always setting GOTAGS to "test" and
ignores the matrix dimension; update the GOTAGS environment passed to
scripts/go-test.sh so it includes the matrix value (`${{ matrix.gotags }}`) as a
prefix before the existing test tag (so the matrix variations actually run with
different GOTAGS), i.e. change the GOTAGS assignment used when invoking
go-test.sh (the GOTAGS=... before scripts/go-test.sh) to incorporate `${{
matrix.gotags }}` while preserving the existing ",test" behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: dffb98ea-2637-41a2-8e05-52bc5f3ec7e3
📒 Files selected for processing (8)
.github/workflows/unit-tests.yamlpkg/containers/detection_test.gopkg/logging/rate_limited_logger.goroxctl/common/zipdownload/download_zip_test.goscripts/ci/lib.shtests/roxctl/bats-tests/helpers.bashtests/roxctl/bats-tests/local/roxctl-netpol-generate-development.batstests/roxctl/bats-tests/local/roxctl-netpol-generate-release.bats
💤 Files with no reviewable changes (1)
- pkg/containers/detection_test.go
| GOTAGS="${GOTAGS:+$GOTAGS,}test" CGO_ENABLED=1 GOEXPERIMENT=cgocheck2 MUTEX_WATCHDOG_TIMEOUT_SECS=30 \ | ||
| scripts/go-test.sh -timeout 25m -race -cover \ | ||
| -coverprofile test-output/coverage.out -v \ | ||
| $PACKAGES \ | ||
| | tee test-output/test.log |
There was a problem hiding this comment.
Missing ${{ matrix.gotags }} causes both matrix variations to run identical tests.
The gotags matrix dimension is defined (line 24) but not used in this command. Both GOTAGS="" and GOTAGS=release matrix jobs will execute with identical GOTAGS=test, defeating the purpose of testing with release tags.
Compare with go-integration job (line 147) which correctly uses ${{ matrix.gotags }} prefix.
🐛 Proposed fix
# shellcheck disable=SC2086
- GOTAGS="${GOTAGS:+$GOTAGS,}test" CGO_ENABLED=1 GOEXPERIMENT=cgocheck2 MUTEX_WATCHDOG_TIMEOUT_SECS=30 \
+ ${{ matrix.gotags }} GOTAGS="${GOTAGS:+$GOTAGS,}test" CGO_ENABLED=1 GOEXPERIMENT=cgocheck2 MUTEX_WATCHDOG_TIMEOUT_SECS=30 \
scripts/go-test.sh -timeout 25m -race -cover \🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/unit-tests.yaml around lines 83 - 87, The unit-tests job
is always setting GOTAGS to "test" and ignores the matrix dimension; update the
GOTAGS environment passed to scripts/go-test.sh so it includes the matrix value
(`${{ matrix.gotags }}`) as a prefix before the existing test tag (so the matrix
variations actually run with different GOTAGS), i.e. change the GOTAGS
assignment used when invoking go-test.sh (the GOTAGS=... before
scripts/go-test.sh) to incorporate `${{ matrix.gotags }}` while preserving the
existing ",test" behavior.
| - name: Cache Go dependencies | ||
| uses: ./.github/actions/cache-go-dependencies |
There was a problem hiding this comment.
Missing key-suffix for cache differentiation between gotags matrix runs.
Other jobs include key-suffix: ${{ matrix.gotags }}-... to prevent cache collisions between matrix variations. This job omits it, which could cause cache conflicts.
🔧 Proposed fix
- name: Cache Go dependencies
uses: ./.github/actions/cache-go-dependencies
+ with:
+ key-suffix: ${{ matrix.gotags }}-operator-integration📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - name: Cache Go dependencies | |
| uses: ./.github/actions/cache-go-dependencies | |
| - name: Cache Go dependencies | |
| uses: ./.github/actions/cache-go-dependencies | |
| with: | |
| key-suffix: ${{ matrix.gotags }}-operator-integration |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/unit-tests.yaml around lines 185 - 186, The "Cache Go
dependencies" step using ./.github/actions/cache-go-dependencies is missing a
key-suffix to separate caches for different matrix runs; update that step to
include a key-suffix like ${{ matrix.gotags }} (or ${{ matrix.gotags
}}-<other-identifiers>) so the cache key differs per gotags matrix entry and
prevents collisions between matrix variations.
…o davdhacs/shard-go-tests
…est shard Use ./... with exclude-pattern /pkg/|/central/ instead of listing every top-level directory. Automatically picks up new directories. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Match the other test jobs' stable ldflag env vars. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The operator Makefile needs the scanner module in GOMODCACHE for proto generation. Previously inherited from the go job's make deps. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Operator tests use their own test runner, not scripts/go-test.sh, so test-output/test.log is never created. The generate-junit-reports make target fails when the log doesn't exist. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Explain why each job overrides GOTOOLCHAIN from local to auto: setup-go sets GOTOOLCHAIN=local, but sub-module tools with newer go directives need auto to download the required toolchain. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Explain that setup-go reads go.mod to install Go, then exports GOTOOLCHAIN=local to force that version. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clarify that setup-go exports GOTOOLCHAIN matching the go.mod version, not hardcoded to "local". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
07f23c0 to
b12da02
Compare
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The go-bench PR skip is handled separately in #19758. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #19744 +/- ##
===========================================
- Coverage 49.59% 49.02% -0.58%
===========================================
Files 2760 768 -1992
Lines 208144 74767 -133377
===========================================
- Hits 103239 36652 -66587
+ Misses 97239 34762 -62477
+ Partials 7666 3353 -4313
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
Makefile (1)
572-578: Consider adding explicit timeout for consistency.The migrator target omits
-timeoutwhile the main target uses-timeout 15m. With-p 1forcing serial execution, total runtime could be extended. Other test targets in this Makefile use explicit timeouts. If migrator tests are expected to complete quickly, this is fine; otherwise consider adding a timeout for consistency.♻️ Optional: Add explicit timeout
.PHONY: go-postgres-unit-tests-migrator go-postgres-unit-tests-migrator: build-prep test-prep @# The -p 1 passed to go test is required to ensure that tests of different packages are not run in parallel, so as to avoid conflicts when interacting with the DB. set -o pipefail ; \ - CGO_ENABLED=1 GOEXPERIMENT=cgocheck2 MUTEX_WATCHDOG_TIMEOUT_SECS=30 GOTAGS=$(GOTAGS),test,sql_integration scripts/go-test.sh -p 1 -race -cover -coverprofile test-output/migrator-coverage.out -v \ + CGO_ENABLED=1 GOEXPERIMENT=cgocheck2 MUTEX_WATCHDOG_TIMEOUT_SECS=30 GOTAGS=$(GOTAGS),test,sql_integration scripts/go-test.sh -timeout 15m -p 1 -race -cover -coverprofile test-output/migrator-coverage.out -v \ $(shell git grep -rl "//go:build sql_integration" migrator | sed -e 's@^@./@g' | xargs -n 1 dirname | sort | uniq | xargs go list -tags sql_integration | grep -v '^github.com/stackrox/rox/tests$$' | grep -Ev $(UNIT_TEST_IGNORE)) \ | tee -a $(GO_TEST_OUTPUT_PATH)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 572 - 578, The go-postgres-unit-tests-migrator Makefile target (target name: go-postgres-unit-tests-migrator) calls scripts/go-test.sh without an explicit -timeout flag; update the invocation that builds the test command (the line invoking scripts/go-test.sh with -p 1 -race -cover ...) to include a timeout (e.g., -timeout 15m) consistent with the main test target so the test run cannot hang indefinitely and matches other targets' behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@Makefile`:
- Around line 572-578: The go-postgres-unit-tests-migrator Makefile target
(target name: go-postgres-unit-tests-migrator) calls scripts/go-test.sh without
an explicit -timeout flag; update the invocation that builds the test command
(the line invoking scripts/go-test.sh with -p 1 -race -cover ...) to include a
timeout (e.g., -timeout 15m) consistent with the main test target so the test
run cannot hang indefinitely and matches other targets' behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 94e68179-5038-42cb-a425-673280fc91bb
📒 Files selected for processing (2)
.github/workflows/unit-tests.yamlMakefile
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/unit-tests.yaml
Description
Split Go test jobs into parallel shards and add supporting Makefile changes:
Makefile changes:
SKIP_DEPS: wrapdepstarget inifdef SKIP_DEPSguard to skipgo mod tidyin test-only CI jobsgo-postgres-unit-testsintogo-postgres-unit-tests-main/go-postgres-unit-tests-migratorsubtargets for independent sharding-count=1fromintegration-unit-teststo allow Go test cache hitsWorkflow changes (unit-tests.yaml):
go job → 5 shards:
pkg-helm,pkg-other,central-1,central-2,restgo-postgres → 2 shards:
main(central/pkg/tools),migrator(with-p 1)Split out from go job:
go-integration: registries/scanners/notifiers integration testsgo-operator-integration: operator test-integration + test-helmAll Makefile changes are backward-compatible.
Includes #19712 (host-runner test fixes).
Split from #19678, replaces #19742.
User-facing documentation
Testing and quality
Automated testing
How I validated my change
Previously validated on #19678.
🤖 Generated with Claude Code