Skip to content

perf(ci): shard go and go-postgres unit test jobs#19744

Draft
davdhacs wants to merge 18 commits intomasterfrom
davdhacs/shard-go-tests
Draft

perf(ci): shard go and go-postgres unit test jobs#19744
davdhacs wants to merge 18 commits intomasterfrom
davdhacs/shard-go-tests

Conversation

@davdhacs
Copy link
Copy Markdown
Contributor

@davdhacs davdhacs commented Apr 1, 2026

Description

Split Go test jobs into parallel shards and add supporting Makefile changes:

Makefile changes:

  • SKIP_DEPS: wrap deps target in ifdef SKIP_DEPS guard to skip go mod tidy in test-only CI jobs
  • Split go-postgres-unit-tests into go-postgres-unit-tests-main / go-postgres-unit-tests-migrator subtargets for independent sharding
  • Remove -count=1 from integration-unit-tests to allow Go test cache hits

Workflow changes (unit-tests.yaml):

go job → 5 shards:

  • pkg-helm, pkg-other, central-1, central-2, rest

go-postgres → 2 shards:

  • main (central/pkg/tools), migrator (with -p 1)

Split out from go job:

  • go-integration: registries/scanners/notifiers integration tests
  • go-operator-integration: operator test-integration + test-helm

All Makefile changes are backward-compatible.

Includes #19712 (host-runner test fixes).
Split from #19678, replaces #19742.

User-facing documentation

Testing and quality

  • the change is production ready
  • CI results are inspected

Automated testing

  • modified existing tests

How I validated my change

Previously validated on #19678.

🤖 Generated with Claude Code

davdhacs and others added 7 commits March 31, 2026 10:06
Fix tests and scripts to work on ubuntu-latest host runners (no
container) in addition to CI container environments.

- rate_limited_logger.go: add /home/runner/work/ path prefix
  (container used /__w/)
- download_zip_test.go: accept ErrPermission for /root/ paths
  (non-root runner returns permission denied, not not-found)
- detection_test.go: delete circular test
- lib.sh: escape & in bash 5.2+ replacements (verified 4.4-5.3)
- helpers.bash: add yq_multidoc helper (yq 4.x adds --- separators)
- roxctl-netpol-generate-*.bats: use yq_multidoc

All changes are backward-compatible with container environments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add SKIP_DEPS guard to deps target: when set, skip go mod tidy
  for jobs that don't need it (e.g. test-only jobs).
- Split go-postgres-unit-tests into main/migrator subtargets to
  enable independent sharding in CI.
- Remove -count=1 from integration-unit-tests to allow Go test
  cache hits.

All changes are backward-compatible: existing callers work unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
grep -v returns exit code 1 when no lines match (e.g. yq output is
only --- separators), which would look like a yq failure. sed always
returns 0 regardless of whether lines were deleted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run Go test jobs directly on ubuntu-latest instead of in the CI
container image. This saves ~55s container init overhead per job
and enables faster cache I/O (no overlay filesystem).

Changes per job:
- Remove container: block
- Add actions/setup-go with GOTOOLCHAIN=auto override
- Replace su postgres with docker run postgres
- Add pg_isready retry loop (docker postgres needs warmup time)
- go-bench: skip on PRs (if: github.event_name == 'push')

Depends on #19712 (host-runner test fixes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split the monolithic go job into 5 shards (pkg-helm, pkg-other,
central-1, central-2, rest) and go-postgres into 2 shards
(main, migrator) for parallel execution.

Also split integration and operator tests into separate jobs
(go-integration, go-operator-integration) so they run in parallel
with unit tests instead of sequentially after them.

Depends on #19743 (container removal) and #19742 (SKIP_DEPS +
postgres subtargets).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Apr 1, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • The EXCLUDE/exclude-pattern regexes used to build PACKAGES (e.g. /central/[o-z], /helm/) rely on substring matches in go list output and could easily over-/under-match if paths change; consider anchoring these patterns to path segments or using more explicit regexes to make the sharding more robust.
  • In the go-integration job, go list ./... | grep -E 'registries|scanners|notifiers' will select any package whose import path contains those substrings (including potential helpers or unrelated dirs); if you intend to restrict to specific packages/directories, tighten this filter to explicit path patterns.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `EXCLUDE`/`exclude-pattern` regexes used to build `PACKAGES` (e.g. `/central/[o-z]`, `/helm/`) rely on substring matches in `go list` output and could easily over-/under-match if paths change; consider anchoring these patterns to path segments or using more explicit regexes to make the sharding more robust.
- In the `go-integration` job, `go list ./... | grep -E 'registries|scanners|notifiers'` will select any package whose import path contains those substrings (including potential helpers or unrelated dirs); if you intend to restrict to specific packages/directories, tighten this filter to explicit path patterns.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@rhacs-bot
Copy link
Copy Markdown
Contributor

rhacs-bot commented Apr 1, 2026

Images are ready for the commit at 9c2e078.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-544-g9c2e07883f.

@davdhacs davdhacs changed the base branch from davdhacs/remove-container-go-tests to master April 1, 2026 19:58
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Tests

    • Enhanced test infrastructure with job sharding and parallelization for improved CI performance.
    • Improved handling of multi-document YAML test assertions.
  • Bug Fixes

    • Fixed XML special character escaping in test output reports.
    • Enhanced file path normalization in application logging across environments.
  • Chores

    • Updated CI/CD workflow configurations and dependency management.

Walkthrough

Refactors CI test jobs to use job sharding and actions/setup-go, adds separate integration jobs, replaces container setups with Docker PostgreSQL + readiness polling, fixes JUnit XML escaping, removes a container-detection test, tightens zip traversal error checks, trims GitHub Actions workspace paths in logging, and adds a yq multidoc helper for tests.

Changes

Cohort / File(s) Summary
CI Workflow Restructuring
.github/workflows/unit-tests.yaml
Added job sharding for go; replaced container-based steps with actions/setup-go@v6 and GOTOOLCHAIN=auto; split integration/unit targets into go-integration and go-operator-integration; moved Jira reporting to go job; adjusted caches and removed codecov step; extended slack failure needs.
Postgres & Bench Jobs
.github/workflows/unit-tests.yaml
go-postgres and go-bench now start Postgres via Docker image (docker.io/library/postgres:${{ matrix.pg }}) and poll pg_isready; shard make targets and include shard in cache keys; removed container-based setups.
Makefile Targets
Makefile
Added SKIP_DEPS conditional to deps target (skip go.sum prerequisites when set); split go-postgres-unit-tests into -main and -migrator targets with separate coverage outputs; removed -count=1 from integration-unit-tests invocation.
CI Scripts
scripts/ci/lib.sh
Fixed JUnit XML escaping by correctly producing &amp;, &quot;, &#39;, &lt;, &gt; via escaped replacements in bash parameter expansions.
Test Utilities
tests/roxctl/bats-tests/helpers.bash
Added yq_multidoc() to run yq and strip YAML document separator lines (---) to stabilize multi-document assertions.
Bats Test Updates
tests/roxctl/bats-tests/local/roxctl-netpol-generate-development.bats, tests/roxctl/bats-tests/local/roxctl-netpol-generate-release.bats
Replaced yq invocations with yq_multidoc in multi-document assertions (kinds, labels, port fields); removed redundant commented-out alternate blocks.
Unit Test Removal
pkg/containers/detection_test.go
Removed TestContainerDetection (test that queried GITHUB_ACTIONS and asserted IsRunningInContainer() accordingly).
Zip Extraction Test
roxctl/common/zipdownload/download_zip_test.go
Broadened assertion for malicious-file absence to accept fs.ErrNotExist or fs.ErrPermission via errors.Is, to handle permission-vs-not-found differences across environments.
Logging Path Trimming
pkg/logging/rate_limited_logger.go
Extended getTrimmedFilePath to also trim /home/runner/work/stackrox/stackrox/ GitHub Actions workspace prefix when normalizing file paths used for rate-limited logging.
Other Test Orchestration
scripts/..., tests/...
Minor CI/test orchestration adjustments to capture shard-specific coverage and logs under test-output/, and to run scripts/go-test.sh with shard-based package selection and exclusions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main objective: sharding Go and go-postgres unit test jobs for performance improvement in CI.
Description check ✅ Passed The description covers Makefile changes, workflow changes, sharding strategy, and references related issues. However, the 'How I validated my change' section relies on prior validation rather than addressing the current state.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch davdhacs/shard-go-tests
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch davdhacs/shard-go-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/unit-tests.yaml:
- Around line 185-186: The "Cache Go dependencies" step using
./.github/actions/cache-go-dependencies is missing a key-suffix to separate
caches for different matrix runs; update that step to include a key-suffix like
${{ matrix.gotags }} (or ${{ matrix.gotags }}-<other-identifiers>) so the cache
key differs per gotags matrix entry and prevents collisions between matrix
variations.
- Around line 83-87: The unit-tests job is always setting GOTAGS to "test" and
ignores the matrix dimension; update the GOTAGS environment passed to
scripts/go-test.sh so it includes the matrix value (`${{ matrix.gotags }}`) as a
prefix before the existing test tag (so the matrix variations actually run with
different GOTAGS), i.e. change the GOTAGS assignment used when invoking
go-test.sh (the GOTAGS=... before scripts/go-test.sh) to incorporate `${{
matrix.gotags }}` while preserving the existing ",test" behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: dffb98ea-2637-41a2-8e05-52bc5f3ec7e3

📥 Commits

Reviewing files that changed from the base of the PR and between f8de23e and aaaff23.

📒 Files selected for processing (8)
  • .github/workflows/unit-tests.yaml
  • pkg/containers/detection_test.go
  • pkg/logging/rate_limited_logger.go
  • roxctl/common/zipdownload/download_zip_test.go
  • scripts/ci/lib.sh
  • tests/roxctl/bats-tests/helpers.bash
  • tests/roxctl/bats-tests/local/roxctl-netpol-generate-development.bats
  • tests/roxctl/bats-tests/local/roxctl-netpol-generate-release.bats
💤 Files with no reviewable changes (1)
  • pkg/containers/detection_test.go

Comment on lines +83 to +87
GOTAGS="${GOTAGS:+$GOTAGS,}test" CGO_ENABLED=1 GOEXPERIMENT=cgocheck2 MUTEX_WATCHDOG_TIMEOUT_SECS=30 \
scripts/go-test.sh -timeout 25m -race -cover \
-coverprofile test-output/coverage.out -v \
$PACKAGES \
| tee test-output/test.log
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Missing ${{ matrix.gotags }} causes both matrix variations to run identical tests.

The gotags matrix dimension is defined (line 24) but not used in this command. Both GOTAGS="" and GOTAGS=release matrix jobs will execute with identical GOTAGS=test, defeating the purpose of testing with release tags.

Compare with go-integration job (line 147) which correctly uses ${{ matrix.gotags }} prefix.

🐛 Proposed fix
     # shellcheck disable=SC2086
-    GOTAGS="${GOTAGS:+$GOTAGS,}test" CGO_ENABLED=1 GOEXPERIMENT=cgocheck2 MUTEX_WATCHDOG_TIMEOUT_SECS=30 \
+    ${{ matrix.gotags }} GOTAGS="${GOTAGS:+$GOTAGS,}test" CGO_ENABLED=1 GOEXPERIMENT=cgocheck2 MUTEX_WATCHDOG_TIMEOUT_SECS=30 \
       scripts/go-test.sh -timeout 25m -race -cover \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/unit-tests.yaml around lines 83 - 87, The unit-tests job
is always setting GOTAGS to "test" and ignores the matrix dimension; update the
GOTAGS environment passed to scripts/go-test.sh so it includes the matrix value
(`${{ matrix.gotags }}`) as a prefix before the existing test tag (so the matrix
variations actually run with different GOTAGS), i.e. change the GOTAGS
assignment used when invoking go-test.sh (the GOTAGS=... before
scripts/go-test.sh) to incorporate `${{ matrix.gotags }}` while preserving the
existing ",test" behavior.

Comment on lines +185 to +186
- name: Cache Go dependencies
uses: ./.github/actions/cache-go-dependencies
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing key-suffix for cache differentiation between gotags matrix runs.

Other jobs include key-suffix: ${{ matrix.gotags }}-... to prevent cache collisions between matrix variations. This job omits it, which could cause cache conflicts.

🔧 Proposed fix
     - name: Cache Go dependencies
       uses: ./.github/actions/cache-go-dependencies
+      with:
+        key-suffix: ${{ matrix.gotags }}-operator-integration
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Cache Go dependencies
uses: ./.github/actions/cache-go-dependencies
- name: Cache Go dependencies
uses: ./.github/actions/cache-go-dependencies
with:
key-suffix: ${{ matrix.gotags }}-operator-integration
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/unit-tests.yaml around lines 185 - 186, The "Cache Go
dependencies" step using ./.github/actions/cache-go-dependencies is missing a
key-suffix to separate caches for different matrix runs; update that step to
include a key-suffix like ${{ matrix.gotags }} (or ${{ matrix.gotags
}}-<other-identifiers>) so the cache key differs per gotags matrix entry and
prevents collisions between matrix variations.

@davdhacs davdhacs changed the title perf(ci): shard go and go-postgres unit test jobs perf(ci): shard go/go-postgres jobs, add SKIP_DEPS and postgres subtargets Apr 1, 2026
davdhacs and others added 2 commits April 1, 2026 14:37
…est shard

Use ./... with exclude-pattern /pkg/|/central/ instead of listing
every top-level directory. Automatically picks up new directories.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Match the other test jobs' stable ldflag env vars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davdhacs davdhacs changed the title perf(ci): shard go/go-postgres jobs, add SKIP_DEPS and postgres subtargets perf(ci): shard go and go-postgres unit test jobs Apr 1, 2026
davdhacs and others added 4 commits April 1, 2026 15:05
The operator Makefile needs the scanner module in GOMODCACHE for
proto generation. Previously inherited from the go job's make deps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Operator tests use their own test runner, not scripts/go-test.sh,
so test-output/test.log is never created. The generate-junit-reports
make target fails when the log doesn't exist.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Explain why each job overrides GOTOOLCHAIN from local to auto:
setup-go sets GOTOOLCHAIN=local, but sub-module tools with newer
go directives need auto to download the required toolchain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Explain that setup-go reads go.mod to install Go, then exports
GOTOOLCHAIN=local to force that version.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clarify that setup-go exports GOTOOLCHAIN matching the go.mod
version, not hardcoded to "local".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davdhacs davdhacs force-pushed the davdhacs/shard-go-tests branch from 07f23c0 to b12da02 Compare April 1, 2026 21:43
davdhacs and others added 2 commits April 1, 2026 15:47
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The go-bench PR skip is handled separately in #19758.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.02%. Comparing base (f8de23e) to head (9c2e078).
⚠️ Report is 101 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #19744       +/-   ##
===========================================
- Coverage   49.59%   49.02%    -0.58%     
===========================================
  Files        2760      768     -1992     
  Lines      208144    74767   -133377     
===========================================
- Hits       103239    36652    -66587     
+ Misses      97239    34762    -62477     
+ Partials     7666     3353     -4313     
Flag Coverage Δ
go-unit-tests 49.02% <ø> (-0.58%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Apr 12, 2026

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
Makefile (1)

572-578: Consider adding explicit timeout for consistency.

The migrator target omits -timeout while the main target uses -timeout 15m. With -p 1 forcing serial execution, total runtime could be extended. Other test targets in this Makefile use explicit timeouts. If migrator tests are expected to complete quickly, this is fine; otherwise consider adding a timeout for consistency.

♻️ Optional: Add explicit timeout
 .PHONY: go-postgres-unit-tests-migrator
 go-postgres-unit-tests-migrator: build-prep test-prep
 	@# The -p 1 passed to go test is required to ensure that tests of different packages are not run in parallel, so as to avoid conflicts when interacting with the DB.
 	set -o pipefail ; \
-	CGO_ENABLED=1 GOEXPERIMENT=cgocheck2 MUTEX_WATCHDOG_TIMEOUT_SECS=30 GOTAGS=$(GOTAGS),test,sql_integration scripts/go-test.sh -p 1 -race -cover -coverprofile test-output/migrator-coverage.out -v \
+	CGO_ENABLED=1 GOEXPERIMENT=cgocheck2 MUTEX_WATCHDOG_TIMEOUT_SECS=30 GOTAGS=$(GOTAGS),test,sql_integration scripts/go-test.sh -timeout 15m -p 1 -race -cover -coverprofile test-output/migrator-coverage.out -v \
 		$(shell git grep -rl "//go:build sql_integration" migrator | sed -e 's@^@./@g' | xargs -n 1 dirname | sort | uniq | xargs go list -tags sql_integration | grep -v '^github.com/stackrox/rox/tests$$' | grep -Ev $(UNIT_TEST_IGNORE)) \
 		| tee -a $(GO_TEST_OUTPUT_PATH)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 572 - 578, The go-postgres-unit-tests-migrator
Makefile target (target name: go-postgres-unit-tests-migrator) calls
scripts/go-test.sh without an explicit -timeout flag; update the invocation that
builds the test command (the line invoking scripts/go-test.sh with -p 1 -race
-cover ...) to include a timeout (e.g., -timeout 15m) consistent with the main
test target so the test run cannot hang indefinitely and matches other targets'
behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@Makefile`:
- Around line 572-578: The go-postgres-unit-tests-migrator Makefile target
(target name: go-postgres-unit-tests-migrator) calls scripts/go-test.sh without
an explicit -timeout flag; update the invocation that builds the test command
(the line invoking scripts/go-test.sh with -p 1 -race -cover ...) to include a
timeout (e.g., -timeout 15m) consistent with the main test target so the test
run cannot hang indefinitely and matches other targets' behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 94e68179-5038-42cb-a425-673280fc91bb

📥 Commits

Reviewing files that changed from the base of the PR and between aaaff23 and 9c2e078.

📒 Files selected for processing (2)
  • .github/workflows/unit-tests.yaml
  • Makefile
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/unit-tests.yaml

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants