Skip to content

perf(ci): remove container from style.yaml jobs, run on host#19309

Draft
davdhacs wants to merge 7 commits intomasterfrom
worktree-style-no-container
Draft

perf(ci): remove container from style.yaml jobs, run on host#19309
davdhacs wants to merge 7 commits intomasterfrom
worktree-style-no-container

Conversation

@davdhacs
Copy link
Contributor

@davdhacs davdhacs commented Mar 5, 2026

Description

Remove the container: directive from all 4 container-based jobs in the Style workflow and run them directly on the ubuntu-latest runner.

Problem: Container initialization consistently takes ~55s per job (measured across recent master runs). The CI container image (apollo-ci:stackrox-test-0.5.3) is ~3-4 GB and provides tools that are mostly already preinstalled on GitHub-hosted runners (Go, Node.js, Make, GCC, Git, Helm, shellcheck, jq, Docker).

Approach: Replace container with targeted tool setup:

  • actions/setup-go and actions/setup-node for language runtimes (~1-3s each, tool-cache hit)
  • Shell-based installs for yq, xmlstarlet, Python linters — run as parallel background subshells in style-check to minimize wall-clock time
  • pip install for Python linters (pycodestyle, pylint) needed by openshift-ci-lint

Expected savings per job:

Job Container init (before) Tool setup (after) Net saving
check-generated-files ~57s ~5s ~52s
misc-checks ~54s ~5s ~49s
style-check ~45s ~15s ~30s
openshift-ci-lint ~55s ~5s ~50s

Also updates cache-ui-dependencies to use portable ~/ paths instead of container-specific /github/home/ paths, with a cache key bump (v2 -> v3) to avoid stale entries. The ~ paths resolve identically inside containers (/github/home), so other workflows using this action are unaffected.

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • modified existing tests

How I validated my change

CI validation — this PR's own Style workflow run validates that all 4 migrated jobs pass without the container. Comparing step timings against recent master runs will confirm the expected savings.

Remove the container: directive from all 4 container-based jobs in the
Style workflow (check-generated-files, misc-checks, style-check,
openshift-ci-lint) and run them directly on the ubuntu-latest runner.

Container initialization consistently takes ~55s per job. Replacing it
with on-host tool setup (setup-go, setup-node, yq, xmlstarlet, pip)
costs ~5-15s, saving ~40-50s per job. Most tools (Go, Node.js, Make,
GCC, Git, Helm, shellcheck, jq) are already preinstalled on the runner.

For style-check, the shell-based tool installations (yq, xmlstarlet,
pip) run as parallel background subshells to minimize wall-clock time.

Also updates cache-ui-dependencies to use portable ~/. paths instead of
container-specific /github/home/ paths, with a cache key bump (v2->v3).

Generated with the assistance of AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-ci
Copy link

openshift-ci bot commented Mar 5, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@davdhacs davdhacs marked this pull request as ready for review March 5, 2026 16:55
@davdhacs davdhacs requested a review from a team as a code owner March 5, 2026 16:55
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The yq installation logic is duplicated across multiple jobs; consider extracting this into a small composite action or reusable workflow step so version bumps or install changes are centralized.
  • The workflow uses a bare pip command on the hosted runner; switching to python3 -m pip (or explicitly setting up a Python version) would make the linter install more deterministic and robust across runner images.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The yq installation logic is duplicated across multiple jobs; consider extracting this into a small composite action or reusable workflow step so version bumps or install changes are centralized.
- The workflow uses a bare `pip` command on the hosted runner; switching to `python3 -m pip` (or explicitly setting up a Python version) would make the linter install more deterministic and robust across runner images.

## Individual Comments

### Comment 1
<location path=".github/workflows/style.yaml" line_range="44" />
<code_context>
+
+    - name: Install yq
+      run: |
+        sudo wget -qO /usr/local/bin/yq "https://github.com/mikefarah/yq/releases/download/v4.44.6/yq_linux_amd64"
+        sudo chmod +x /usr/local/bin/yq
+
</code_context>
<issue_to_address>
**🚨 suggestion (security):** Consider verifying the downloaded `yq` binary to reduce supply-chain risk

Since this downloads and executes a remote binary, please add an integrity check (e.g., verify the SHA256 checksum against the official release) before marking it executable to mitigate supply-chain risk.
</issue_to_address>

### Comment 2
<location path=".github/workflows/style.yaml" line_range="322-323" />
<code_context>
         with:
           gcp-account: ${{ secrets.GCP_SERVICE_ACCOUNT_STACKROX_CI }}

+      - name: Install Python linters
+        run: pip install -r .openshift-ci/dev-requirements.txt
+
       - name: pycodestyle
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Use an explicit Python interpreter when installing linters for predictability

Calling `pip` directly is sensitive to PATH and default Python version changes on GitHub runners. Please invoke it via the interpreter you depend on, e.g. `python3 -m pip install -r .openshift-ci/dev-requirements.txt`, to make the setup more stable over time.

```suggestion
      - name: Install Python linters
        run: python3 -m pip install -r .openshift-ci/dev-requirements.txt
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

davdhacs and others added 2 commits March 5, 2026 10:44
Address sourcery-ai review feedback:
- Add SHA256 checksum verification after downloading yq binary to
  mitigate supply-chain risk
- Use `python3 -m pip` instead of bare `pip` for deterministic Python
  interpreter resolution across runner image updates

Generated with the assistance of AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
actions/setup-go sets GOTOOLCHAIN=local by default, which prevents Go
from auto-downloading a newer toolchain when a dependency (e.g. buf)
requires one. The CI container didn't have this restriction.

Set GOTOOLCHAIN=auto at the workflow level so Go tools that require a
newer patch version (e.g. buf needing go1.25.6 when go.mod says 1.25.0)
can download the required toolchain automatically.

Generated with the assistance of AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rhacs-bot
Copy link
Contributor

rhacs-bot commented Mar 5, 2026

Images are ready for the commit at 5bdb821.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-257-g5bdb821f29.

actions/setup-go writes GOTOOLCHAIN=local to GITHUB_ENV, which
overrides any workflow-level env setting. This prevents Go tools
in sub-modules with newer go.mod directives (e.g. tools/proto
requires go 1.25.6) from auto-downloading the required toolchain.

Write GOTOOLCHAIN=auto to GITHUB_ENV in a step after setup-go so
it takes effect in all subsequent steps. This matches the behavior
in the CI container where GOTOOLCHAIN was unset (defaulting to auto).

The project uses GOTOOLCHAIN=local only in targeted make targets
(go mod tidy) as a compatibility guard — not as a global setting.

Generated with the assistance of AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link

codecov bot commented Mar 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.64%. Comparing base (9a1667a) to head (5bdb821).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #19309      +/-   ##
==========================================
+ Coverage   49.62%   49.64%   +0.01%     
==========================================
  Files        2680     2689       +9     
  Lines      202231   202505     +274     
==========================================
+ Hits       100362   100537     +175     
- Misses      94382    94472      +90     
- Partials     7487     7496       +9     
Flag Coverage Δ
go-unit-tests 49.64% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@openshift-ci
Copy link

openshift-ci bot commented Mar 5, 2026

@davdhacs: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/gke-ui-e2e-tests aff6277 link true /test gke-ui-e2e-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

The operator's bundle_helpers virtualenv uses packages (pip, pytest)
that rely on pkgutil.ImpImporter, which was removed in Python 3.12.
The CI container had Python 3.9 (UBI8) where this worked. The
ubuntu-latest runner has Python 3.12.

Add actions/setup-python with python-version 3.9 to match the
container's Python version until operator/bundle_helpers is updated
for Python 3.12 compatibility.

Generated with the assistance of AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Contributor

@janisz janisz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea 🎉

run: |
# Override GOTOOLCHAIN=local set by setup-go.
echo "GOTOOLCHAIN=auto" >> "$GITHUB_ENV"
sudo wget -qO /usr/local/bin/yq "https://github.com/mikefarah/yq/releases/download/v4.44.6/yq_linux_amd64"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a go package so maybe we should have in tools module to easily track the version and updates on the other hand we will need to compile it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like yq should be there
https://hollycummins.com/using-yq-in-github-actions/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!


failed=0
for pid in $(jobs -p); do
wait "$pid" || failed=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much time do we save on doing it in parallel. Is it worth complex code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. I think it is only seconds so I'll remove it. I was prematurely optimizing this because I didn't like the waiting step by step sequentially


- name: Install yq
run: |
# Override GOTOOLCHAIN=local set by setup-go.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to override it. It's based on go.mod so it should be the right version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately it looked like setup-go exported GOTOOLCHAIN=local always. I'll test to see if it can be told/verified to use the go.mod value

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants