ci: add background disk usage monitor to job-preamble #19397

Draft
davdhacs wants to merge 1 commit into master from davdhacs/ci-disk-usage-monitor

davdhacs (Contributor) commented Mar 12, 2026

Description

Adds a background disk usage monitor to the job-preamble action. A subshell polls df every 30 seconds and appends available disk space to /dev/shm/disk-monitor.log (RAM-backed tmpfs, survives disk full). The existing record_job_info post-step dumps the log and kills the monitor.

Example output:

17:59:00  92GB
17:59:30  92GB
18:00:00  92GB
18:00:30  92GB

No nohup or gacts/run-and-post-run needed — regular composite action steps don't wait for background children.
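
A minimal sketch of the monitor described above, assuming GNU `df` (`-BGB --output=avail`) and a PID file kept next to the log. The function names (`disk_sample`, `start_monitor`, `stop_monitor`), the `PIDFILE` path, and the exact log line format are illustrative assumptions, not the code in this PR.

```shell
# Assumed locations; /dev/shm is RAM-backed, so writes still work on a full disk.
LOGFILE="${LOGFILE:-/dev/shm/disk-monitor.log}"
PIDFILE="${PIDFILE:-/dev/shm/disk-monitor.pid}"   # hypothetical PID file
INTERVAL="${INTERVAL:-30}"                         # seconds between samples

# Print one "HH:MM:SS  <avail>" line for the filesystem that holds $1.
disk_sample() {
  avail=$(df -BGB --output=avail "${1:-/}" | tail -n 1 | tr -d ' ')
  printf '%s  %s\n' "$(date +%H:%M:%S)" "$avail"
}

# Start the polling loop in a background subshell and remember its PID.
# stdout/stderr are detached so the loop does not hold the step's pipes open.
start_monitor() {
  (
    while true; do
      disk_sample / >> "$LOGFILE"
      sleep "$INTERVAL"
    done
  ) >/dev/null 2>&1 &
  echo "$!" > "$PIDFILE"
}

# Post-step side: stop the loop (if it is still running) and dump the log.
stop_monitor() {
  if [ -f "$PIDFILE" ]; then
    kill "$(cat "$PIDFILE")" 2>/dev/null || true
    rm -f "$PIDFILE"
  fi
  cat "$LOGFILE" 2>/dev/null || true
}
```

In this sketch the first step would call `start_monitor` and return immediately, and the post-step would call `stop_monitor` to print the timeline.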

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

  • Verified jobs complete without blocking (subshell backgrounding works)
  • Verified post-step outputs disk usage timeline in step logs
  • Verified monitor keeps logging under disk pressure (tested with fill-to-2GB workflow)
  • Verified shellcheck passes

Generated with Claude Code

openshift-ci bot commented Mar 12, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

sourcery-ai bot (Contributor) left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • The parsing of avail values assumes a strictly numeric prefix (sed -n 's/.*avail=\([0-9]*\).*/\1/p'), but df -BGB typically emits values with unit suffixes (e.g. 13G/13GB); consider normalizing by stripping all non-digits (e.g. tr -dc '0-9') once and reusing that to avoid brittle parsing.
  • When first_avail < min_avail (e.g. if disk is freed over time), peak_consumed=$((first_avail - min_avail)) becomes negative; it may be clearer to clamp this to zero or take the absolute/maximum difference to avoid confusing negative "consumption" values in the summary.
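
The two points above can be sketched as follows. The helper names `digits_only` and `peak_consumed` are hypothetical, and the sketch assumes whole-number values such as `13GB`, since stripping non-digits from a fractional value like `13.5GB` would yield `135`.

```shell
# digits_only: keep only the decimal digits of a df-style value ("13GB" -> "13").
digits_only() {
  printf '%s' "$1" | tr -dc '0-9'
}

# peak_consumed: first_avail minus min_avail, clamped at zero so that freed
# disk space never shows up as negative "consumption".
peak_consumed() {
  first=$(digits_only "$1")
  min=$(digits_only "$2")
  diff=$(( ${first:-0} - ${min:-0} ))
  if [ "$diff" -lt 0 ]; then
    diff=0
  fi
  printf '%s\n' "$diff"
}
```

For example, `peak_consumed 92GB 80GB` prints `12`, while `peak_consumed 80GB 92GB` prints `0`.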
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path=".github/actions/job-preamble/action.yaml" line_range="146-148" />
<code_context>
+      with:
+        shell: bash
+        run: |
+          LOGFILE="/tmp/disk-usage-monitor.log"
+          PIDFILE="/tmp/disk-usage-monitor.pid"
+
+          # Record initial snapshot
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Use per-job unique paths for LOGFILE/PIDFILE to avoid cross-job interference on shared runners

On self-hosted or reused runners, multiple jobs can share /tmp and overlap in time. Fixed LOGFILE/PIDFILE names can cause jobs to interfere with each other (e.g., killing another job’s monitor or overwriting its log). Please derive these paths from something unique like $GITHUB_RUN_ID, $GITHUB_JOB, and/or $$ (e.g., /tmp/disk-usage-monitor-${GITHUB_RUN_ID}-${GITHUB_JOB}.log).

```suggestion
        run: |
          # Use per-job unique paths to avoid cross-job interference on shared runners
          LOGFILE="/tmp/disk-usage-monitor-${GITHUB_RUN_ID:-unknown-run}-${GITHUB_JOB:-unknown-job}-$$.log"
          PIDFILE="/tmp/disk-usage-monitor-${GITHUB_RUN_ID:-unknown-run}-${GITHUB_JOB:-unknown-job}-$$.pid"
```
</issue_to_address>

### Comment 2
<location path=".github/actions/job-preamble/action.yaml" line_range="203-210" />
<code_context>
+          min_ts=""
+          first_avail=""
+          last_avail=""
+          for entry in "${entries[@]}"; do
+            avail_val=$(echo "$entry" | sed -n 's/.*avail=\([0-9]*\).*/\1/p')
+            entry_ts=$(echo "$entry" | cut -d' ' -f1)
+            if [[ -z "$first_avail" ]]; then
+              first_avail="$avail_val"
+            fi
+            last_avail="$avail_val"
+            if [[ "$avail_val" -lt "$min_avail" ]]; then
+              min_avail="$avail_val"
+              min_ts="$entry_ts"
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard against empty or unparsable avail values before doing integer comparisons

If df or the log format ever produce a non-integer or missing avail value, sed will leave avail_val empty and [[ "$avail_val" -lt "$min_avail" ]] will raise a non-integer operand error, which can break this step. Consider validating avail_val with something like [[ "$avail_val" =~ ^[0-9]+$ ]] and skipping or handling entries that don’t match before assigning first_avail/last_avail or doing -lt comparisons.
</issue_to_address>
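
One way to apply the guard suggested in Comment 2 is sketched below. It keeps the same `sed` expression as the code under review, but the helper names (`is_uint`, `summarize_avail`), the use of stdin instead of an `entries` array, and the `n/a` fallback are assumptions made for illustration.

```shell
# is_uint: succeed only for a non-empty string made entirely of decimal digits.
is_uint() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# summarize_avail: read log entries on stdin and print "<min_avail> <timestamp>".
# Entries whose avail value is missing or non-numeric are skipped, so the
# integer comparison never sees an empty operand.
summarize_avail() {
  min_avail=""
  min_ts=""
  while IFS= read -r entry; do
    avail_val=$(printf '%s\n' "$entry" | sed -n 's/.*avail=\([0-9]*\).*/\1/p')
    is_uint "$avail_val" || continue
    entry_ts=${entry%% *}
    if [ -z "$min_avail" ] || [ "$avail_val" -lt "$min_avail" ]; then
      min_avail=$avail_val
      min_ts=$entry_ts
    fi
  done
  printf '%s %s\n' "${min_avail:-n/a}" "${min_ts:-n/a}"
}
```

Skipping bad entries, rather than failing the step, keeps the summary best-effort; whether that is the right trade-off for this action is a judgment call.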


@davdhacs davdhacs force-pushed the davdhacs/ci-disk-usage-monitor branch 2 times, most recently from 911a901 to 9f87317 Compare March 12, 2026 16:44

rhacs-bot (Contributor) commented Mar 12, 2026

Images are ready for the commit at 71e7ef3.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-310-g71e7ef3071.

@davdhacs davdhacs force-pushed the davdhacs/ci-disk-usage-monitor branch 6 times, most recently from e42a308 to 8f13dab Compare March 12, 2026 17:54
Adds a background df poll (every 30s) that logs available disk space
to /dev/shm (RAM-backed tmpfs, survives disk full). The existing
record_job_info post-step kills the monitor and dumps the log.

Uses a plain subshell & in a regular step (no gacts/run-and-post-run
needed since regular composite action steps don't wait for background
children).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.68%. Comparing base (52688db) to head (71e7ef3).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #19397   +/-   ##
=======================================
  Coverage   49.68%   49.68%           
=======================================
  Files        2700     2700           
  Lines      203278   203278           
=======================================
+ Hits       100999   101002    +3     
  Misses      94753    94753           
+ Partials     7526     7523    -3     
Flag Coverage Δ
go-unit-tests 49.68% <ø> (+<0.01%) ⬆️
