ROX-32873: Metrics for process arguments#18794

Open
JoukoVirtanen wants to merge 11 commits into master from jv-ROX-32873-metrics-for-process-arguments

@JoukoVirtanen JoukoVirtanen commented Feb 1, 2026

Description

We would like to know how much memory process arguments take up. This would help us plan how to handle process arguments and better understand the pros and cons of truncating process baselines in the future.

See the related PR for process lineage info #19406

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

Deployed ACS using deploy/deploy-local.sh.

Created a port forward and scraped the Central metrics endpoint:

kubectl -n stackrox port-forward deploy/central 9090:9090 > /dev/null 2>&1 &
curl http://localhost:9090/metrics
# HELP rox_central_process_upserted_args_size Distribution of process argument sizes in characters for upserted indicators
# TYPE rox_central_process_upserted_args_size histogram
rox_central_process_upserted_args_size_bucket{le="0"} 29
rox_central_process_upserted_args_size_bucket{le="128"} 439
rox_central_process_upserted_args_size_bucket{le="256"} 483
rox_central_process_upserted_args_size_bucket{le="512"} 495
rox_central_process_upserted_args_size_bucket{le="1024"} 499
rox_central_process_upserted_args_size_bucket{le="2048"} 500
rox_central_process_upserted_args_size_bucket{le="4096"} 501
rox_central_process_upserted_args_size_bucket{le="8192"} 501
rox_central_process_upserted_args_size_bucket{le="16384"} 501
rox_central_process_upserted_args_size_bucket{le="32768"} 501
rox_central_process_upserted_args_size_bucket{le="65536"} 501
rox_central_process_upserted_args_size_bucket{le="+Inf"} 501
rox_central_process_upserted_args_size_sum 31352
rox_central_process_upserted_args_size_count 501
# HELP rox_central_process_upserted_args_size_total Total process argument sizes in characters by cluster and namespace
# TYPE rox_central_process_upserted_args_size_total counter
rox_central_process_upserted_args_size_total{cluster="05d9c020-2a1d-439a-8c9c-2cbdf5413673",namespace="default"} 8
rox_central_process_upserted_args_size_total{cluster="05d9c020-2a1d-439a-8c9c-2cbdf5413673",namespace="gke-managed-cim"} 1747
rox_central_process_upserted_args_size_total{cluster="05d9c020-2a1d-439a-8c9c-2cbdf5413673",namespace="gmp-system"} 6193
rox_central_process_upserted_args_size_total{cluster="05d9c020-2a1d-439a-8c9c-2cbdf5413673",namespace="kube-system"} 19811
rox_central_process_upserted_args_size_total{cluster="05d9c020-2a1d-439a-8c9c-2cbdf5413673",namespace="stackrox"} 3593
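
As a quick sanity check on the output above, the cumulative `le` buckets can be turned into per-bucket counts and a mean size. The sketch below is not part of the PR: the `bucket` type and the helper names are hypothetical, and the numbers are copied from the scrape above.

```go
package main

import "fmt"

// bucket is one cumulative histogram bucket as exposed by Prometheus:
// Count is the number of observations with a value <= UpperBound.
// (Hypothetical type, used only for this illustration.)
type bucket struct {
	UpperBound string
	Count      uint64
}

// perBucketCounts converts cumulative counts into the number of
// observations that fall into each individual bucket.
func perBucketCounts(cumulative []bucket) []uint64 {
	out := make([]uint64, len(cumulative))
	var prev uint64
	for i, b := range cumulative {
		out[i] = b.Count - prev
		prev = b.Count
	}
	return out
}

// mean returns sum/count, or 0 when there are no observations.
func mean(sum float64, count uint64) float64 {
	if count == 0 {
		return 0
	}
	return sum / float64(count)
}

// sumByNamespace adds up the per-namespace counter values.
func sumByNamespace(totals map[string]uint64) uint64 {
	var sum uint64
	for _, v := range totals {
		sum += v
	}
	return sum
}

// scrapedBuckets returns the bucket values copied from the scrape above.
func scrapedBuckets() []bucket {
	return []bucket{
		{"0", 29}, {"128", 439}, {"256", 483}, {"512", 495},
		{"1024", 499}, {"2048", 500}, {"4096", 501}, {"8192", 501},
		{"16384", 501}, {"32768", 501}, {"65536", 501}, {"+Inf", 501},
	}
}

// scrapedTotals returns the per-namespace counter values from the scrape above.
func scrapedTotals() map[string]uint64 {
	return map[string]uint64{
		"default":         8,
		"gke-managed-cim": 1747,
		"gmp-system":      6193,
		"kube-system":     19811,
		"stackrox":        3593,
	}
}

func main() {
	buckets := scrapedBuckets()
	for i, n := range perBucketCounts(buckets) {
		fmt.Printf("le=%s: %d\n", buckets[i].UpperBound, n)
	}
	fmt.Printf("mean args size: %.2f characters\n", mean(31352, 501))
	fmt.Printf("sum over namespaces: %d\n", sumByNamespace(scrapedTotals()))
}
```

With these numbers, 439 of 501 indicators (about 88%) have at most 128 characters of arguments, the mean is roughly 62.6 characters, and the per-namespace counter values add up to the histogram sum (31352).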

openshift-ci bot commented Feb 1, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

rhacs-bot commented Feb 1, 2026

Images are ready for the commit at ccc26a8.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-310-gccc26a81f9.

codecov bot commented Feb 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.71%. Comparing base (82657e5) to head (ccc26a8).
⚠️ Report is 20 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #18794      +/-   ##
==========================================
+ Coverage   49.64%   49.71%   +0.07%     
==========================================
  Files        2698     2701       +3     
  Lines      203075   203471     +396     
==========================================
+ Hits       100817   101160     +343     
- Misses      94737    94784      +47     
- Partials     7521     7527       +6     
Flag Coverage Δ
go-unit-tests 49.71% <100.00%> (+0.07%) ⬆️

@JoukoVirtanen JoukoVirtanen marked this pull request as ready for review February 4, 2026 23:03
@JoukoVirtanen JoukoVirtanen force-pushed the jv-ROX-32873-metrics-for-process-arguments branch from bc54504 to d8e2a4e Compare February 5, 2026 16:23
@JoukoVirtanen
Contributor Author

/test gke-ui-e2e-tests

Contributor

@erthalion erthalion left a comment

Thanks, a few comments below.

}

// recordProcessIndicatorAdded records metrics for a single process indicator added to DB.
func recordProcessIndicatorAdded(argsSize int) {
Contributor

Does this have to be a separate function?

Contributor Author

@JoukoVirtanen JoukoVirtanen Feb 10, 2026

No, it does not have to be a separate function. I have removed it.

Buckets: []float64{0, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536},
})

processIndicatorsAddedCounter = prometheus.NewCounter(prometheus.CounterOpts{
Contributor

I think it makes sense to rename it to processIndicatorsLiveCounter, then increment it as in the PR and decrement it whenever a process indicator is removed (either in the pruner or for some other reason).

Contributor Author

@JoukoVirtanen JoukoVirtanen Feb 10, 2026

I considered that, but I don't think it is practical. When the process indicators are pruned, they are not returned. In most cases the IDs of the pruned process indicators are returned, but not the process indicators themselves. To decrement the counter we would need to know the process arguments that have been pruned. It could theoretically be done, but it would degrade performance to get the process arguments, and it would add to the maintenance burden. I don't think it would be worth it.

Contributor

What does this counter have to do with arguments? There are two metrics: one is a histogram of argument sizes, the other is the number of added process indicators. The latter could be incremented and decremented, since we're adding and removing process indicators. Correct?

Contributor Author

I misunderstood. I should have read more closely. I thought that you wanted the histogram to represent the distribution of process arguments in the database. Decrementing the count of process indicators makes sense.
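
A minimal sketch of the live-count idea from this thread, using only the standard library. `liveIndicators` and its methods are hypothetical names, not code from the PR. It illustrates that the count can be decremented from the pruned IDs alone, without loading the pruned indicators or their arguments. If this were exported through the Prometheus Go client, a value that can go down would need to be a gauge rather than a counter, since counters only increase.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// liveIndicators is a hypothetical stand-in for the proposed
// processIndicatorsLiveCounter: it tracks how many process indicators
// are currently stored, not how large their arguments are.
type liveIndicators struct {
	n atomic.Int64
}

// added records that count indicators were upserted.
func (l *liveIndicators) added(count int) {
	l.n.Add(int64(count))
}

// removed records a prune. Only the IDs are needed, which matches the
// observation above that pruning returns IDs rather than full indicators.
func (l *liveIndicators) removed(ids []string) {
	l.n.Add(-int64(len(ids)))
}

// value returns the current number of live indicators.
func (l *liveIndicators) value() int64 {
	return l.n.Load()
}

func main() {
	var live liveIndicators
	live.added(5)
	live.removed([]string{"id-1", "id-2"}) // placeholder IDs
	fmt.Println("live indicators:", live.value())
}
```

This sketch assumes every add and every removal passes through these two calls; any path that deletes indicators without reporting the IDs would make the value drift.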

Contributor Author

There is already a count of the number of rows in the process_indicators table so I think processIndicatorsLiveCounter would be redundant.

Contributor

Not quite. process_indicators is a metric taken from the database statistics and is only an estimated number of records. It may drift from the real number under certain circumstances.

Another important point is that there are currently no easily consumable metrics for process pruning, and adding those is the main point of improving runtime data metrics. Since you're busy with that anyway, let's cover it as well.

Contributor Author

I have a PR to keep track of pruned process indicators here #19130

It is still in draft form.

@JoukoVirtanen JoukoVirtanen force-pushed the jv-ROX-32873-metrics-for-process-arguments branch from 3000c06 to 73aec63 Compare February 12, 2026 03:54
@JoukoVirtanen
Contributor Author

/test gke-ui-e2e-tests

@JoukoVirtanen JoukoVirtanen force-pushed the jv-ROX-32873-metrics-for-process-arguments branch from 73aec63 to 49ebcfa Compare February 23, 2026 02:04
@JoukoVirtanen
Contributor Author

/test gke-qa-e2e-tests

@JoukoVirtanen JoukoVirtanen force-pushed the jv-ROX-32873-metrics-for-process-arguments branch from 49ebcfa to 066620c Compare March 2, 2026 23:49
@JoukoVirtanen JoukoVirtanen force-pushed the jv-ROX-32873-metrics-for-process-arguments branch from 142c53e to 3ce4c0a Compare March 11, 2026 15:18