ROX-32873: Metrics for process arguments#18794
ROX-32873: Metrics for process arguments#18794JoukoVirtanen wants to merge 11 commits intomasterfrom
Conversation
|
Skipping CI for Draft Pull Request. |
|
Images are ready for the commit at ccc26a8. To use with deploy scripts, first |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #18794 +/- ##
==========================================
+ Coverage 49.64% 49.71% +0.07%
==========================================
Files 2698 2701 +3
Lines 203075 203471 +396
==========================================
+ Hits 100817 101160 +343
- Misses 94737 94784 +47
- Partials 7521 7527 +6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
bc54504 to
d8e2a4e
Compare
|
/test gke-ui-e2e-tests |
erthalion
left a comment
There was a problem hiding this comment.
Thanks, few commentaries below.
| } | ||
|
|
||
| // recordProcessIndicatorAdded records metrics for a single process indicator added to DB. | ||
| func recordProcessIndicatorAdded(argsSize int) { |
There was a problem hiding this comment.
Does this have to be a separate function?
There was a problem hiding this comment.
No, it does not have to be a separate function. I have removed it.
| Buckets: []float64{0, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536}, | ||
| }) | ||
|
|
||
| processIndicatorsAddedCounter = prometheus.NewCounter(prometheus.CounterOpts{ |
There was a problem hiding this comment.
I think it makes sense to rename it into processIndicatorsLiveCounter, then increment it as in the PR and decrement it whenever a process indicator is removed (either in the pruner or for some other reasons).
There was a problem hiding this comment.
I considered that, but I don't think it is practical. When the process indicators are pruned, they are not returned. In most cases the IDs of the pruned process indicators are returned, but not the process indicators themselves. To decrement the counter we would need to know the process arguments that have been pruned. It could theoretically be done, but it would degrade performance to get the process arguments, and it would add to the maintenance burden. I don't think it would be worth it.
There was a problem hiding this comment.
What does this counter have to do with arguments? There are two metrics, one is a histogram of argument sizes, another one is number of added process indicators. The latter one could be added and decremented, since we're adding and removing process indicators. Correct?
There was a problem hiding this comment.
I misunderstood. I should have read more closely. I thought that you wanted the histogram to represent the distribution of process arguments in the database. Decrementing the count of process indicators makes sense.
There was a problem hiding this comment.
There is already a count of the number of rows in the process_indicators table so I think processIndicatorsLiveCounter would be redundant.
There was a problem hiding this comment.
Not quite, process_indicators is a metric taken form the database statistics and is only an estimated number of records. It may drift from the real number in under certain circumstances.
Another important point is that currently there are not easily consumable metrics regarding process pruning, and adding those is the main point of improving runtime data metrics. Since you're busy with that anyway, let's cover it as well.
There was a problem hiding this comment.
I have a PR to keep track of pruned process indicators here #19130
It is still in draft form.
3000c06 to
73aec63
Compare
|
/test gke-ui-e2e-tests |
73aec63 to
49ebcfa
Compare
|
/test gke-qa-e2e-tests |
49ebcfa to
066620c
Compare
…nts stored, and number of process indicators inserted
142c53e to
3ce4c0a
Compare
…roken down by cluster and namespace
Description
We would like to know how much memory is taken up by process arguments. The reason we would like to know this is that it would better help us plan how to handle process arguments. It would help us better understand the pros and cons of truncating process baselines in the future.
See the related PR for process lineage info #19406
User-facing documentation
Testing and quality
Automated testing
How I validated my change
Deployed ACS using
deploy/deploy-local.sh.Created a port forward