feat: scoped and global custom metric trackers#18302
feat: scoped and global custom metric trackers#18302parametalol wants to merge 4 commits intomasterfrom
Conversation
|
Skipping CI for Draft Pull Request. |
|
Images are ready for the commit at e6ac4e1. To use with deploy scripts, first |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #18302 +/- ##
==========================================
+ Coverage 48.92% 48.94% +0.01%
==========================================
Files 2619 2622 +3
Lines 197514 197934 +420
==========================================
+ Hits 96631 96869 +238
- Misses 93504 93677 +173
- Partials 7379 7388 +9
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
|
||
| var ( | ||
| userRegistries map[string]*customRegistry = make(map[string]*customRegistry) | ||
| globalRegistry *customRegistry |
There was a problem hiding this comment.
Note to reviewers:
I was hesitating between a dedicated globalRegistry and just keeping a record in the userRegistries map under the globalScopeID identity key (empty string).
Claude suggested to keep it separate for clarity.
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The new tests in
Test_scopecompare full metric outputs as raw strings, which can be brittle due to potential non-deterministic ordering from Prometheus; consider parsing the metrics or comparing via structured data (e.g., decoded metric families) instead of exact string equality. - In
trackerRunner.ServeHTTP, you construct a newpromhttp.HandlerForon every request; you could improve efficiency by creating a reusable handler (or at least a reusableprometheus.Gathererswrapper) and only varying the underlying registries, as those are already long-lived.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new tests in `Test_scope` compare full metric outputs as raw strings, which can be brittle due to potential non-deterministic ordering from Prometheus; consider parsing the metrics or comparing via structured data (e.g., decoded metric families) instead of exact string equality.
- In `trackerRunner.ServeHTTP`, you construct a new `promhttp.HandlerFor` on every request; you could improve efficiency by creating a reusable handler (or at least a reusable `prometheus.Gatherers` wrapper) and only varying the underlying registries, as those are already long-lived.
## Individual Comments
### Comment 1
<location> `central/metrics/custom/tracker/tracker_base.go:115` </location>
<code_context>
+func makeTrackerBase[F Finding](metricPrefix, description string, scoped bool,
+ getters LazyLabelGetters[F], generator FindingGenerator[F],
+) *TrackerBase[F] {
+ registryFactory := globalRegistryFactory
+ if scoped {
+ registryFactory = metrics.GetCustomRegistry
</code_context>
<issue_to_address>
**issue (complexity):** Consider replacing the scoped flag and magic global ID with an explicit scope ID provider strategy injected via constructors to clarify tracker behavior and simplify Gather.
You can keep the new functionality while making the semantics more explicit and removing the boolean + magic ID coupling by turning “scope” into an explicit strategy on the tracker.
### 1. Replace `scoped` + `globalScopeID` with an explicit scope ID provider
Add a function field that defines how to get the scope ID, instead of storing a boolean and using `""` as a magic value:
```go
type TrackerBase[F Finding] struct {
metricPrefix string
description string
getters LazyLabelGetters[F]
generator FindingGenerator[F]
config *Configuration
metricsConfigMux sync.RWMutex
gatherers sync.Map
cleanupWG sync.WaitGroup
registryFactory func(userID string) (metrics.CustomRegistry, error)
scopeIDProvider func(ctx context.Context) (string, bool) // <— new
}
```
Then `Gather` becomes simpler and self-describing:
```go
func (tracker *TrackerBase[Finding]) Gather(ctx context.Context) {
id, ok := tracker.scopeIDProvider(ctx)
if !ok {
return
}
cfg := tracker.getConfiguration()
if cfg == nil {
return
}
gatherer := tracker.getGatherer(id, cfg)
if gatherer == nil {
return
}
defer tracker.cleanupInactiveGatherers()
defer gatherer.running.Store(false)
if cfg.period == 0 || time.Since(gatherer.lastGather) < cfg.period {
return
}
// ...
}
```
This removes `scoped` and the need for `globalScopeID` in `Gather`, and makes the “auth required or not” decision explicit via the `scopeIDProvider`.
### 2. Pass behavior directly from constructors instead of via `scoped` flag
You can still share the common constructor logic, but without a boolean that needs to be re-interpreted later:
```go
func MakeTrackerBase[F Finding](metricPrefix, description string,
getters LazyLabelGetters[F], generator FindingGenerator[F],
) *TrackerBase[F] {
return makeTrackerBase(
metricPrefix,
description,
metrics.GetCustomRegistry,
func(ctx context.Context) (string, bool) {
userID, err := authn.IdentityFromContext(ctx)
if err != nil {
return "", false
}
return userID.UID(), true
},
getters,
generator,
)
}
func MakeGlobalTrackerBase[F Finding](metricPrefix, description string,
getters LazyLabelGetters[F], generator FindingGenerator[F],
) *TrackerBase[F] {
return makeTrackerBase(
metricPrefix,
description,
globalRegistryFactory,
func(ctx context.Context) (string, bool) {
return "global", true // or keep "" if you prefer, but it's now encapsulated
},
getters,
generator,
)
}
func makeTrackerBase[F Finding](
metricPrefix, description string,
registryFactory func(string) (metrics.CustomRegistry, error),
scopeIDProvider func(context.Context) (string, bool),
getters LazyLabelGetters[F],
generator FindingGenerator[F],
) *TrackerBase[F] {
return &TrackerBase[F]{
metricPrefix: metricPrefix,
description: description,
getters: getters,
generator: generator,
registryFactory: registryFactory,
scopeIDProvider: scopeIDProvider,
}
}
```
This keeps:
- Both public constructors (`MakeTrackerBase` and `MakeGlobalTrackerBase`)
- The global vs. scoped registry behavior
- The global vs. user-scoped `Gather` behavior
But removes:
- The `scoped` field
- The `globalScopeID` sentinel from control flow
- The boolean → factory selection → later reinterpretation chain
The resulting code makes the variant behavior explicit and localized to construction, while `Gather` just uses the injected strategies.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Comparing strings is more visual, and I didn't see any variations in Prometheus output in the tests.
A cache could indeed be introduced to cache handlers per user ID. But that looks like an overkill. |
58a6dae to
e6ac4e1
Compare
| return makeTrackerBase(metricPrefix, description, true, getters, generator) | ||
| } | ||
|
|
||
| // MakeGlobalTrackerBase creates a global, i.e. non-scoped tracker. |
There was a problem hiding this comment.
Could you please add here a comment that explains the reason that global trackers exist (more or less first sentence from the PR description)?
Description
Not all custom metrics need to be scoped (create a separate registry for each scrape user ID). This PR adds support for global trackers and makes a couple of existing trackers as such.
User-facing documentation
Testing and quality
Automated testing
How I validated my change
CI. Manual.
Current dependencies on/for this PR: