Skip to content

feat: scoped and global custom metric trackers#18302

Open
parametalol wants to merge 4 commits intomasterfrom
michael/scoped-and-global-trackers
Open

feat: scoped and global custom metric trackers#18302
parametalol wants to merge 4 commits intomasterfrom
michael/scoped-and-global-trackers

Conversation

@parametalol
Copy link
Contributor

@parametalol parametalol commented Dec 18, 2025

Description

Not all custom metrics need to be scoped (create a separate registry for each scrape user ID). This PR adds support for global trackers and makes a couple of existing trackers as such.

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

CI. Manual.

Current dependencies on/for this PR:

@openshift-ci
Copy link

openshift-ci bot commented Dec 18, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@rhacs-bot
Copy link
Contributor

rhacs-bot commented Dec 18, 2025

Images are ready for the commit at e6ac4e1.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-18-ge6ac4e168d.

@codecov
Copy link

codecov bot commented Dec 18, 2025

Codecov Report

❌ Patch coverage is 84.44444% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.94%. Comparing base (d20c8fc) to head (58a6dae).
⚠️ Report is 104 commits behind head on master.

Files with missing lines Patch % Lines
central/metrics/custom/runner.go 73.33% 3 Missing and 1 partial ⚠️
central/metrics/custom/tracker/tracker_base.go 85.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #18302      +/-   ##
==========================================
+ Coverage   48.92%   48.94%   +0.01%     
==========================================
  Files        2619     2622       +3     
  Lines      197514   197934     +420     
==========================================
+ Hits        96631    96869     +238     
- Misses      93504    93677     +173     
- Partials     7379     7388       +9     
Flag Coverage Δ
go-unit-tests 48.94% <84.44%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.


var (
userRegistries map[string]*customRegistry = make(map[string]*customRegistry)
globalRegistry *customRegistry
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers:

I was hesitating between a dedicated globalRegistry and just keeping a record in the userRegistries map under the globalScopeID identity key (empty string).

Claude suggested to keep it separate for clarity.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The new tests in Test_scope compare full metric outputs as raw strings, which can be brittle due to potential non-deterministic ordering from Prometheus; consider parsing the metrics or comparing via structured data (e.g., decoded metric families) instead of exact string equality.
  • In trackerRunner.ServeHTTP, you construct a new promhttp.HandlerFor on every request; you could improve efficiency by creating a reusable handler (or at least a reusable prometheus.Gatherers wrapper) and only varying the underlying registries, as those are already long-lived.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new tests in `Test_scope` compare full metric outputs as raw strings, which can be brittle due to potential non-deterministic ordering from Prometheus; consider parsing the metrics or comparing via structured data (e.g., decoded metric families) instead of exact string equality.
- In `trackerRunner.ServeHTTP`, you construct a new `promhttp.HandlerFor` on every request; you could improve efficiency by creating a reusable handler (or at least a reusable `prometheus.Gatherers` wrapper) and only varying the underlying registries, as those are already long-lived.

## Individual Comments

### Comment 1
<location> `central/metrics/custom/tracker/tracker_base.go:115` </location>
<code_context>
+func makeTrackerBase[F Finding](metricPrefix, description string, scoped bool,
+	getters LazyLabelGetters[F], generator FindingGenerator[F],
+) *TrackerBase[F] {
+	registryFactory := globalRegistryFactory
+	if scoped {
+		registryFactory = metrics.GetCustomRegistry
</code_context>

<issue_to_address>
**issue (complexity):** Consider replacing the scoped flag and magic global ID with an explicit scope ID provider strategy injected via constructors to clarify tracker behavior and simplify Gather.

You can keep the new functionality while making the semantics more explicit and removing the boolean + magic ID coupling by turning “scope” into an explicit strategy on the tracker.

### 1. Replace `scoped` + `globalScopeID` with an explicit scope ID provider

Add a function field that defines how to get the scope ID, instead of storing a boolean and using `""` as a magic value:

```go
type TrackerBase[F Finding] struct {
    metricPrefix string
    description  string
    getters      LazyLabelGetters[F]
    generator    FindingGenerator[F]

    config           *Configuration
    metricsConfigMux sync.RWMutex

    gatherers sync.Map
    cleanupWG sync.WaitGroup

    registryFactory func(userID string) (metrics.CustomRegistry, error)
    scopeIDProvider func(ctx context.Context) (string, bool) // <— new
}
```

Then `Gather` becomes simpler and self-describing:

```go
func (tracker *TrackerBase[Finding]) Gather(ctx context.Context) {
    id, ok := tracker.scopeIDProvider(ctx)
    if !ok {
        return
    }

    cfg := tracker.getConfiguration()
    if cfg == nil {
        return
    }

    gatherer := tracker.getGatherer(id, cfg)
    if gatherer == nil {
        return
    }
    defer tracker.cleanupInactiveGatherers()
    defer gatherer.running.Store(false)

    if cfg.period == 0 || time.Since(gatherer.lastGather) < cfg.period {
        return
    }

    // ...
}
```

This removes `scoped` and the need for `globalScopeID` in `Gather`, and makes the “auth required or not” decision explicit via the `scopeIDProvider`.

### 2. Pass behavior directly from constructors instead of via `scoped` flag

You can still share the common constructor logic, but without a boolean that needs to be re-interpreted later:

```go
func MakeTrackerBase[F Finding](metricPrefix, description string,
    getters LazyLabelGetters[F], generator FindingGenerator[F],
) *TrackerBase[F] {
    return makeTrackerBase(
        metricPrefix,
        description,
        metrics.GetCustomRegistry,
        func(ctx context.Context) (string, bool) {
            userID, err := authn.IdentityFromContext(ctx)
            if err != nil {
                return "", false
            }
            return userID.UID(), true
        },
        getters,
        generator,
    )
}

func MakeGlobalTrackerBase[F Finding](metricPrefix, description string,
    getters LazyLabelGetters[F], generator FindingGenerator[F],
) *TrackerBase[F] {
    return makeTrackerBase(
        metricPrefix,
        description,
        globalRegistryFactory,
        func(ctx context.Context) (string, bool) {
            return "global", true // or keep "" if you prefer, but it's now encapsulated
        },
        getters,
        generator,
    )
}

func makeTrackerBase[F Finding](
    metricPrefix, description string,
    registryFactory func(string) (metrics.CustomRegistry, error),
    scopeIDProvider func(context.Context) (string, bool),
    getters LazyLabelGetters[F],
    generator FindingGenerator[F],
) *TrackerBase[F] {
    return &TrackerBase[F]{
        metricPrefix:    metricPrefix,
        description:     description,
        getters:         getters,
        generator:       generator,
        registryFactory: registryFactory,
        scopeIDProvider: scopeIDProvider,
    }
}
```

This keeps:

- Both public constructors (`MakeTrackerBase` and `MakeGlobalTrackerBase`)
- The global vs. scoped registry behavior
- The global vs. user-scoped `Gather` behavior

But removes:

- The `scoped` field
- The `globalScopeID` sentinel from control flow
- The boolean → factory selection → later reinterpretation chain

The resulting code makes the variant behavior explicit and localized to construction, while `Gather` just uses the injected strategies.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@parametalol
Copy link
Contributor Author

The new tests in Test_scope compare full metric outputs as raw strings, which can be brittle due to potential non-deterministic ordering from Prometheus; consider parsing the metrics or comparing via structured data (e.g., decoded metric families) instead of exact string equality.

Comparing strings is more visual, and I didn't see any variations in Prometheus output in the tests.

In trackerRunner.ServeHTTP, you construct a new promhttp.HandlerFor on every request; you could improve efficiency by creating a reusable handler (or at least a reusable prometheus.Gatherers wrapper) and only varying the underlying registries, as those are already long-lived.

A cache could indeed be introduced to cache handlers per user ID. But that looks like an overkill.

@parametalol parametalol force-pushed the michael/scoped-and-global-trackers branch from 58a6dae to e6ac4e1 Compare February 2, 2026 16:47
return makeTrackerBase(metricPrefix, description, true, getters, generator)
}

// MakeGlobalTrackerBase creates a global, i.e. non-scoped tracker.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add here a comment that explains the reason that global trackers exist (more or less first sentence from the PR description)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants