fix: Harden informer cache with label selectors and memory optimizations #6242
jyejare wants to merge 2 commits into feast-dev:master
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
    return map[string]string{
        services.NameLabelKey: authz.Handler.FeatureStore.Name,
        services.ServiceTypeLabelKey: string(services.AuthzFeastType),
        services.ManagedByLabelKey: services.ManagedByLabelValue,
🟡 removeOrphanedRoles silently skips pre-upgrade custom auth Roles due to stricter label selector
The authz.getLabels() function now includes ManagedByLabelKey (authz.go:334), and removeOrphanedRoles uses this label set as a list selector (authz.go:85). Pre-upgrade custom auth Roles only have {NameLabelKey, ServiceTypeLabelKey} without ManagedByLabelKey, so the API server's label selector will never match them. These orphaned Roles will never be cleaned up by removeOrphanedRoles.
The main feast Role and RoleBinding are still cleaned up correctly via DeleteOwnedFeastObj (which looks up by name, not labels). Only custom auth roles from KubernetesAuthz.Roles are affected. The practical impact is limited: orphaned Roles have empty rules (no security impact) and have owner references for eventual GC on FeatureStore CR deletion. The window is narrow — it requires changing the Roles list concurrently with or very shortly after the operator upgrade, before the first reconciliation adds the label to existing Roles.
Prompt for agents
In authz.go, the removeOrphanedRoles function at lines 81-101 lists Roles using authz.getLabels() as the label selector. Since getLabels() now includes ManagedByLabelKey, pre-upgrade Roles without this label are invisible to this cleanup function.
To fix: either (a) use a separate label set for removeOrphanedRoles that omits ManagedByLabelKey (matching by NameLabelKey and ServiceTypeLabelKey only), or (b) run a one-time migration during reconciliation that adds ManagedByLabelKey to all existing authz Roles before removeOrphanedRoles is called.
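A minimal sketch of why the stricter selector misses pre-upgrade Roles, and how option (a) would behave. It models equality-based label selection with plain maps and the standard library only; it is not the operator's code. The label keys for name and service type, the service-type value, and the helper names (matchesSelector, fullLabels, orphanSelector) are hypothetical stand-ins; only the app.kubernetes.io/managed-by key and its feast-operator value come from the PR description.

```go
package main

import "fmt"

// Hypothetical stand-ins for services.NameLabelKey and
// services.ServiceTypeLabelKey; the real values are not shown in this PR.
const (
	nameLabelKey        = "example.dev/name"
	serviceTypeLabelKey = "example.dev/service-type"
	managedByLabelKey   = "app.kubernetes.io/managed-by"
	managedByLabelValue = "feast-operator"
)

// matchesSelector reports whether every key/value pair in selector is
// present in objLabels (equality-based selector semantics).
func matchesSelector(selector, objLabels map[string]string) bool {
	for k, v := range selector {
		if got, ok := objLabels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// fullLabels models the post-upgrade label set, including managed-by.
func fullLabels(name string) map[string]string {
	return map[string]string{
		nameLabelKey:        name,
		serviceTypeLabelKey: "authorization", // placeholder value
		managedByLabelKey:   managedByLabelValue,
	}
}

// orphanSelector models option (a): the same set without managed-by,
// so Roles created before the upgrade are still selected.
func orphanSelector(name string) map[string]string {
	sel := fullLabels(name)
	delete(sel, managedByLabelKey)
	return sel
}

func main() {
	// A Role created before the upgrade carries only the two older labels.
	preUpgradeRole := map[string]string{
		nameLabelKey:        "sample",
		serviceTypeLabelKey: "authorization",
	}
	fmt.Println(matchesSelector(fullLabels("sample"), preUpgradeRole))     // false
	fmt.Println(matchesSelector(orphanSelector("sample"), preUpgradeRole)) // true
	fmt.Println(matchesSelector(orphanSelector("sample"), fullLabels("sample"))) // true
}
```

The relaxed selector matches both pre-upgrade and post-upgrade Roles, which is why it would suit a cleanup path even though the stricter set is used for labeling new objects.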
Summary
The feast-operator's Owns() calls create cluster-wide informers for ConfigMaps, Deployments, Services, and other resource types. On clusters with a large number of these objects, the informer cache can grow beyond the operator's 256Mi memory limit, causing OOMKill and restarts.
Changes
ByObject label selectors for all owned resource types
Restrict informer caches to only objects with app.kubernetes.io/managed-by: feast-operator. Covers all 10 owned types: ConfigMap, Deployment, Service, ServiceAccount, PVC, RoleBinding, Role, CronJob, HPA, PDB. Extracted into newCacheOptions() for clarity.
DefaultTransform: cache.TransformStripManagedFields()
Strip managedFields from all cached objects, reducing per-object memory footprint by ~30-50%.
GOMEMLIMIT=230MiB
Set Go runtime soft memory limit (90% of 256Mi container limit). Triggers GC pressure before hard OOMKill as defense-in-depth.
Additional changes
- Add the app.kubernetes.io/managed-by: feast-operator label to getLabels() so all FeatureStore-managed resources carry it
- Add getSelectorLabels() for immutable selectors (Deployment spec.selector, Service spec.selector, TopologySpreadConstraints, PodAffinity) to avoid breaking existing resources on upgrade
- Use the app.kubernetes.io/managed-by constants (services.ManagedByLabelKey/Value) throughout
Test Results
Verified on cluster with a large number of ConfigMaps pre-loaded:
Test plan
- make test): all pass
- getSelectorLabels() prevents immutable selector breakage on upgrade
Summary by CodeRabbit