feat: Add feature quality monitoring with statistical metrics, REST API, and CLI #6202
jyejare wants to merge 1 commit into feast-dev:master
Signed-off-by: Jitendra Yejare <11752425+jyejare@users.noreply.github.com>
4340dbb to 940a4af
    try:
        fv = server.store.registry.get_feature_view(
            name=feature_view_name, project=project
        )
        assert_permissions(fv, actions=[action])
    except Exception:
        pass
🔴 Permission check silently swallows all exceptions, completely bypassing RBAC
_assert_fv_permission wraps the entire permission check in except Exception: pass, which catches and ignores FeastPermissionError raised by assert_permissions when the user is unauthorized. As confirmed in feast/permissions/enforcer.py, FeastPermissionError inherits from Exception (via feast/errors.py:568). This means every call to _assert_fv_permission is a no-op — unauthorized users can compute metrics (UPDATE action) and read monitoring data (DESCRIBE action) for any feature view without restriction.
Suggested change:

    - try:
    -     fv = server.store.registry.get_feature_view(
    -         name=feature_view_name, project=project
    -     )
    -     assert_permissions(fv, actions=[action])
    - except Exception:
    -     pass
    + try:
    +     fv = server.store.registry.get_feature_view(
    +         name=feature_view_name, project=project
    +     )
    +     assert_permissions(fv, actions=[action])
    + except FeastObjectNotFoundException:
    +     pass
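The difference between the two handlers can be reproduced in isolation. The sketch below is a minimal illustration, not Feast code: the two exception classes and `assert_permissions` are local stand-ins (assumptions), with the enforcer hard-wired to deny every request.

```python
class FeastPermissionError(Exception):
    """Local stand-in for Feast's permission error (assumption, not imported)."""


class FeastObjectNotFoundException(Exception):
    """Local stand-in for Feast's not-found error (assumption, not imported)."""


def assert_permissions(resource, actions):
    # Hypothetical enforcer that denies every request.
    raise FeastPermissionError(f"not authorized for {actions} on {resource}")


def check_broad(resource, action):
    # Mirrors the reviewed code: every exception is swallowed.
    try:
        assert_permissions(resource, actions=[action])
    except Exception:
        pass


def check_narrow(resource, action):
    # Mirrors the suggestion: only "object not found" is tolerated.
    try:
        assert_permissions(resource, actions=[action])
    except FeastObjectNotFoundException:
        pass


def is_denied(check):
    # True when the permission error actually reaches the caller.
    try:
        check("driver_stats", "UPDATE")
    except FeastPermissionError:
        return True
    return False


print(is_denied(check_broad))   # False: the denial is silently dropped
print(is_denied(check_narrow))  # True: the denial propagates
```

With the broad handler, a denied request is indistinguishable from an allowed one, which is the behavior the comment describes.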
    if feature_service_name:
        return self._get_metrics_by_service(
            project,
            feature_service_name,
            lambda fv_name: self.monitoring_store.get_feature_metrics(
                project=project, feature_view_name=fv_name, **kwargs
            ),
        )
    return self.monitoring_store.get_feature_metrics(project=project, **kwargs)
🔴 TypeError from duplicate feature_view_name kwarg when feature_service_name is provided
In MonitoringService.get_feature_metrics and get_feature_view_metrics, when feature_service_name is truthy, the lambda passes feature_view_name=fv_name explicitly and also spreads **kwargs, which already contains feature_view_name from the caller. For example, the REST endpoint at monitoring.py:124-132 calls svc.get_feature_metrics(project=..., feature_service_name=..., feature_view_name=..., ...). In the service method (monitoring_service.py:94-108), feature_view_name ends up in **kwargs, so the lambda at lines 104-106 passes it twice, once as feature_view_name=fv_name and once through **kwargs. This raises TypeError: got multiple values for keyword argument 'feature_view_name' and crashes any request that provides feature_service_name.
Suggested change:

    - if feature_service_name:
    -     return self._get_metrics_by_service(
    -         project,
    -         feature_service_name,
    -         lambda fv_name: self.monitoring_store.get_feature_metrics(
    -             project=project, feature_view_name=fv_name, **kwargs
    -         ),
    -     )
    - return self.monitoring_store.get_feature_metrics(project=project, **kwargs)
    + if feature_service_name:
    +     filtered_kwargs = {k: v for k, v in kwargs.items() if k != "feature_view_name"}
    +     return self._get_metrics_by_service(
    +         project,
    +         feature_service_name,
    +         lambda fv_name: self.monitoring_store.get_feature_metrics(
    +             project=project, feature_view_name=fv_name, **filtered_kwargs
    +         ),
    +     )
    + return self.monitoring_store.get_feature_metrics(project=project, **kwargs)
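The failure mode is plain Python call semantics and can be shown without Feast. In this sketch, `get_feature_metrics`, `read_unfiltered`, and `read_filtered` are hypothetical stand-ins for the store method and the two versions of the lambda; only the keyword-argument handling matches the reviewed code.

```python
def get_feature_metrics(project, feature_view_name=None, feature_name=None):
    """Hypothetical stand-in for the store's read method."""
    return {
        "project": project,
        "feature_view_name": feature_view_name,
        "feature_name": feature_name,
    }


def read_unfiltered(project, fv_name, **kwargs):
    # Same shape as the reviewed lambda: explicit keyword plus **kwargs.
    return get_feature_metrics(project=project, feature_view_name=fv_name, **kwargs)


def read_filtered(project, fv_name, **kwargs):
    # Same shape as the suggestion: drop the colliding key first.
    filtered = {k: v for k, v in kwargs.items() if k != "feature_view_name"}
    return get_feature_metrics(project=project, feature_view_name=fv_name, **filtered)


# The caller forwards feature_view_name even when it is unset.
caller_kwargs = {"feature_view_name": None, "feature_name": "conv_rate"}

try:
    read_unfiltered("demo", "driver_stats", **caller_kwargs)
except TypeError as exc:
    print(f"unfiltered call failed: {exc}")

print(read_filtered("demo", "driver_stats", **caller_kwargs))
```

The collision happens even when the forwarded value is `None`, because Python rejects the duplicate key regardless of its value.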
    if feature_service_name:
        return self._get_metrics_by_service(
            project,
            feature_service_name,
            lambda fv_name: self.monitoring_store.get_feature_view_metrics(
                project=project, feature_view_name=fv_name, **kwargs
            ),
        )
    return self.monitoring_store.get_feature_view_metrics(project=project, **kwargs)
🔴 Same duplicate feature_view_name kwarg bug in get_feature_view_metrics
Identical to the issue in get_feature_metrics: when feature_service_name is truthy, the lambda at lines 120-122 passes feature_view_name=fv_name explicitly while **kwargs also contains feature_view_name from the REST endpoint call at monitoring.py:149-156. This raises TypeError: got multiple values for keyword argument 'feature_view_name' for any request to /monitoring/metrics/feature_views that includes feature_service_name.
Suggested change:

    - if feature_service_name:
    -     return self._get_metrics_by_service(
    -         project,
    -         feature_service_name,
    -         lambda fv_name: self.monitoring_store.get_feature_view_metrics(
    -             project=project, feature_view_name=fv_name, **kwargs
    -         ),
    -     )
    - return self.monitoring_store.get_feature_view_metrics(project=project, **kwargs)
    + if feature_service_name:
    +     filtered_kwargs = {k: v for k, v in kwargs.items() if k != "feature_view_name"}
    +     return self._get_metrics_by_service(
    +         project,
    +         feature_service_name,
    +         lambda fv_name: self.monitoring_store.get_feature_view_metrics(
    +             project=project, feature_view_name=fv_name, **filtered_kwargs
    +         ),
    +     )
    + return self.monitoring_store.get_feature_view_metrics(project=project, **kwargs)
What this PR does / why we need it:
This PR introduces feature quality monitoring capabilities to Feast, enabling proactive tracking of feature distributions and data quality metrics. Currently, Feast has no built-in tools for monitoring feature health in production — ML teams must build custom solutions to detect issues like distribution shifts, elevated null rates, or degraded data quality before they silently impact model performance.
What it adds:
- Monitoring storage layer (MonitoringStore): Three dedicated Postgres tables (feast_monitoring_feature_metrics, feast_monitoring_feature_view_metrics, feast_monitoring_feature_service_metrics) with UPSERT operations, baseline management, and filtered reads.
- PyArrow-based metrics computation (MetricsCalculator): Backend-agnostic statistical computation supporting:
  - PrimitiveFeastType and ValueType
- Orchestration service (MonitoringService): Ties registry, offline store, calculator, and storage together. Supports both batch source (via OfflineStore.pull_all_from_table_or_query()) and feature log source (via FeatureService.logging_config destination). Computes and aggregates metrics at feature, feature view, and feature service levels.
- REST API (/monitoring/): Six endpoints registered in the registry REST server:
  - POST /monitoring/compute: Trigger on-demand metrics computation
  - GET /monitoring/metrics/features: Feature-level metrics with filtering
  - GET /monitoring/metrics/feature_views: Feature view aggregates
  - GET /monitoring/metrics/feature_services: Feature service aggregates
  - GET /monitoring/metrics/baseline: Baseline distribution retrieval
  - GET /monitoring/metrics/timeseries: Time-series data for trend analysis
  - project, feature_service_name, feature_view_name, feature_name, data_source_type, date range
  - AuthzedAction.DESCRIBE (read) and AuthzedAction.UPDATE (compute)
- CLI command (feast monitor run): CLI entry point for cron/orchestrator integration with options for project, feature view, date range, data source type, and baseline flag.

Design decisions:
- /monitoring/ route rather than extending existing /metrics/: The existing metrics route serves registry inventory metadata (resource counts, popular tags); monitoring serves statistical feature quality data with a different data path (offline store vs registry)
- feast apply: Gives users explicit control over what constitutes the reference distribution

Which issue(s) this PR fixes:
Partially Fixes #5919
Checks
git commit -s)

Testing Strategy
Test coverage (42 tests, all passing):
- test_metrics_calculator.py
- test_monitoring_store.py
- test_monitoring_integration.py: feast monitor run, RBAC enforcement

Snyk SAST scan: 0 vulnerabilities across all new files.
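For reviewers who want to exercise the read endpoints listed under the REST API, a request URL could be assembled roughly as follows. This is a sketch under assumptions: `BASE_URL` (host, port, and any path prefix the registry REST server mounts routes under) and the helper `build_metrics_url` are hypothetical; only the `/monitoring/metrics/...` paths and the filter names come from the description.

```python
from urllib.parse import urlencode

# Hypothetical address of the registry REST server; adjust host, port,
# and any route prefix to match the actual deployment.
BASE_URL = "http://localhost:6572"


def build_metrics_url(endpoint, **filters):
    """Build a GET URL for a monitoring metrics endpoint, dropping unset filters."""
    query = {k: v for k, v in filters.items() if v is not None}
    url = f"{BASE_URL}/monitoring/metrics/{endpoint}"
    return f"{url}?{urlencode(query)}" if query else url


url = build_metrics_url(
    "features",
    project="demo",
    feature_view_name="driver_stats",
    feature_name=None,  # unset filters are omitted from the query string
)
print(url)
# http://localhost:6572/monitoring/metrics/features?project=demo&feature_view_name=driver_stats
```

Dropping unset filters on the client side keeps the query string minimal; whether the server treats a missing filter and an empty one identically is not stated in the description.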