Description
Is your feature request related to a problem? Please describe.
Feast currently manages ML features, their definitions, and their retrieval for training and serving models, but does not emit or capture standardized lineage events according to the OpenLineage spec. This means:
- There is limited visibility into how Feast feature pipelines transform and move data across sources, feature views, and serving layers.
- Teams using Feast cannot easily integrate with lineage and observability tools that rely on OpenLineage for tracing dataset dependencies, run metadata, and pipeline audit history.
- Without lineage events, it’s difficult to tie feature generation to downstream model training and production inference in a standardized ecosystem of metadata tools.
OpenLineage is an open framework and specification for capturing lineage metadata about jobs, runs, and datasets — enabling consistent lineage collection and analysis across tools and pipelines. ([openlineage.io]1)
Describe the solution you'd like
Add first-class support for OpenLineage event emission within Feast to enable standardized lineage capture for feature generation, ingestion, and retrieval.
Key components of the solution
- OpenLineage client integration
  - Include an OpenLineage client library (Python) that can construct and emit lineage events for Feast pipelines (see the sketch after this list).
  - Events should cover key Feast activities such as:
    - Feature view materialization runs (batch/stream)
    - Historical feature retrieval for training dataset generation
    - Online feature retrievals (as optional lineage events)
  - Construct core OpenLineage entities (job/run/dataset) using Feast concept metadata.
- Instrument Feast pipelines
  - Hook event emission into Feast orchestration workflows (e.g., `feast materialize`, SDK workflows).
  - Capture dataset inputs/outputs:
    - Offline store table/partition references used as inputs
    - Feature view outputs materialized in offline/online stores
- Custom OpenLineage facets (optional but recommended)
  - Add Feast-specific facets (e.g., `feast_feature_view_facet`) to enrich dataset and job metadata with feature definition details.
  - Follow OpenLineage custom facet naming and schema guidelines. ([openlineage.io]2)
- Backend support
  - Provide configuration for sending OpenLineage events to supported backends (HTTP, Kafka, Marquez, etc.).
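
To make the proposal concrete, here is a minimal sketch of what emission around a materialization run could look like, using the `openlineage-python` client (its `RunEvent`/`Run`/`Job`/`Dataset` classes and `BaseFacet`). The job/dataset names, namespaces, `FeastFeatureViewFacet`, and the Marquez endpoint are illustrative assumptions, not an existing Feast API:

```python
# Hypothetical sketch: emitting OpenLineage events around a feature view
# materialization. Names, namespaces, and the custom facet are illustrative.
from datetime import datetime, timezone
from uuid import uuid4

import attr
from openlineage.client import OpenLineageClient
from openlineage.client.facet import BaseFacet
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

PRODUCER = "https://github.com/feast-dev/feast"  # assumed producer URI


@attr.s
class FeastFeatureViewFacet(BaseFacet):
    """Hypothetical custom facet carrying feature view metadata."""
    feature_view: str = attr.ib()
    features: list = attr.ib()


# Point the client at any OpenLineage-compatible backend (e.g. Marquez over HTTP).
client = OpenLineageClient(url="http://localhost:5000")

run = Run(runId=str(uuid4()))
job = Job(namespace="feast", name="materialize.driver_hourly_stats")

inputs = [Dataset(namespace="bigquery", name="project.dataset.driver_stats")]
outputs = [
    Dataset(
        namespace="feast.online_store",
        name="driver_hourly_stats",
        facets={
            "feast_feature_view": FeastFeatureViewFacet(
                feature_view="driver_hourly_stats",
                features=["conv_rate", "acc_rate", "avg_daily_trips"],
            )
        },
    )
]

now = datetime.now(timezone.utc).isoformat()
client.emit(RunEvent(RunState.START, now, run, job, PRODUCER, inputs, outputs))

# ... the actual materialization would run here (e.g. store.materialize(...)) ...

now = datetime.now(timezone.utc).isoformat()
client.emit(RunEvent(RunState.COMPLETE, now, run, job, PRODUCER, inputs, outputs))
```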
Describe alternatives you've considered
- Ad hoc lineage export scripts — Users can write custom tooling to snapshot pipelines and log lineage manually, but this is error-prone and non-standard.
- External orchestration lineage only — Rely on orchestrators like Airflow to produce lineage, but this fails to capture lineage inside Feast’s core feature logic and misses feature store semantics.
- Out-of-band dumps — Export Feast registry/feature metadata offline, but this does not capture per-run (temporal) lineage.
Additional context
- The OpenLineage spec provides a standard API for capturing lineage events (jobs/runs/datasets) and is extensible via facets. ([openlineage.io]2)
- Capturing lineage benefits model governance, debugging, root-cause analysis, and data impact studies across ML systems.
- Possible integration patterns:
  - Add an optional `--openlineage` flag to commands like `feast materialize`.
  - Emit lineage events in Feast SDK flows when users retrieve features programmatically (a sketch follows below).
  - Expose configuration options in `feature_store.yaml` for lineage backend endpoints.
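
For the SDK flow, lineage could be emitted around historical feature retrieval. The sketch below is illustrative only: it uses the existing `FeatureStore.get_historical_features` API, but the job/dataset names and the surrounding emission logic are assumptions, not something Feast provides today:

```python
# Hypothetical sketch: emitting lineage around training dataset retrieval
# in the Feast SDK flow. Event wiring is illustrative, not a Feast API.
from datetime import datetime, timezone
from uuid import uuid4

import pandas as pd
from feast import FeatureStore
from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

PRODUCER = "https://github.com/feast-dev/feast"  # assumed producer URI
client = OpenLineageClient(url="http://localhost:5000")  # assumed backend endpoint

store = FeatureStore(repo_path=".")
features = ["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"]
entity_df = pd.DataFrame(
    {"driver_id": [1001, 1002], "event_timestamp": [datetime.now(timezone.utc)] * 2}
)

run = Run(runId=str(uuid4()))
job = Job(namespace="feast", name="get_historical_features")
inputs = [Dataset(namespace="feast.offline_store", name="driver_hourly_stats")]
outputs = [Dataset(namespace="feast", name="training_dataset")]

client.emit(
    RunEvent(
        RunState.START,
        datetime.now(timezone.utc).isoformat(),
        run, job, PRODUCER, inputs, outputs,
    )
)

# The actual retrieval call Feast already supports today.
training_df = store.get_historical_features(
    entity_df=entity_df, features=features
).to_df()

client.emit(
    RunEvent(
        RunState.COMPLETE,
        datetime.now(timezone.utc).isoformat(),
        run, job, PRODUCER, inputs, outputs,
    )
)
```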