Copilot AI commented Jan 21, 2026

What this PR does / why we need it:

Feast manages ML features but lacks standardized lineage capture. This prevents integration with observability tools relying on OpenLineage for dataset dependencies and pipeline audit trails.

Adds OpenLineage event emission during feature materialization to capture job metadata, dataset inputs/outputs, and custom Feast facets.

Changes

Core Infrastructure

  • Added openlineage-python>=1.0.0,<2 dependency
  • Created OpenLineageConfig Pydantic model in repo_config.py with transport configuration (HTTP, Kafka, console, file)
  • Implemented OpenLineageClient wrapper with START/COMPLETE/FAIL event emission
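A rough sketch of how the configuration listed above might be shaped. The PR states that OpenLineageConfig is a Pydantic model; the version below uses a frozen standard-library dataclass as a stand-in so that it runs without extra dependencies. The field names mirror the example feature_store.yaml in this description. The class name, the defaults (apart from disabled-by-default), and the validation rule are assumptions, not taken from the PR's code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Transport names listed in the PR description.
SUPPORTED_TRANSPORTS = ("http", "kafka", "console", "file")


@dataclass(frozen=True)
class OpenLineageConfigSketch:
    """Hypothetical stand-in for the PR's Pydantic OpenLineageConfig."""

    enabled: bool = False            # disabled by default, per the PR
    transport_type: str = "console"  # assumed default
    transport_config: Dict[str, Any] = field(default_factory=dict)
    namespace: str = "feast"         # assumed default

    def __post_init__(self) -> None:
        # Reject transports the PR does not list (assumed validation rule).
        if self.transport_type not in SUPPORTED_TRANSPORTS:
            raise ValueError(
                f"unsupported transport_type {self.transport_type!r}; "
                f"expected one of {SUPPORTED_TRANSPORTS}"
            )


# Equivalent of the YAML example in this description.
config = OpenLineageConfigSketch(
    enabled=True,
    transport_type="http",
    transport_config={"url": "http://marquez:5000"},
)
```

Freezing the object is one way to get the config immutability mentioned in the review feedback; a Pydantic model could achieve the same with its own frozen setting.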

Instrumentation

  • Modified feature_store.materialize() and materialize_incremental() to emit events when enabled
  • Events capture: job name, run ID, nominal time window, input datasets (offline store), output datasets (online store)
  • Custom Feast facets include: feature view name, features, entities, source type
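The START/COMPLETE/FAIL flow described above can be sketched as follows. This is not the PR's implementation: it builds plain dicts shaped like OpenLineage run events instead of using openlineage-python, and it omits the producer and schemaURL fields a complete event carries. The function names, job name, and dataset names are hypothetical.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


def build_event(event_type: str, run_id: str, job_name: str, namespace: str,
                start: datetime, end: datetime,
                inputs: List[str], outputs: List[str]) -> Event:
    """Build a dict shaped like an OpenLineage run event (simplified)."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {
            "runId": run_id,
            # Nominal time window of the materialization.
            "facets": {"nominalTime": {
                "nominalStartTime": start.isoformat(),
                "nominalEndTime": end.isoformat(),
            }},
        },
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": n} for n in inputs],
        "outputs": [{"namespace": namespace, "name": n} for n in outputs],
    }


def materialize_with_lineage(do_materialize: Callable[[], None],
                             emit: Callable[[Event], None],
                             **event_fields: Any) -> None:
    """Emit START, run the materialization, then emit COMPLETE or FAIL."""
    run_id = str(uuid.uuid4())  # one run ID shared by all events of this run
    emit(build_event("START", run_id, **event_fields))
    try:
        do_materialize()
    except Exception:
        emit(build_event("FAIL", run_id, **event_fields))
        raise  # bare raise keeps the original exception and traceback
    emit(build_event("COMPLETE", run_id, **event_fields))


# Illustrative run with a no-op materialization and an in-memory "transport".
events: List[Event] = []
fields = dict(
    job_name="materialize_driver_hourly_stats",    # hypothetical job name
    namespace="feast",
    start=datetime(2026, 1, 1, tzinfo=timezone.utc),
    end=datetime(2026, 1, 2, tzinfo=timezone.utc),
    inputs=["offline_store.driver_stats"],         # hypothetical dataset
    outputs=["online_store.driver_hourly_stats"],  # hypothetical dataset
)
materialize_with_lineage(lambda: None, events.append, **fields)
print([e["eventType"] for e in events])
```

Sharing one run ID across the events is what lets a lineage backend stitch them into a single run.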

Testing & Documentation

  • Unit tests for client and configuration (18 test cases)
  • Integration guide (docs/openlineage_integration.md) with transport examples
  • Example configuration (examples/feature_store_openlineage.yaml)

Example Usage

# feature_store.yaml
openlineage:
  enabled: true
  transport_type: http
  transport_config:
    url: http://marquez:5000
  namespace: feast

# Python
from feast import FeatureStore

fs = FeatureStore(repo_path=".")
fs.materialize(start_date=..., end_date=...)
# Emits START event → materializes → emits COMPLETE/FAIL event

Properties

  • Disabled by default - zero impact on existing deployments
  • Standards-compliant - follows OpenLineage spec for job/run/dataset metadata
  • Transport-agnostic - supports HTTP, Kafka, console, file backends
  • No breaking changes - fully backward compatible

Misc

CodeQL scan: 0 vulnerabilities. All code review feedback addressed (config immutability, exception handling, helper method extraction).
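One way the config-immutability feedback could be addressed, shown as a minimal sketch: the client keeps its own _enabled flag instead of flipping the user's setting when the optional dependency is unavailable. The class names and the availability check via importlib are assumptions for illustration; the PR's actual client may differ.

```python
import importlib.util
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageSettings:
    """Hypothetical, immutable stand-in for the user's OpenLineage config."""
    enabled: bool = False


class LineageClientSketch:
    def __init__(self, settings: LineageSettings,
                 module_name: str = "openlineage") -> None:
        self._settings = settings
        # Availability is tracked on the client; the config is never mutated.
        available = importlib.util.find_spec(module_name) is not None
        self._enabled = settings.enabled and available

    def emit(self, event: dict) -> bool:
        """Return True if the event would be sent, False if emission is off."""
        if not self._enabled:
            return False
        # A real client would hand the event to its transport here.
        return True


settings = LineageSettings(enabled=True)
# Simulate a missing optional dependency with a module name that cannot exist.
client = LineageClientSketch(settings, module_name="no_such_module_for_this_sketch")
print(client.emit({}), settings.enabled)
```

The user's setting stays as written while the client silently becomes a no-op, which also keeps the disabled path cheap.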

Original prompt

This section details the original issue you should resolve

<issue_title>Add support for OpenLineage</issue_title>
<issue_description>Is your feature request related to a problem? Please describe.
Feast currently manages ML features, their definitions, and their retrieval for training and serving models, but does not emit or capture standardized lineage events according to the OpenLineage spec. This means:

  • There is limited visibility into how Feast feature pipelines transform and move data across sources, feature views, and serving layers.
  • Teams using Feast cannot easily integrate with lineage and observability tools that rely on OpenLineage for tracing dataset dependencies, run metadata, and pipeline audit history.
  • Without lineage events, it’s difficult to tie feature generation to downstream model training and production inference in a standardized ecosystem of metadata tools.

OpenLineage is an open framework and specification for capturing lineage metadata about jobs, runs, and datasets, enabling consistent lineage collection and analysis across tools and pipelines. (openlineage.io)

Describe the solution you'd like


Add first-class support for OpenLineage event emission within Feast to enable standardized lineage capture for feature generation, ingestion, and retrieval.

Key components of the solution

  1. OpenLineage client integration

    • Include an OpenLineage client library (Python) that can construct and emit lineage events for Feast pipelines.

    • Events should cover key Feast activities such as:

      • Feature view materialization runs (batch/stream)
      • Historical feature retrieval for training dataset generation
      • Online feature retrievals (as optional lineage events)
    • Construct core OpenLineage entities (job/run/dataset) using Feast concept metadata.

  2. Instrument Feast pipelines

    • Hook event emission into Feast orchestration workflows (e.g., feast materialize, SDK workflows).

    • Capture dataset inputs/outputs:

      • Offline store table/partition references used as inputs
      • Feature view outputs materialized in offline/online stores
  3. Custom OpenLineage facets (optional but recommended)

    • Add Feast-specific facets (e.g., feast_feature_view_facet) to enrich dataset and job metadata with feature definition details.
    • Follow OpenLineage custom facet naming and schema guidelines. (openlineage.io)
  4. Backend support

    • Provide configuration for sending OpenLineage events to supported backends (HTTP, Kafka, Marquez, etc.).
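As a hedged illustration of item 3, a Feast-specific facet attached to an output dataset might look like the dict below. Every OpenLineage facet carries _producer and _schemaURL fields; everything else here is an assumption: the feast_ prefixed facet key, the field names, both placeholder URLs, and the feature view, feature, and entity names.

```python
from typing import Any, Dict, List

# Placeholder URLs, not published Feast or OpenLineage locations.
PRODUCER = "https://example.com/feast-openlineage-sketch"
SCHEMA_URL = "https://example.com/schemas/FeastFeatureViewFacet.json"


def feast_feature_view_facet(name: str, features: List[str],
                             entities: List[str],
                             source_type: str) -> Dict[str, Any]:
    """Build a hypothetical custom facet describing a Feast feature view."""
    return {
        "_producer": PRODUCER,     # required on every OpenLineage facet
        "_schemaURL": SCHEMA_URL,  # required on every OpenLineage facet
        "featureViewName": name,
        "features": list(features),
        "entities": list(entities),
        "sourceType": source_type,
    }


# Output dataset for an online-store write, enriched with the custom facet.
output_dataset = {
    "namespace": "feast",
    "name": "online_store.driver_hourly_stats",  # hypothetical dataset name
    "facets": {
        # Project-prefixed key so it cannot collide with standard facets.
        "feast_featureView": feast_feature_view_facet(
            name="driver_hourly_stats",
            features=["conv_rate", "acc_rate"],
            entities=["driver_id"],
            source_type="file",
        ),
    },
}
```

The four payload fields match the facet contents this PR lists (feature view name, features, entities, source type); the exact key and schema should be checked against the OpenLineage custom facet guidelines.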

Describe alternatives you've considered

  • Ad hoc lineage export scripts: users can write custom tooling to snapshot pipelines and log lineage manually, but this is error-prone and non-standard.
  • External orchestration lineage only: relying on orchestrators like Airflow to produce lineage fails to capture lineage inside Feast’s core feature logic and misses feature store semantics.
  • Out-of-band dumps: exporting Feast registry/feature metadata offline does not capture temporal run lineage.

Additional context

  • The OpenLineage spec provides a standard API for capturing lineage events (jobs/runs/datasets) and is extensible via facets. (openlineage.io)

  • Capturing lineage benefits model governance, debugging, root-cause analysis, and data impact studies across ML systems.

  • Possible integration patterns:

    • Add an optional --openlineage flag to commands like feast materialize.
    • Emit lineage events in Feast SDK flows when users retrieve features programmatically.
    • Expose configuration options in feature_store.yaml for lineage backend endpoints.

</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 3 commits January 21, 2026 15:23
- Add openlineage-python dependency to setup.py
- Create OpenLineageConfig in repo_config.py for configuration
- Implement OpenLineageClient wrapper for event emission
- Instrument materialize() and materialize_incremental() methods
- Add unit tests for OpenLineage client

Co-authored-by: franciscojavierarceo <4163062+franciscojavierarceo@users.noreply.github.com>

- Add unit tests for OpenLineage configuration
- Create example feature_store.yaml with OpenLineage config
- Add comprehensive OpenLineage integration documentation

Co-authored-by: franciscojavierarceo <4163062+franciscojavierarceo@users.noreply.github.com>

- Fix config mutation by using local _enabled flag
- Extract helper methods for feature extraction and schema creation
- Use bare 'raise' to preserve exception context
- Add comment about import location to avoid overhead

Co-authored-by: franciscojavierarceo <4163062+franciscojavierarceo@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add support for OpenLineage events in Feast" to "feat: Add OpenLineage integration for standardized lineage event emission" Jan 21, 2026
Development

Successfully merging this pull request may close these issues.

Add support for OpenLineage
