feat: Add dbt integration for importing models as FeatureViews #5827
Merged: franciscojavierarceo merged 22 commits into feast-dev:master from YassinNouh21:feat/dbt-feast-integration-3335-clean on Jan 16, 2026.
Commits (22):
- 682e314 feat: Add dbt integration for importing models as FeatureViews (#3335) (YassinNouh21)
- d20962d fix: Address mypy and ruff lint errors in dbt integration (YassinNouh21)
- 354f921 fix: Address ruff lint errors in dbt unit tests (YassinNouh21)
- 5ca317c style: Format dbt files with ruff (YassinNouh21)
- a301f83 fix: Remove unused dbt-artifacts-parser import and fix enum import (YassinNouh21)
- 460810c feat: Use dbt-artifacts-parser for typed manifest parsing (YassinNouh21)
- b398368 fix: Add graceful fallback for dbt-artifacts-parser validation errors (YassinNouh21)
- 2f777ff fix: Skip dbt tests when dbt-artifacts-parser is not installed (YassinNouh21)
- 86fb951 refactor: Simplify parser to rely solely on dbt-artifacts-parser (YassinNouh21)
- a17b50c ci: Add dbt-artifacts-parser to unit test dependencies (YassinNouh21)
- 55174a5 fix: Address Copilot code review comments for dbt integration (YassinNouh21)
- e4ba00a fix: Only add ellipsis to truncated descriptions (YassinNouh21)
- 01730a8 style: Format dbt files with ruff (YassinNouh21)
- 8a06b83 fix: Convert doctest examples to code blocks to avoid CI failures (YassinNouh21)
- fb40e93 fix: Add dbt-artifacts-parser to feast[ci] and update requirements (YassinNouh21)
- 53932ff docs: Add dbt integration documentation (YassinNouh21)
- 972fc96 docs: Add alpha warning to dbt integration documentation (YassinNouh21)
- b2901f4 fix: Add dbt-artifacts-parser to CI_REQUIRED dependencies (YassinNouh21)
- fe253c1 fix: Add defensive Array.base_type handling with logging (YassinNouh21)
- ed2c291 docs: Add comment explaining ImageBytes/PdfBytes exclusion (YassinNouh21)
- 7a50c73 fix: Move imports to top of file to resolve linter errors (YassinNouh21)
- c4ad283 Merge branch 'master' into feat/dbt-feast-integration-3335-clean (franciscojavierarceo)
# Importing Features from dbt

{% hint style="warning" %}
**Alpha Feature**: The dbt integration is currently in early development and subject to change.

**Current Limitations**:
- Supported data sources: BigQuery, Snowflake, and File-based sources only
- Single entity per model
- Manual entity column specification required

Breaking changes may occur in future releases.
{% endhint %}

This guide explains how to use Feast's dbt integration to automatically import dbt models as Feast FeatureViews, letting you reuse your existing dbt transformations as feature definitions without duplicating them by hand.

## Overview

[dbt (data build tool)](https://www.getdbt.com/) is a popular tool for transforming data in your warehouse, and many teams already use dbt to build feature tables. Feast's dbt integration allows you to:

- **Discover** dbt models tagged for feature engineering
- **Import** model metadata (columns, types, descriptions) as Feast objects
- **Generate** Python code for Entity, DataSource, and FeatureView definitions

This eliminates the need to manually define Feast objects that mirror your dbt models.

## Prerequisites

- A dbt project with compiled artifacts (`target/manifest.json`)
- Feast installed with dbt support:

```bash
pip install 'feast[dbt]'
```

Or install the parser directly:

```bash
pip install dbt-artifacts-parser
```

## Quick Start

### 1. Tag your dbt models

In your dbt project, add a `feast` tag to the models you want to import:

{% code title="models/driver_features.sql" %}
```sql
{{ config(
    materialized='table',
    tags=['feast']
) }}

SELECT
    driver_id,
    event_timestamp,
    avg_rating,
    total_trips,
    is_active
FROM {{ ref('stg_drivers') }}
```
{% endcode %}

### 2. Define column types in schema.yml

Feast uses column metadata from your `schema.yml` to determine feature types:

{% code title="models/schema.yml" %}
```yaml
version: 2
models:
  - name: driver_features
    description: "Driver aggregated features for ML models"
    columns:
      - name: driver_id
        description: "Unique driver identifier"
        data_type: STRING
      - name: event_timestamp
        description: "Feature timestamp"
        data_type: TIMESTAMP
      - name: avg_rating
        description: "Average driver rating"
        data_type: FLOAT64
      - name: total_trips
        description: "Total completed trips"
        data_type: INT64
      - name: is_active
        description: "Whether driver is currently active"
        data_type: BOOLEAN
```
{% endcode %}

### 3. Compile your dbt project

```bash
cd your_dbt_project
dbt compile
```

This generates `target/manifest.json`, which Feast reads.

### 4. List available models

Use the Feast CLI to discover tagged models:

```bash
feast dbt list target/manifest.json --tag-filter feast
```

Output:

```
Found 1 model(s) with tag 'feast':

driver_features
  Description: Driver aggregated features for ML models
  Columns: driver_id, event_timestamp, avg_rating, total_trips, is_active
  Tags: feast
```

### 5. Import models as Feast definitions

Generate a Python file with Feast object definitions:

```bash
feast dbt import target/manifest.json \
    --entity-column driver_id \
    --data-source-type bigquery \
    --tag-filter feast \
    --output features/driver_features.py
```

This generates:

{% code title="features/driver_features.py" %}
```python
"""
Feast feature definitions generated from dbt models.

Source: target/manifest.json
Project: my_dbt_project
Generated by: feast dbt import
"""

from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Bool, Float64, Int64
from feast.infra.offline_stores.bigquery_source import BigQuerySource


# Entities
driver_id = Entity(
    name="driver_id",
    join_keys=["driver_id"],
    description="Entity key for dbt models",
    tags={'source': 'dbt'},
)


# Data Sources
driver_features_source = BigQuerySource(
    name="driver_features_source",
    table="my_project.my_dataset.driver_features",
    timestamp_field="event_timestamp",
    description="Driver aggregated features for ML models",
    tags={'dbt.model': 'driver_features', 'dbt.tag.feast': 'true'},
)


# Feature Views
driver_features_fv = FeatureView(
    name="driver_features",
    entities=[driver_id],
    ttl=timedelta(days=1),
    schema=[
        Field(name="avg_rating", dtype=Float64, description="Average driver rating"),
        Field(name="total_trips", dtype=Int64, description="Total completed trips"),
        Field(name="is_active", dtype=Bool, description="Whether driver is currently active"),
    ],
    online=True,
    source=driver_features_source,
    description="Driver aggregated features for ML models",
    tags={'dbt.model': 'driver_features', 'dbt.tag.feast': 'true'},
)
```
{% endcode %}

## CLI Reference

### `feast dbt list`

Discover dbt models available for import.

```bash
feast dbt list <manifest_path> [OPTIONS]
```

**Arguments:**
- `manifest_path`: Path to dbt's `manifest.json` file

**Options:**
- `--tag-filter`, `-t`: Filter models by dbt tag (e.g., `feast`)
- `--model`, `-m`: Filter to specific model name(s)

### `feast dbt import`

Import dbt models as Feast object definitions.

```bash
feast dbt import <manifest_path> [OPTIONS]
```

**Arguments:**
- `manifest_path`: Path to dbt's `manifest.json` file

**Options:**

| Option | Description | Default |
|--------|-------------|---------|
| `--entity-column`, `-e` | Column to use as entity key | (required) |
| `--data-source-type`, `-d` | Data source type: `bigquery`, `snowflake`, `file` | `bigquery` |
| `--tag-filter`, `-t` | Filter models by dbt tag | None |
| `--model`, `-m` | Import specific model(s) only | None |
| `--timestamp-field` | Timestamp column name | `event_timestamp` |
| `--ttl-days` | Feature TTL in days | `1` |
| `--exclude-columns` | Columns to exclude from features | None |
| `--no-online` | Disable online serving | `False` |
| `--output`, `-o` | Output Python file path | None (stdout) |
| `--dry-run` | Preview without generating code | `False` |

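Several of these options interact: the entity column, the timestamp field, and any excluded columns are dropped before the remaining columns become features. A minimal sketch of that filtering, assuming illustrative names (this is not the integration's actual implementation):

```python
def select_feature_columns(columns, entity_column,
                           timestamp_field="event_timestamp",
                           exclude_columns=()):
    """Return the columns that would become FeatureView fields.

    The entity key, the event timestamp, and explicitly excluded
    columns are dropped; everything else is treated as a feature.
    """
    skip = {entity_column, timestamp_field, *exclude_columns}
    return [c for c in columns if c not in skip]


model_columns = ["driver_id", "event_timestamp", "avg_rating",
                 "total_trips", "is_active"]
features = select_feature_columns(model_columns, entity_column="driver_id")
print(features)  # ['avg_rating', 'total_trips', 'is_active']
```

This matches the generated example above: `driver_id` and `event_timestamp` never appear in the FeatureView schema.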
## Type Mapping

Feast automatically maps dbt/warehouse column types to Feast types:

| dbt/SQL Type | Feast Type |
|--------------|------------|
| `STRING`, `VARCHAR`, `TEXT` | `String` |
| `INT`, `INTEGER`, `BIGINT` | `Int64` |
| `SMALLINT`, `TINYINT` | `Int32` |
| `FLOAT`, `REAL` | `Float32` |
| `DOUBLE`, `FLOAT64` | `Float64` |
| `BOOLEAN`, `BOOL` | `Bool` |
| `TIMESTAMP`, `DATETIME` | `UnixTimestamp` |
| `BYTES`, `BINARY` | `Bytes` |
| `ARRAY<type>` | `Array(type)` |

Snowflake `NUMBER(precision, scale)` types are handled specially:
- Scale > 0: `Float64`
- Precision <= 9: `Int32`
- Precision <= 18: `Int64`
- Precision > 18: `Float64`

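The table and the `NUMBER` rules above can be expressed roughly as the following sketch (Feast type names shown as strings; this is an illustration of the mapping, not the integration's actual code):

```python
import re

# Name-based mapping from the table above.
SIMPLE_TYPES = {
    "STRING": "String", "VARCHAR": "String", "TEXT": "String",
    "INT": "Int64", "INTEGER": "Int64", "BIGINT": "Int64",
    "SMALLINT": "Int32", "TINYINT": "Int32",
    "FLOAT": "Float32", "REAL": "Float32",
    "DOUBLE": "Float64", "FLOAT64": "Float64",
    "BOOLEAN": "Bool", "BOOL": "Bool",
    "TIMESTAMP": "UnixTimestamp", "DATETIME": "UnixTimestamp",
    "BYTES": "Bytes", "BINARY": "Bytes",
}


def map_sql_type(sql_type: str) -> str:
    t = sql_type.strip().upper()
    # Snowflake NUMBER(precision, scale): floats for nonzero scale,
    # integers sized by precision, Float64 beyond 18 digits.
    m = re.fullmatch(r"NUMBER\((\d+)\s*,\s*(\d+)\)", t)
    if m:
        precision, scale = int(m.group(1)), int(m.group(2))
        if scale > 0:
            return "Float64"
        if precision <= 9:
            return "Int32"
        if precision <= 18:
            return "Int64"
        return "Float64"
    # Unknown or missing types fall back to String (see Limitations).
    return SIMPLE_TYPES.get(t, "String")


print(map_sql_type("NUMBER(10, 2)"))  # Float64
print(map_sql_type("NUMBER(18, 0)"))  # Int64
```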
## Data Source Configuration

### BigQuery

```bash
feast dbt import manifest.json -e user_id -d bigquery -o features.py
```

Generates a `BigQuerySource` with the full table path from dbt metadata:

```python
BigQuerySource(
    table="project.dataset.table_name",
    ...
)
```

### Snowflake

```bash
feast dbt import manifest.json -e user_id -d snowflake -o features.py
```

Generates a `SnowflakeSource` with database, schema, and table:

```python
SnowflakeSource(
    database="MY_DB",
    schema="MY_SCHEMA",
    table="TABLE_NAME",
    ...
)
```

### File

```bash
feast dbt import manifest.json -e user_id -d file -o features.py
```

Generates a `FileSource` with a placeholder path:

```python
FileSource(
    path="/data/table_name.parquet",
    ...
)
```

{% hint style="info" %}
For file sources, update the generated path to point to your actual data files.
{% endhint %}

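In effect, the `-d` flag selects which source class and which identifier fields land in the generated file. A schematic of that dispatch, with a hypothetical `render_source` helper (the function and its string output are illustrative, not the generator's real code):

```python
def render_source(source_type: str, database: str, schema: str, table: str) -> str:
    """Render a data source snippet for the generated file (sketch only)."""
    if source_type == "bigquery":
        # BigQuery uses a single fully qualified table path.
        return f'BigQuerySource(table="{database}.{schema}.{table}", ...)'
    if source_type == "snowflake":
        # Snowflake keeps database, schema, and table as separate fields.
        return f'SnowflakeSource(database="{database}", schema="{schema}", table="{table}", ...)'
    if source_type == "file":
        # File sources get a placeholder path you must edit afterwards.
        return f'FileSource(path="/data/{table}.parquet", ...)'
    raise ValueError(f"unsupported data source type: {source_type}")


print(render_source("bigquery", "my_project", "my_dataset", "driver_features"))
```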
## Best Practices

### 1. Use consistent tagging

Create a standard tagging convention in your dbt project:

```yaml
# dbt_project.yml
models:
  my_project:
    features:
      +tags: ['feast']  # All models in features/ get the feast tag
```

### 2. Document your columns

Column descriptions from `schema.yml` are preserved in the generated Feast definitions, making your feature catalog self-documenting.

### 3. Review before committing

Use `--dry-run` to preview what will be generated:

```bash
feast dbt import manifest.json -e user_id -d bigquery --dry-run
```

### 4. Version control generated code

Commit the generated Python files to your repository. This allows you to:
- Track changes to feature definitions over time
- Review dbt-to-Feast mapping in pull requests
- Customize generated code if needed

### 5. Integrate with CI/CD

Add dbt import to your CI pipeline:

```yaml
# .github/workflows/features.yml
- name: Compile dbt
  run: dbt compile

- name: Generate Feast definitions
  run: |
    feast dbt import target/manifest.json \
      -e user_id -d bigquery -t feast \
      -o feature_repo/features.py

- name: Apply Feast changes
  run: feast apply
```

## Limitations

- **Single entity support**: Currently supports one entity column per import. For multi-entity models, run multiple imports or manually adjust the generated code.
- **No incremental updates**: Each import generates a complete file. Use version control to track changes.
- **Column types required**: Columns without a `data_type` in `schema.yml` default to the `String` type.

## Troubleshooting

### "manifest.json not found"

Run `dbt compile` or `dbt run` first to generate the manifest file.

### "No models found with tag"

Check that your models have the correct tag in their config:

```sql
{{ config(tags=['feast']) }}
```

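If the list still comes back empty, you can inspect the manifest directly: dbt manifests store models under a `nodes` mapping, and each node carries a `tags` list. A quick stdlib-only check (the inline manifest here is a toy example; in practice load `target/manifest.json` with `json.load`):

```python
def models_with_tag(manifest: dict, tag: str) -> list:
    """Return names of dbt models in a parsed manifest carrying the given tag."""
    return [
        node["name"]
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model" and tag in node.get("tags", [])
    ]


# Toy manifest standing in for json.load(open("target/manifest.json")).
manifest = {
    "nodes": {
        "model.my_project.driver_features": {
            "name": "driver_features",
            "resource_type": "model",
            "tags": ["feast"],
        },
        "model.my_project.stg_drivers": {
            "name": "stg_drivers",
            "resource_type": "model",
            "tags": [],
        },
    }
}
print(models_with_tag(manifest, "feast"))  # ['driver_features']
```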
| ### "Missing entity column" | ||
|
|
||
| Ensure your dbt model includes the entity column specified with `--entity-column`. Models missing this column are skipped with a warning. | ||
|
|
||
| ### "Missing timestamp column" | ||
|
|
||
| By default, Feast looks for `event_timestamp`. Use `--timestamp-field` to specify a different column name. | ||