From df18af14b99c7341bfa5f7b36199dd9b2e269379 Mon Sep 17 00:00:00 2001 From: tokoko Date: Wed, 14 Aug 2024 15:11:40 +0000 Subject: [PATCH 1/3] reorganize registry docs Signed-off-by: tokoko --- docs/SUMMARY.md | 8 +- docs/getting-started/components/registry.md | 55 ++++++++- docs/getting-started/concepts/README.md | 4 - docs/getting-started/concepts/registry.md | 107 ------------------ docs/reference/registries/README.md | 23 ++++ docs/reference/registries/gcs.md | 23 ++++ docs/reference/registries/local.md | 23 ++++ docs/reference/registries/s3.md | 23 ++++ .../{registry => registries}/snowflake.md | 2 +- .../registries/sql.md} | 7 +- 10 files changed, 150 insertions(+), 125 deletions(-) delete mode 100644 docs/getting-started/concepts/registry.md create mode 100644 docs/reference/registries/README.md create mode 100644 docs/reference/registries/gcs.md create mode 100644 docs/reference/registries/local.md create mode 100644 docs/reference/registries/s3.md rename docs/reference/{registry => registries}/snowflake.md (97%) rename docs/{tutorials/using-scalable-registry.md => reference/registries/sql.md} (97%) diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index 3cc2511288f..734861c5390 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -22,7 +22,6 @@ * [Feature view](getting-started/concepts/feature-view.md) * [Feature retrieval](getting-started/concepts/feature-retrieval.md) * [Point-in-time joins](getting-started/concepts/point-in-time-joins.md) - * [Registry](getting-started/concepts/registry.md) * [\[Alpha\] Saved dataset](getting-started/concepts/dataset.md) * [Components](getting-started/components/README.md) * [Overview](getting-started/components/overview.md) @@ -42,7 +41,6 @@ * [Real-time credit scoring on AWS](tutorials/tutorials-overview/real-time-credit-scoring-on-aws.md) * [Driver stats on Snowflake](tutorials/tutorials-overview/driver-stats-on-snowflake.md) * [Validating historical features with Great 
Expectations](tutorials/validating-historical-features.md) -* [Using Scalable Registry](tutorials/using-scalable-registry.md) * [Building streaming features](tutorials/building-streaming-features.md) ## How-to Guides @@ -111,6 +109,12 @@ * [Hazelcast (contrib)](reference/online-stores/hazelcast.md) * [ScyllaDB (contrib)](reference/online-stores/scylladb.md) * [SingleStore (contrib)](reference/online-stores/singlestore.md) +* [Registries](reference/registries/README.md) + * [Local](reference/registries/local.md) + * [S3](reference/registries/s3.md) + * [GCS](reference/registries/gcs.md) + * [SQL](reference/registries/sql.md) + * [Snowflake](reference/registries/snowflake.md) * [Providers](reference/providers/README.md) * [Local](reference/providers/local.md) * [Google Cloud Platform](reference/providers/google-cloud-platform.md) diff --git a/docs/getting-started/components/registry.md b/docs/getting-started/components/registry.md index 0939fb53fcf..93e17248145 100644 --- a/docs/getting-started/components/registry.md +++ b/docs/getting-started/components/registry.md @@ -1,15 +1,60 @@ # Registry -The Feast feature registry is a central catalog of all the feature definitions and their related metadata. It allows data scientists to search, discover, and collaborate on new features. +Feast uses a registry to store all applied Feast objects (e.g. Feature views, entities, etc). It allows data scientists to search, discover, and collaborate on new features. The registry exposes methods to apply, list, retrieve and delete these objects, and is an abstraction with multiple implementations. -Each Feast deployment has a single feature registry. Feast only supports file-based registries today, but supports four different backends. +Feast comes with built-in file-based and sql-based registry implementations. By default, Feast uses a file-based registry, which stores the protobuf representation of the registry as a serialized file in the local file system. 
For more details on which registries are supported, please see [Registries](../../reference/registries/). + +## Updating the registry + +We recommend users store their Feast feature definitions in a version controlled repository, which then via CI/CD +automatically stays synced with the registry. Users will often also want multiple registries to correspond to +different environments (e.g. dev vs staging vs prod), with staging and production registries with locked down write +access since they can impact real user traffic. See [Running Feast in Production](../../how-to-guides/running-feast-in-production.md#1.-automatically-deploying-changes-to-your-feature-definitions) for details on how to set this up. + +## Accessing the registry from clients + +Users can specify the registry through a `feature_store.yaml` config file, or programmatically. We often see teams +preferring the programmatic approach because it makes notebook driven development very easy: + +### Option 1: programmatically specifying the registry + +```python +repo_config = RepoConfig( + registry=RegistryConfig(path="gs://feast-test-gcs-bucket/registry.pb"), + project="feast_demo_gcp", + provider="gcp", + offline_store="file", # Could also be the OfflineStoreConfig e.g. FileOfflineStoreConfig + online_store="null", # Could also be the OnlineStoreConfig e.g. RedisOnlineStoreConfig +) +store = FeatureStore(config=repo_config) +``` + +### Option 2: specifying the registry in the project's `feature_store.yaml` file + +```yaml +project: feast_demo_aws +provider: aws +registry: s3://feast-test-s3-bucket/registry.pb +online_store: null +offline_store: + type: file +``` + +Instantiating a `FeatureStore` object can then point to this: + +```python +store = FeatureStore(repo_path=".") +``` + + -The feature registry is updated during different operations when using Feast. 
More specifically, objects within the registry \(entities, feature views, feature services\) are updated when running `apply` from the Feast CLI, but metadata about objects can also be updated during operations like materialization. + + diff --git a/docs/getting-started/concepts/README.md b/docs/getting-started/concepts/README.md index e805e3b4867..eddddf4e711 100644 --- a/docs/getting-started/concepts/README.md +++ b/docs/getting-started/concepts/README.md @@ -24,10 +24,6 @@ [point-in-time-joins.md](point-in-time-joins.md) {% endcontent-ref %} -{% content-ref url="registry.md" %} -[registry.md](registry.md) -{% endcontent-ref %} - {% content-ref url="dataset.md" %} [dataset.md](dataset.md) {% endcontent-ref %} diff --git a/docs/getting-started/concepts/registry.md b/docs/getting-started/concepts/registry.md deleted file mode 100644 index 8ac32ce87b9..00000000000 --- a/docs/getting-started/concepts/registry.md +++ /dev/null @@ -1,107 +0,0 @@ -# Registry - -Feast uses a registry to store all applied Feast objects (e.g. Feature views, entities, etc). The registry exposes -methods to apply, list, retrieve and delete these objects, and is an abstraction with multiple implementations. - -### Options for registry implementations - -#### File-based registry -By default, Feast uses a file-based registry implementation, which stores the protobuf representation of the registry as -a serialized file. This registry file can be stored in a local file system, or in cloud storage (in, say, S3 or GCS, or Azure). - -The quickstart guides that use `feast init` will use a registry on a local file system. 
To allow Feast to configure -a remote file registry, you need to create a GCS / S3 bucket that Feast can understand: -{% tabs %} -{% tab title="Example S3 file registry" %} -```yaml -project: feast_demo_aws -provider: aws -registry: - path: s3://[YOUR BUCKET YOU CREATED]/registry.pb - cache_ttl_seconds: 60 -online_store: null -offline_store: - type: file -``` -{% endtab %} - -{% tab title="Example GCS file registry" %} -```yaml -project: feast_demo_gcp -provider: gcp -registry: - path: gs://[YOUR BUCKET YOU CREATED]/registry.pb - cache_ttl_seconds: 60 -online_store: null -offline_store: - type: file -``` -{% endtab %} -{% endtabs %} - -However, there are inherent limitations with a file-based registry, since changing a single field in the registry -requires re-writing the whole registry file. With multiple concurrent writers, this presents a risk of data loss, or -bottlenecks writes to the registry since all changes have to be serialized (e.g. when running materialization for -multiple feature views or time ranges concurrently). - -#### SQL Registry -Alternatively, a [SQL Registry](../../tutorials/using-scalable-registry.md) can be used for a more scalable registry. - -The configuration roughly looks like: -```yaml -project: -provider: -online_store: redis -offline_store: file -registry: - registry_type: sql - path: postgresql://postgres:mysecretpassword@127.0.0.1:55001/feast - cache_ttl_seconds: 60 - sqlalchemy_config_kwargs: - echo: false - pool_pre_ping: true -``` - -This supports any SQLAlchemy compatible database as a backend. The exact schema can be seen in [sql.py](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/registry/sql.py) - -### Updating the registry - -We recommend users store their Feast feature definitions in a version controlled repository, which then via CI/CD -automatically stays synced with the registry. Users will often also want multiple registries to correspond to -different environments (e.g. 
dev vs staging vs prod), with staging and production registries with locked down write -access since they can impact real user traffic. See [Running Feast in Production](../../how-to-guides/running-feast-in-production.md#1.-automatically-deploying-changes-to-your-feature-definitions) for details on how to set this up. - -### Accessing the registry from clients - -Users can specify the registry through a `feature_store.yaml` config file, or programmatically. We often see teams -preferring the programmatic approach because it makes notebook driven development very easy: - -#### Option 1: programmatically specifying the registry - -```python -repo_config = RepoConfig( - registry=RegistryConfig(path="gs://feast-test-gcs-bucket/registry.pb"), - project="feast_demo_gcp", - provider="gcp", - offline_store="file", # Could also be the OfflineStoreConfig e.g. FileOfflineStoreConfig - online_store="null", # Could also be the OnlineStoreConfig e.g. RedisOnlineStoreConfig -) -store = FeatureStore(config=repo_config) -``` - -#### Option 2: specifying the registry in the project's `feature_store.yaml` file - -```yaml -project: feast_demo_aws -provider: aws -registry: s3://feast-test-s3-bucket/registry.pb -online_store: null -offline_store: - type: file -``` - -Instantiating a `FeatureStore` object can then point to this: - -```python -store = FeatureStore(repo_path=".") -``` \ No newline at end of file diff --git a/docs/reference/registries/README.md b/docs/reference/registries/README.md new file mode 100644 index 00000000000..1310506f1d3 --- /dev/null +++ b/docs/reference/registries/README.md @@ -0,0 +1,23 @@ +# Registries + +Please see [Registry](../../getting-started/components/registry.md) for a conceptual explanation of registries. 
+ +{% content-ref url="local.md" %} +[local.md](local.md) +{% endcontent-ref %} + +{% content-ref url="s3.md" %} +[s3.md](s3.md) +{% endcontent-ref %} + +{% content-ref url="gcs.md" %} +[gcs.md](gcs.md) +{% endcontent-ref %} + +{% content-ref url="sql.md" %} +[sql.md](sql.md) +{% endcontent-ref %} + +{% content-ref url="snowflake.md" %} +[snowflake.md](snowflake.md) +{% endcontent-ref %} diff --git a/docs/reference/registries/gcs.md b/docs/reference/registries/gcs.md new file mode 100644 index 00000000000..13c9657aa13 --- /dev/null +++ b/docs/reference/registries/gcs.md @@ -0,0 +1,23 @@ +# GCS Registry + +## Description + +GCS registry provides support for storing the protobuf representation of your feature store objects (data sources, feature views, feature services, etc.) using Google Cloud Storage. + +While it can be used in production, there are still inherent limitations with file-based registries, since changing a single field in the registry requires re-writing the whole registry file. With multiple concurrent writers, this risks data loss or bottlenecks writes to the registry, since all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently). + +An example of how to configure this would be: + +## Example + +{% code title="feature_store.yaml" %} +```yaml +project: feast_gcp +registry: + path: gs://[YOUR BUCKET YOU CREATED]/registry.pb + cache_ttl_seconds: 60 +online_store: null +offline_store: + type: dask +``` +{% endcode %} \ No newline at end of file diff --git a/docs/reference/registries/local.md b/docs/reference/registries/local.md new file mode 100644 index 00000000000..ad1d98cea99 --- /dev/null +++ b/docs/reference/registries/local.md @@ -0,0 +1,23 @@ +# Local Registry + +## Description + +Local registry provides support for storing the protobuf representation of your feature store objects (data sources, feature views, feature services, etc.) in the local file system. 
It is only intended to be used for experimentation with Feast and should not be used in production. + +There are inherent limitations with file-based registries, since changing a single field in the registry requires re-writing the whole registry file. With multiple concurrent writers, this risks data loss or bottlenecks writes to the registry, since all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently). + +An example of how to configure this would be: + +## Example + +{% code title="feature_store.yaml" %} +```yaml +project: feast_local +registry: + path: registry.pb + cache_ttl_seconds: 60 +online_store: null +offline_store: + type: dask +``` +{% endcode %} \ No newline at end of file diff --git a/docs/reference/registries/s3.md b/docs/reference/registries/s3.md new file mode 100644 index 00000000000..65069c415c5 --- /dev/null +++ b/docs/reference/registries/s3.md @@ -0,0 +1,23 @@ +# S3 Registry + +## Description + +S3 registry provides support for storing the protobuf representation of your feature store objects (data sources, feature views, feature services, etc.) in Amazon S3. + +While it can be used in production, there are still inherent limitations with file-based registries, since changing a single field in the registry requires re-writing the whole registry file. With multiple concurrent writers, this risks data loss or bottlenecks writes to the registry, since all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently). 
+ +An example of how to configure this would be: + +## Example + +{% code title="feature_store.yaml" %} +```yaml +project: feast_aws_s3 +registry: + path: s3://[YOUR BUCKET YOU CREATED]/registry.pb + cache_ttl_seconds: 60 +online_store: null +offline_store: + type: dask +``` +{% endcode %} \ No newline at end of file diff --git a/docs/reference/registry/snowflake.md b/docs/reference/registries/snowflake.md similarity index 97% rename from docs/reference/registry/snowflake.md rename to docs/reference/registries/snowflake.md index 31b0db95824..00d87b19775 100644 --- a/docs/reference/registry/snowflake.md +++ b/docs/reference/registries/snowflake.md @@ -1,4 +1,4 @@ -# Snowflake registry +# Snowflake Registry ## Description diff --git a/docs/tutorials/using-scalable-registry.md b/docs/reference/registries/sql.md similarity index 97% rename from docs/tutorials/using-scalable-registry.md rename to docs/reference/registries/sql.md index 25746f60e23..631a20cbe3c 100644 --- a/docs/tutorials/using-scalable-registry.md +++ b/docs/reference/registries/sql.md @@ -1,9 +1,4 @@ ---- -description: >- - Tutorial on how to use the SQL registry for scalable registry updates ---- - -# Using Scalable Registry +# SQL Registry ## Overview From 3384db7795ba0bca6d84ec698a90c75d538b8d4e Mon Sep 17 00:00:00 2001 From: tokoko Date: Wed, 14 Aug 2024 15:43:31 +0000 Subject: [PATCH 2/3] remove commented out text Signed-off-by: tokoko --- docs/getting-started/components/registry.md | 29 --------------------- 1 file changed, 29 deletions(-) diff --git a/docs/getting-started/components/registry.md b/docs/getting-started/components/registry.md index 93e17248145..a564b0e56a4 100644 --- a/docs/getting-started/components/registry.md +++ b/docs/getting-started/components/registry.md @@ -45,32 +45,3 @@ Instantiating a `FeatureStore` object can then point to this: ```python store = FeatureStore(repo_path=".") ``` - - - - - From 7e450274486a72cc3a18a0bdd31649318087972e Mon Sep 17 00:00:00 2001 From: tokoko 
Date: Thu, 22 Aug 2024 10:38:00 +0000 Subject: [PATCH 3/3] changes in registry.md Signed-off-by: tokoko --- docs/getting-started/components/registry.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/docs/getting-started/components/registry.md b/docs/getting-started/components/registry.md index a564b0e56a4..0c85c5ad36b 100644 --- a/docs/getting-started/components/registry.md +++ b/docs/getting-started/components/registry.md @@ -1,6 +1,6 @@ # Registry -Feast uses a registry to store all applied Feast objects (e.g. Feature views, entities, etc). It allows data scientists to search, discover, and collaborate on new features. The registry exposes methods to apply, list, retrieve and delete these objects, and is an abstraction with multiple implementations. +The Feast feature registry is a central catalog of all feature definitions and their related metadata. Feast uses the registry to store all applied Feast objects (e.g. feature views, entities, etc.). It allows data scientists to search, discover, and collaborate on new features. The registry exposes methods to apply, list, retrieve and delete these objects, and is an abstraction with multiple implementations. Feast comes with built-in file-based and sql-based registry implementations. By default, Feast uses a file-based registry, which stores the protobuf representation of the registry as a serialized file in the local file system. For more details on which registries are supported, please see [Registries](../../reference/registries/). @@ -45,3 +45,7 @@ Instantiating a `FeatureStore` object can then point to this: ```python store = FeatureStore(repo_path=".") ``` + +{% hint style="info" %} +The file-based feature registry is a [Protobuf representation](https://github.com/feast-dev/feast/blob/master/protos/feast/core/Registry.proto) of Feast metadata. 
This Protobuf file can be read programmatically from other programming languages, but no compatibility guarantees are made on the internal structure of the registry. +{% endhint %} \ No newline at end of file