Merged
30 changes: 29 additions & 1 deletion README.md
@@ -11,11 +11,39 @@ It aims to:
* Enable discovery and documentation of features
* Provide an overview of the general health of features in the system

## High Level Architecture

![Feast Architecture](docs/architecture.png)

The Feast platform is broken down into the following functional areas:

* __Create__ features based on defined format and programming model
* __Ingest__ features via streaming input, import from files or BigQuery tables, and write to an appropriate data store
* __Store__ feature data for both serving and training purposes based on feature access patterns
* __Access__ features for training and serving
* __Discover__ information about entities and features stored and served by Feast

## Motivation

__Access to features in serving__: Machine learning models typically require access to features created in both batch pipelines and real-time streams. Feast provides a means for accessing these features in a serving environment, at low latency and under high load.

__Consistency between training and serving__: In many machine learning systems there exists a disconnect between features that are created in batch pipelines for the training of a model, and ones that are created from streams for the serving of real-time features. By centralizing the ingestion of features, Feast provides a consistent view of both batch and real-time features, in both training and serving.

__Infrastructure management__: Feast abstracts away much of the engineering overhead associated with managing data infrastructure. It handles the ingestion, storage, and serving of large amounts of feature data in a scalable way. The system configures data models based on your registered feature specifications, and ensures that you always have a consistent view of features in both your historical and real-time data stores.

__Feature standardisation__: Feast presents a centralized platform on which teams can register features in a standardized way using specifications. This provides structure to the way features are defined and gives teams a single, commonly understood reference for each feature when discussing it.

__Discovery__: Feast allows users to easily explore and discover features and their associated information. This allows for a deeper understanding of features and their specifications, more feature reuse between teams and projects, and faster experimentation. Each new ML project can leverage features that have been created by prior teams, which compounds an organization's ability to discover new insights.

## More Information

* [Components](docs/components.md)
* [Concepts](docs/concepts.md)

## Notice

Feast is still under active development. Your feedback and contributions are important to us.


## Source Code Headers

Every file containing source code must include copyright and license
Binary file added docs/architecture.png
35 changes: 35 additions & 0 deletions docs/components.md
@@ -0,0 +1,35 @@

# Components

### Core

Feast Core is the central component that manages Feast and all other components within the system. It allows for the registration and management of entities, features, data stores, and other system resources. Core also manages the execution of feature ingestion jobs from batch and streaming sources, and provides the other Feast components with feature related information.

### Stores

Feast maintains data stores for the purposes of model training and serving features to models in production. Features are loaded into these stores by ingestion jobs from both streaming and batch sources.

Two kinds of data stores are supported:

* __Warehouse__: The feature warehouse maintains all historical feature data. The warehouse can be queried for batch datasets, which are then used for model training. Supported warehouse: __BigQuery__
* __Serving__: Feast supports multiple serving stores, which maintain feature values for access in a production serving environment. Supported serving stores: __Redis__, __Bigtable__

### Serving

Feast Serving is an API used for the retrieval of feature values by models in production. It allows for low latency and high throughput access to feature values from serving stores using Feast client libraries. The API abstracts away data access, allowing users to simultaneously query multiple stores with a single gRPC or HTTP request.
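
To make the idea of one request spanning several stores concrete, here is a minimal Python sketch. It is not the Feast Serving API: the function `get_feature_values`, the `feature_to_store` mapping, and the dictionary-backed stores are all hypothetical stand-ins chosen for illustration. The sketch groups the requested features by the store that holds them, reads each store once, and merges the results into a single response.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical stand-in for a serving store: entity key -> {feature name -> value}.
Store = Dict[str, Dict[str, object]]


def get_feature_values(
    entity_key: str,
    feature_names: List[str],
    feature_to_store: Dict[str, str],
    stores: Dict[str, Store],
) -> Dict[str, Optional[object]]:
    """Resolve one request against several stores and merge the results."""
    # Group the requested features by the store that holds them.
    names_by_store: Dict[str, List[str]] = defaultdict(list)
    for name in feature_names:
        names_by_store[feature_to_store[name]].append(name)

    # Read each store once; a missing row or value is returned as None.
    response: Dict[str, Optional[object]] = {}
    for store_name, names in names_by_store.items():
        row = stores[store_name].get(entity_key, {})
        for name in names:
            response[name] = row.get(name)
    return response


# Illustrative data only: two stores, each holding one feature for one driver.
stores = {
    "redis": {"driver-42": {"completed_trips": 120}},
    "bigtable": {"driver-42": {"avg_rating": 4.8}},
}
feature_to_store = {"completed_trips": "redis", "avg_rating": "bigtable"}

values = get_feature_values(
    "driver-42", ["completed_trips", "avg_rating"], feature_to_store, stores
)
```

The sketch reads the stores one after another for simplicity; a real serving layer would more likely issue the per-store reads concurrently.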

### Client Libraries

Feast provides multiple client libraries for interacting with a Feast deployment.

| Functionality | CLI | Go | Java | Python (WIP)|
|------------------------------|-----|-----|------|-------------|
| Feature Management | yes | no | no | yes |
| Data Ingestion (Jobs) | yes | no | no | yes |
| Feature Retrieval (Training) | no | no | no | yes |
| Feature Retrieval (Serving) | no | yes | yes | yes |
31 changes: 31 additions & 0 deletions docs/concepts.md
@@ -0,0 +1,31 @@
# Concepts

### What is Feast?
Feast is a feature storage platform for machine learning with the following capabilities:

1. Ingestion and storage of ML features via batch or stream
2. Retrieval of ML features for serving via API, or via Google BigQuery to create training datasets
3. Maintenance of a feature catalog, including additional feature attribute information and discovery via API

Feast solves a need for standardising how features are stored, served and accessed, and encourages sharing and reuse of created features amongst data science teams.

Feast does not prescribe how Features should be created. It allows for ingestion via batch or stream in a number of formats, e.g. batch import from CSV, BigQuery tables, streaming via Pub/Sub etc.


### What is a Feature?

A Feature is an individual measurable property or characteristic of an Entity. In the context of Feast a Feature has the following attributes:

* Entity - It must be associated with a known Entity within Feast
* ValueType - The feature type must be defined, e.g. String, Bytes, Int64, Int32, Float etc.
* Requirements - Properties related to how a feature should be stored for serving and training
* Granularity - Time series features require a defined granularity
* StorageType - For both serving and training a storage type must be defined

Feast needs to know these attributes in order to be able to ingest, store and serve a feature. A Feature is only a feature when Feast knows about it. This may seem like a contrived distinction, but it introduces a best practice: a feature only becomes available for ingestion, serving and training in production once Feast has added it to its catalog.
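
The attribute list and the catalog rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Feast's specification format or API: `FeatureSpec`, `FeatureCatalog`, their field names, and the `"hour"` granularity value are hypothetical, and the Requirements and StorageType attributes are reduced to two plain store names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable


class ValueType(Enum):
    # Only the value types named in the text; the real list may be longer.
    STRING = "String"
    BYTES = "Bytes"
    INT64 = "Int64"
    INT32 = "Int32"
    FLOAT = "Float"


@dataclass(frozen=True)
class FeatureSpec:
    """Hypothetical container for the attributes listed above."""
    name: str
    entity: str            # must refer to an entity the catalog knows
    value_type: ValueType
    granularity: str       # assumed representation, e.g. "hour"
    serving_store: str     # storage type used for serving
    warehouse_store: str   # storage type used for training


class FeatureCatalog:
    """Hypothetical catalog: a feature is usable only once registered."""

    def __init__(self, known_entities: Iterable[str]) -> None:
        self._entities = set(known_entities)
        self._features: Dict[str, FeatureSpec] = {}

    def register(self, spec: FeatureSpec) -> None:
        # Reject features that are not tied to a known entity.
        if spec.entity not in self._entities:
            raise ValueError(f"unknown entity: {spec.entity}")
        self._features[spec.name] = spec

    def is_available(self, feature_name: str) -> bool:
        return feature_name in self._features


catalog = FeatureCatalog(known_entities=["driver"])
spec = FeatureSpec(
    name="completed_trips",
    entity="driver",
    value_type=ValueType.INT64,
    granularity="hour",
    serving_store="redis",
    warehouse_store="bigquery",
)

available_before = catalog.is_available("completed_trips")
catalog.register(spec)
available_after = catalog.is_available("completed_trips")
```

Here `available_before` is `False` and `available_after` is `True`, mirroring the rule that a feature exists for ingestion and serving only after it has been added to the catalog.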

### What is an Entity?

An entity is a type with an associated key which generally maps onto a known domain object, e.g. Driver, Customer, Area, Merchant etc. An entity can also be a composite of other entities, with the corresponding composite key, e.g. DriverArea.

An entity determines how a feature may be retrieved. For example, for a Driver entity, all driver features must be looked up using the associated driver ID as the entity key.
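
As a rough illustration of entity-keyed lookup, the following Python sketch stores feature values under an entity name plus an entity key, where a composite entity such as DriverArea simply uses a key with more than one part. The class `InMemoryServingStore`, its methods, and the sample identifiers are hypothetical and are not part of Feast.

```python
from typing import Dict, Iterable, Optional, Tuple

# An entity key is a tuple so that composite entities can carry several parts.
EntityKey = Tuple[str, ...]


class InMemoryServingStore:
    """Hypothetical store that indexes feature values by entity and entity key."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, EntityKey], Dict[str, object]] = {}

    def write(self, entity: str, key: EntityKey, values: Dict[str, object]) -> None:
        # Merge new values into any existing row for this entity key.
        self._rows.setdefault((entity, key), {}).update(values)

    def read(
        self, entity: str, key: EntityKey, feature_names: Iterable[str]
    ) -> Dict[str, Optional[object]]:
        # Features can only be found through the entity key they were written under.
        row = self._rows.get((entity, key), {})
        return {name: row.get(name) for name in feature_names}


store = InMemoryServingStore()
store.write("driver", ("driver-42",), {"completed_trips": 120})
store.write("driver_area", ("driver-42", "area-7"), {"trips_in_area": 15})

driver_features = store.read("driver", ("driver-42",), ["completed_trips"])
driver_area_features = store.read(
    "driver_area", ("driver-42", "area-7"), ["trips_in_area"]
)
```

Using a tuple for every key keeps single and composite entities on the same code path: a Driver key has one part, a DriverArea key has two.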