RemoteOfflineStore does not support SQL string as entity_df in get_historical_features() #6236

@Witich

Description

Expected Behavior

get_historical_features() should accept a SQL string as entity_df, as documented and as already supported when the same offline stores (ClickHouse, PostgreSQL, BigQuery) are used directly rather than through the remote store. The type signature in RemoteOfflineStore already declares entity_df as Optional[Union[pd.DataFrame, str]].

entity_sql = f"""
    SELECT driver_id, event_timestamp
    FROM {store.get_data_source("driver_hourly_stats_source").get_table_query_string()}
    WHERE event_timestamp BETWEEN '2021-01-01' AND '2021-12-31'
"""

training_df = store.get_historical_features(
    entity_df=entity_sql,
    features=["driver_hourly_stats:conv_rate"],
).to_df()

Current Behavior

Passing a SQL string as entity_df to RemoteOfflineStore raises:

AttributeError: 'str' object has no attribute 'columns'

Two functions in feast/infra/offline_stores/remote.py assume entity_df is always a DataFrame:

  1. _create_retrieval_metadata() (line 456) — calls _get_entity_schema(entity_df) which accesses entity_df.columns
  2. _put_parameters() (line 564) — calls pa.Table.from_pandas(entity_df)
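
The failure mode can be shown without Feast at all. The following is a minimal sketch, assuming only that some helper reads `.columns` from whatever it is given; `get_entity_schema_sketch` is a hypothetical stand-in, not the actual code in remote.py.

```python
def get_entity_schema_sketch(entity_df):
    # Hypothetical stand-in for a helper that assumes a DataFrame-like input.
    return list(entity_df.columns)


entity_sql = "SELECT id, event_timestamp FROM my_table"

try:
    get_entity_schema_sketch(entity_sql)
    error_message = None
except AttributeError as exc:
    # A plain str has no .columns attribute, so the lookup itself fails.
    error_message = str(exc)

print(error_message)
```

Any code path that touches DataFrame-only attributes before checking the type of entity_df will fail the same way.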

Steps to reproduce

  1. Deploy Feast with a remote offline store (Arrow Flight) backed by any store that supports SQL entity_df (ClickHouse, PostgreSQL, etc.)
  2. Run from the client:

from feast import FeatureStore

# `config` is assumed to be a RepoConfig whose offline store is the remote (Arrow Flight) store
store = FeatureStore(config=config)

entity_sql = "SELECT id, event_timestamp FROM my_table WHERE event_timestamp > '2025-01-01'"
job = store.get_historical_features(entity_df=entity_sql, features=["my_fv:feature1"])
df = job.to_df()  # raises AttributeError

Specifications

  • Version: 0.61.0
  • Platform: Linux / macOS
  • Subsystem: feast.infra.offline_stores.remote (RemoteOfflineStore / Arrow Flight)

Possible Solution

Option A — pass SQL via api_parameters:

  • Client (RemoteOfflineStore.get_historical_features): if entity_df is a string, put it into api_parameters["entity_df_sql"] and pass entity_df=None to RemoteRetrievalJob
  • Server (OfflineServer.get_historical_features): if command contains entity_df_sql, forward it as entity_df to the local offline store
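
Option A could look roughly like the sketch below. It is a standalone illustration, not a patch: `split_entity_df` and `resolve_entity_df` are hypothetical helper names, the parameters are modelled as a plain dict, and the JSON round trip only stands in for whatever the Arrow Flight command encoding actually is.

```python
import json
from typing import Any, Dict, Optional, Tuple


def split_entity_df(
    entity_df: Any, api_parameters: Dict[str, Any]
) -> Tuple[Optional[Any], Dict[str, Any]]:
    """Client side: move a SQL string out of entity_df into the parameters."""
    params = dict(api_parameters)  # copy so the caller's dict is not mutated
    if isinstance(entity_df, str):
        params["entity_df_sql"] = entity_df
        return None, params
    return entity_df, params


def resolve_entity_df(
    command: Dict[str, Any], uploaded_entity_df: Optional[Any]
) -> Optional[Any]:
    """Server side: prefer the SQL string from the command when present."""
    sql = command.get("entity_df_sql")
    return sql if sql is not None else uploaded_entity_df


entity_sql = "SELECT id, event_timestamp FROM my_table WHERE event_timestamp > '2025-01-01'"

# Client: the SQL string travels in the parameters, entity_df becomes None.
entity_df, params = split_entity_df(entity_sql, {"features": ["my_fv:feature1"]})

# Transport stand-in: serialize and deserialize the command.
wire_command = json.dumps(params)

# Server: recover the SQL string and hand it on as entity_df.
server_entity_df = resolve_entity_df(json.loads(wire_command), uploaded_entity_df=entity_df)
print(server_entity_df)
```

With this shape the DataFrame path is unchanged: a non-string entity_df passes through `split_entity_df` untouched and is uploaded as before.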

Option B — fix _create_retrieval_metadata and _put_parameters:

  • _create_retrieval_metadata: return metadata with empty keys/timestamps when entity_df is a string
  • _put_parameters: encode SQL string in a transport-compatible format (e.g., Flight descriptor command metadata)
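
Option B could be sketched as a type guard plus a small encoder. Everything here is an assumption made for illustration: the function names are hypothetical, the metadata is modelled as a dict with made-up keys, and JSON-encoded bytes stand in for whatever transport format is chosen.

```python
import json
from typing import Any, Callable, Dict


def retrieval_metadata_sketch(
    entity_df: Any, from_dataframe: Callable[[Any], Dict[str, Any]]
) -> Dict[str, Any]:
    """Return empty keys/timestamps for SQL input, else use the DataFrame path."""
    if isinstance(entity_df, str):
        # Nothing can be inferred from the SQL text without running the query.
        return {"keys": [], "min_event_timestamp": None, "max_event_timestamp": None}
    return from_dataframe(entity_df)


def encode_sql_entity_df(sql: str) -> bytes:
    """Encode the SQL string as bytes suitable for a command or metadata field."""
    return json.dumps({"entity_df_sql": sql}).encode("utf-8")


def decode_sql_entity_df(payload: bytes) -> str:
    """Inverse of encode_sql_entity_df, as the server would apply it."""
    return json.loads(payload.decode("utf-8"))["entity_df_sql"]


entity_sql = "SELECT id, event_timestamp FROM my_table"

# The DataFrame handler is never called for a string input.
metadata = retrieval_metadata_sketch(entity_sql, from_dataframe=lambda df: {})
payload = encode_sql_entity_df(entity_sql)
decoded_sql = decode_sql_entity_df(payload)
print(metadata, decoded_sql)
```

Compared with Option A, this keeps the string inside the existing call path, at the cost of a metadata object that carries no entity keys or timestamp range for SQL input.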
