dbt integration: Validate entity column data type is appropriate #5876

@YassinNouh21

Description

Context

PR #5827 added dbt integration that creates Entity objects from dbt model columns.

Problem

There is no validation that the entity column has a data type appropriate for use as an entity key. Entity keys should typically be:

  • STRING / VARCHAR
  • INT / INT64 / BIGINT
  • UUID (if supported)

However, the code currently accepts any column type, including:

  • FLOAT / DOUBLE (inexact equality makes joins unreliable)
  • BYTES (not suitable for entity keys)
  • TIMESTAMP (rarely appropriate)
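
To illustrate why floating-point columns are risky as entity keys, here is a minimal sketch in plain Python (not Feast code; the dictionaries and key names are hypothetical). A dictionary lookup stands in for an equality join: two values that are mathematically equal fail to match because their binary representations differ.

```python
# Hypothetical illustration: a dict lookup stands in for an equality join.
float_features = {0.3: "row keyed by 0.3"}

# Mathematically 0.3, but the binary float result is 0.30000000000000004.
computed_key = 0.1 + 0.2

# The lookup misses because the two floats differ in their last bits.
float_hit = float_features.get(computed_key)  # None

# The same kind of lookup with a string key compares exactly.
string_features = {"user_3": "row keyed by user_3"}
string_hit = string_features.get("user_" + str(3))  # "row keyed by user_3"
```

The same mismatch can occur when a key is computed in one system and stored in another, which is why string or integer keys are the safer default.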

Current Behavior

# In dbt_import.py:191-197
if entity_column not in column_names:
    click.echo(warning)
    continue
# No type checking!

Proposed Solution

Validate the entity column's type and emit a warning when it looks unsuitable:

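entity_col = next((c for c in model.columns if c.name == entity_column), None)
if entity_col:
    # Drop length/precision parameters so VARCHAR(255) compares as VARCHAR.
    # Assumption: data_type may be empty when the dbt model does not declare it.
    base_type = (entity_col.data_type or "").upper().split("(", 1)[0].strip()
    valid_entity_types = {'STRING', 'TEXT', 'VARCHAR', 'INT', 'INT32', 'INT64', 'INTEGER', 'BIGINT', 'UUID'}

    # Match the base type exactly: a substring test would let ARRAY<INT64>,
    # INTERVAL, or POINT through because each contains "INT".
    if base_type not in valid_entity_types:
        click.echo(
            f"{Fore.YELLOW}Warning: Entity column '{entity_column}' has type "
            f"'{entity_col.data_type}' which may not be suitable for entity keys."
            f" Recommended types: STRING, INT64{Style.RESET_ALL}"
        )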

Edge Cases to Handle

  • FLOAT columns (should warn strongly)
  • ARRAY columns (invalid for entities)
  • Complex/nested types (invalid)
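
One way to cover these edge cases is to classify the type string into severity levels before deciding what to print. The sketch below is only an illustration under stated assumptions: `classify_entity_type`, the severity labels, and the type-name sets are hypothetical and not part of Feast or dbt, and real warehouses use more type aliases than are listed here.

```python
from typing import Optional

# Assumed type names for illustration; adapters may report other aliases.
VALID_ENTITY_TYPES = {
    "STRING", "TEXT", "VARCHAR", "INT", "INT32", "INT64",
    "INTEGER", "BIGINT", "UUID",
}
FLOAT_TYPES = {"FLOAT", "FLOAT32", "FLOAT64", "DOUBLE", "DOUBLE PRECISION", "REAL"}
COMPLEX_PREFIXES = ("ARRAY", "STRUCT", "MAP", "RECORD", "JSON")


def classify_entity_type(data_type: Optional[str]) -> str:
    """Return 'ok', 'warn', 'warn_strongly', or 'invalid' for a column type."""
    normalized = (data_type or "").upper().strip()

    # Check complex/nested types first so ARRAY<INT64> is not read as INT64.
    if normalized.startswith(COMPLEX_PREFIXES) or normalized.endswith("[]"):
        return "invalid"

    # Drop length/precision parameters, e.g. VARCHAR(255) -> VARCHAR.
    base_type = normalized.split("(", 1)[0].strip()

    if base_type in VALID_ENTITY_TYPES:
        return "ok"
    if base_type in FLOAT_TYPES:
        return "warn_strongly"
    # Unknown, missing, or rarely appropriate types (e.g. TIMESTAMP, BYTES).
    return "warn"
```

With this split, the import command could skip columns classified as `invalid`, print a more prominent message for `warn_strongly`, and keep the existing yellow warning for `warn`.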

Related
