bigframes.bigquery

Access BigQuery-specific operations and namespaces within BigQuery DataFrames.

This module provides specialized functions and sub-modules that expose BigQuery's advanced capabilities to DataFrames and Series. It bridges the pandas-compatible API and BigQuery SQL functions that have no direct pandas equivalent.

Key sub-modules include:

This module also provides direct access to optimized BigQuery functions for:

  • JSON Processing: High-performance functions like json_extract, json_value, and parse_json for handling semi-structured data.

  • Geospatial Analysis: Geographic functions such as st_area, st_distance, and st_centroid, which wrap BigQuery's ST_-prefixed GEOGRAPHY functions.

  • Array Operations: Tools for working with BigQuery arrays, including array_agg and array_length.

  • Vector Search: vector_search and create_vector_index, which expose BigQuery's vector search and vector indexing over high-dimensional embeddings.

  • Custom SQL: The sql_scalar function allows embedding raw SQL snippets for advanced operations not yet directly mapped in the API.

By using these functions, you can leverage BigQuery’s high-performance engine for domain-specific tasks while maintaining a Python-centric development experience.

For the full list of BigQuery standard SQL functions, see: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-reference

Functions

approx_top_count(series, number)

Returns the approximate top elements of the Series (as many as number specifies) as an array of STRUCTs, each holding a value and its count.
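An exact, local counterpart makes the output shape concrete. The sketch below is plain Python, not part of bigframes, and the helper name exact_top_count is hypothetical. It counts values exactly with collections.Counter. BigQuery's approximate function trades exactness for speed on large inputs, so its counts may differ slightly from these.

```python
from collections import Counter


def exact_top_count(values, number):
    """Return the `number` most frequent values with their counts.

    Each result is a dict shaped like a STRUCT<value, count>, ordered from
    most to least frequent; ties keep first-seen order (Counter behavior).
    """
    counts = Counter(values)
    return [{"value": v, "count": c} for v, c in counts.most_common(number)]


print(exact_top_count(["pear", "pear", "pear", "apple", "fig", "fig"], 2))
```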

array_agg(obj)

Groups data and creates arrays from the selected columns, omitting NULLs because BigQuery arrays cannot contain NULL elements.
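The NULL-omission rule can be illustrated locally. The following is a plain-Python sketch of the grouping semantics, not the bigframes implementation. The helper array_agg_local is hypothetical, and None stands in for SQL NULL.

```python
from collections import defaultdict


def array_agg_local(keys, values):
    """Collect `values` into one list per key, skipping None (NULL) entries."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        if value is None:
            # BigQuery arrays cannot contain NULL elements, so drop them.
            continue
        groups[key].append(value)
    return dict(groups)


print(array_agg_local(["a", "a", "b", "b"], [1, None, 2, 3]))
```

In this sketch, a key whose values are all None produces no group at all. How bigframes represents such a group is not shown here.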

array_length(series)

Computes the length of each array in the Series.

array_to_string(series, delimiter)

Converts array elements within a Series into delimited strings.
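Row by row, each array becomes one string. The sketch below mimics BigQuery's ARRAY_TO_STRING when no null_text argument is given: NULL elements are omitted, and a NULL array yields NULL. It is a local illustration with a hypothetical helper name. It assumes string elements, as ARRAY_TO_STRING requires STRING (or BYTES) arrays.

```python
def array_to_string_local(arrays, delimiter):
    """Join each row's list of strings with `delimiter`; None stands for NULL."""
    result = []
    for arr in arrays:
        if arr is None:
            result.append(None)  # a NULL array stays NULL
        else:
            # NULL elements are skipped rather than rendered.
            result.append(delimiter.join(x for x in arr if x is not None))
    return result


print(array_to_string_local([["a", "b", "c"], ["x", None, "y"], None], "-"))
```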

create_external_table(table_name, *[, ...])

Creates a BigQuery external table.

create_vector_index(table_id, column_name, *)

Creates a new vector index on a column of a table.

json_extract(input, json_path)

Extracts a JSON value and converts it to a SQL JSON-formatted STRING or JSON value.

json_extract_array(input[, json_path])

Extracts a JSON array and converts it to a SQL array of JSON-formatted STRING or JSON values.

json_extract_string_array(input[, ...])

Extracts a JSON array and converts it to a SQL array of STRING values.

json_keys(input[, max_depth])

Returns the unique keys of a JSON object as an ARRAY of STRINGs; max_depth limits how deeply nested keys are included.

json_query(input, json_path)

Extracts a JSON value and converts it to a SQL JSON-formatted STRING or JSON value.

json_query_array(input[, json_path])

Extracts a JSON array and converts it to a SQL array of JSON-formatted STRING or JSON values.

json_set(input, json_path_value_pairs)

Produces a new JSON value within a Series by inserting or replacing values at specified paths.
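The insert-or-replace behavior can be sketched on a single JSON document. This is a simplified local illustration with a hypothetical helper name. It supports only dotted object paths such as $.a.b, applies pairs in order, and creates missing intermediate objects; BigQuery's JSON_SET does this by default. Array indexing and the other JSON_SET options are not modeled.

```python
import json


def json_set_local(text, pairs):
    """Apply (path, value) pairs in order and return compact JSON text.

    Assumes paths look like '$.a.b' and every intermediate value is an object.
    """
    doc = json.loads(text)
    for path, value in pairs:
        keys = path[2:].split(".")  # strip the leading '$.'
        node = doc
        for key in keys[:-1]:
            node = node.setdefault(key, {})  # create missing objects
        node[keys[-1]] = value  # insert or replace the final key
    return json.dumps(doc, separators=(",", ":"))


print(json_set_local('{"a": 1}', [("$.a", 10), ("$.b.c", "x")]))
```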

json_value(input[, json_path])

Extracts a JSON scalar value and converts it to a SQL STRING value.
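json_value and json_query differ mainly in what they return for a path. json_value returns a scalar as an unquoted STRING and NULL for objects or arrays. json_query returns any value as JSON-formatted text. The sketch below illustrates that difference locally, with hypothetical helper names and support for simple dotted paths only. With bigframes itself, the calls would follow the signatures listed above, for example json_value(series, "$.a.b").

```python
import json


def _walk(doc, path):
    """Follow a simple JSONPath such as '$.a.b' (object keys only)."""
    if path == "$":
        return doc, True
    node = doc
    for key in path[2:].split("."):  # strip the leading '$.'
        if not isinstance(node, dict) or key not in node:
            return None, False
        node = node[key]
    return node, True


def json_value_local(text, path="$"):
    """Scalar -> string; objects, arrays, JSON null, or missing paths -> None."""
    node, found = _walk(json.loads(text), path)
    if not found or node is None or isinstance(node, (dict, list)):
        return None
    # Strings come back unquoted; numbers and booleans as their JSON text.
    return node if isinstance(node, str) else json.dumps(node)


def json_query_local(text, path):
    """Any value -> compact JSON-formatted string; missing paths -> None."""
    node, found = _walk(json.loads(text), path)
    return json.dumps(node, separators=(",", ":")) if found else None


doc = '{"a": {"b": 1, "c": "hi"}, "e": [1, 2]}'
print(json_value_local(doc, "$.a.c"), json_query_local(doc, "$.a.c"))
```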

json_value_array(input[, json_path])

Extracts a JSON array of scalar values and converts it to a SQL ARRAY<STRING> value.

load_data(table_name, *[, ...])

Loads data into a BigQuery table.

parse_json(input)

Converts a Series with a JSON-formatted STRING value to a JSON value.

rand()

Generates a pseudo-random value of type FLOAT64 in the range of [0, 1), inclusive of 0 and exclusive of 1.

sql_scalar(sql_template, columns, *[, ...])

Creates a Series from a SQL template.
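A template can be pictured as a format string whose placeholders are replaced by each column's SQL expression. The sketch below assumes Python-style positional placeholders ({0}, {1}, ...) filled in the order the columns are passed. It assumes str.format-like substitution, so literal braces would need doubling. render_sql_template and the column expressions are hypothetical.

```python
def render_sql_template(sql_template, column_sql):
    """Substitute each column's SQL expression into a {0}, {1}, ... template."""
    return sql_template.format(*column_sql)


print(render_sql_template("ROUND({0}, 2) + {1}", ["`price`", "`tax`"]))
```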

st_area(series)

Returns the area in square meters covered by the polygons in the input GEOGRAPHY.

st_buffer(series, buffer_radius[, ...])

Computes a GEOGRAPHY that represents all points whose distance from the input GEOGRAPHY is less than or equal to buffer_radius meters.

st_centroid(series)

Computes the geometric centroid of each GEOGRAPHY value in the Series.

st_convexhull(series)

Computes the convex hull of each GEOGRAPHY value in the Series.

st_difference(series, other)

Returns a GEOGRAPHY that represents the point set difference of series and other (the points in series that are not in other).

st_distance(series, other, *[, use_spheroid])

Returns the shortest distance in meters between two non-empty GEOGRAPHY objects.
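By default (use_spheroid=False), BigQuery measures distance on a perfect sphere rather than the WGS84 spheroid. For two points, that corresponds to a great-circle distance. The haversine sketch below is a local approximation. The radius constant is an assumption (the IUGG mean Earth radius), so results may differ slightly from BigQuery's.

```python
import math

# Assumed sphere radius in meters (IUGG mean Earth radius); treat as approximate.
EARTH_RADIUS_M = 6_371_008.8


def spherical_distance_m(lon1, lat1, lon2, lat2):
    """Great-circle (haversine) distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude along the equator, roughly 111 km.
print(round(spherical_distance_m(0, 0, 1, 0)))
```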

st_intersection(series, other)

Returns a GEOGRAPHY that represents the point set intersection of the two input GEOGRAPHYs.

st_isclosed(series)

Returns TRUE for a non-empty Geography, where each element in the Geography has an empty boundary.

st_length(series, *[, use_spheroid])

Returns the total length in meters of the lines in the input GEOGRAPHY.

st_regionstats(geography, raster_id[, band, ...])

Returns statistics summarizing the pixel values of the raster image referenced by raster_id that intersect with geography.

st_simplify(geography, tolerance_meters)

Returns a simplified version of the input geography.

struct(value)

Takes a DataFrame and converts it into a Series of structs, with one struct per DataFrame row and one struct field per DataFrame column.
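The row-to-struct mapping can be pictured with plain Python data. In this local sketch (hypothetical helper name), a dict of equal-length column lists stands in for the DataFrame and a dict stands in for each struct.

```python
def rows_to_structs(columns):
    """Turn column-oriented data into one dict ("struct") per row."""
    names = list(columns)
    # zip(*...) walks the columns in lockstep, yielding one tuple per row.
    return [dict(zip(names, row)) for row in zip(*columns.values())]


print(rows_to_structs({"id": [1, 2], "name": ["a", "b"]}))
```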

to_json(input)

Converts a Series with a JSON value to a JSON-formatted STRING value.

to_json_string(input)

Converts a Series to a JSON-formatted STRING value.

unix_micros(input)

Converts a timestamp Series to Unix epoch microseconds.

unix_millis(input)

Converts a timestamp Series to Unix epoch milliseconds.

unix_seconds(input)

Converts a timestamp Series to Unix epoch seconds.
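The three conversions differ only in unit. BigQuery's UNIX_MILLIS and UNIX_SECONDS truncate extra precision by rounding down, which matters for timestamps before 1970. The following is a local sketch for a single UTC datetime, with hypothetical helper names. It uses integer arithmetic throughout to avoid floating-point error.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_micros(ts):
    # Exact integer microseconds since the epoch (timedelta floor division).
    return (ts - EPOCH) // timedelta(microseconds=1)


def to_unix_millis(ts):
    # Floor division rounds down to the start of the millisecond.
    return to_unix_micros(ts) // 1_000


def to_unix_seconds(ts):
    # Floor division rounds down to the start of the second.
    return to_unix_micros(ts) // 1_000_000


print(to_unix_seconds(datetime(2008, 12, 25, 15, 30, tzinfo=timezone.utc)))
```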

vector_search(base_table, column_to_search, ...)

Conducts a vector search over embeddings to find semantically similar entities.
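Conceptually, a vector search ranks stored embeddings by their distance to a query embedding and keeps the closest top_k. The sketch below is an exact, brute-force local version using Euclidean distance, which is BigQuery VECTOR_SEARCH's default distance type. When a vector index exists, BigQuery may instead return approximate results. The helper name and the dict-of-lists layout are hypothetical.

```python
import math


def brute_force_vector_search(base, query, top_k=10):
    """Exact nearest neighbours of `query` in `base` by Euclidean distance.

    `base` maps an id to an embedding (list of floats). Returns up to top_k
    (id, distance) pairs, closest first.
    """
    scored = [(key, math.dist(vec, query)) for key, vec in base.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


base = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}
print(brute_force_vector_search(base, [0.9, 0.9], top_k=2))
```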