feat: Add bigframes.execution_history API to track BigQuery jobs#2435

Open
shuoweil wants to merge 1 commit into main from shuowei-job-history

@shuoweil shuoweil commented Feb 5, 2026

This PR promotes execution_history() to the top-level bigframes namespace and upgrades it to track rich metadata for every BigQuery job executed during your session.

Key User Benefits:

  • Easier Access: Call bigframes.execution_history() directly instead of digging into sub-namespaces.

  • Rich Metadata Tracking: Captures structured statistics for both Query Jobs and Load Jobs including:

    • job_id and a direct Google Cloud Console URL for easy debugging.
    • Performance metrics: total_bytes_processed, duration_seconds, and slot_millis.
    • Query details (a truncated preview of the SQL that was run).
  • Clean, Focused Logs: Automatically filters out internal library overhead (like schema validations and index uniqueness checks) so your history only shows the data processing steps you actually care about.

    Usage Example:

    import bigframes
    import bigframes.pandas as bpd
    import pandas as pd

    # ... run some bigframes operations ...
    df = bpd.read_gbq("SELECT 1")

    # Upload some local data (triggers a Load Job)
    bpd.read_pandas(pd.DataFrame({'a': [1, 2, 3]}))

    # Get a DataFrame of all BQ jobs run in this session
    history = bigframes.execution_history()

    # Inspect recent queries, their costs, and durations
    print(history[['job_id', 'job_type', 'total_bytes_processed', 'duration_seconds', 'query']])

Verified at:
VS Code notebook: screen/8u2yhaRV9iHbDbF
Colab notebook: screen/9L8VrP5y9DXhnZz

Fixes #<481840739> 🦕

@shuoweil shuoweil self-assigned this Feb 5, 2026
@shuoweil shuoweil requested a review from a team as a code owner February 5, 2026 05:40
@shuoweil shuoweil requested a review from a team February 5, 2026 05:40
@review-notebook-app

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Feb 5, 2026


@dataclasses.dataclass
class JobMetadata:
Contributor

Can we add a static factory method to build this from an SDK query job object?

Contributor Author

Done! Added a from_job classmethod (and from_row_iterator) to handle building the metadata object directly from the jobs.
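As a rough illustration of the factory-method idea discussed here, the following is a minimal sketch, not the PR's actual code. The field list mirrors names mentioned in this thread; the duck-typed `getattr` lookups, the duration calculation, and the `SimpleNamespace` stand-in for an SDK job are assumptions made so the example runs without BigQuery access.

```python
import dataclasses
import datetime
from types import SimpleNamespace
from typing import Any, Mapping, Optional


@dataclasses.dataclass
class JobMetadata:
    job_id: str
    job_type: Optional[str] = None
    total_bytes_processed: Optional[int] = None
    slot_millis: Optional[int] = None
    duration_seconds: Optional[float] = None
    error_result: Optional[Mapping[str, Any]] = None
    query: Optional[str] = None

    @classmethod
    def from_job(cls, job: Any) -> "JobMetadata":
        """Build metadata from a job-like object (hypothetical, duck-typed)."""
        started = getattr(job, "started", None)
        ended = getattr(job, "ended", None)
        # Duration is only known once both timestamps are available.
        duration = (ended - started).total_seconds() if started and ended else None
        return cls(
            job_id=job.job_id,
            # Attributes missing on a given job type fall back to None.
            job_type=getattr(job, "job_type", None),
            total_bytes_processed=getattr(job, "total_bytes_processed", None),
            slot_millis=getattr(job, "slot_millis", None),
            duration_seconds=duration,
            error_result=getattr(job, "error_result", None),
            query=getattr(job, "query", None),
        )


# Stand-in for an SDK job object (assumption: these attribute names).
start = datetime.datetime(2026, 2, 5, 12, 0, 0)
fake_job = SimpleNamespace(
    job_id="job-123",
    job_type="query",
    total_bytes_processed=2048,
    slot_millis=150,
    started=start,
    ended=start + datetime.timedelta(seconds=3),
    query="SELECT 1",
)
meta = JobMetadata.from_job(fake_job)
print(meta)
```

A classmethod keeps the construction logic next to the field definitions, so callers that hold a job object do not need to know which attributes are optional.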

error_result: Optional[Mapping[str, Any]] = None
cached: Optional[bool] = None
job_url: Optional[str] = None
query: Optional[str] = None
Contributor

I do worry that, at a certain point, storing all the query text generated by the session might clog up memory.

Contributor Author

Good point! To prevent memory bloat during long sessions, I have added truncation so we cap the stored query text strings at a maximum of 1024 characters.
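The cap described above could look roughly like the sketch below. The helper name `truncate_query` and the hard cut without an ellipsis marker are assumptions for illustration; only the 1024-character limit comes from this thread.

```python
from typing import Optional

MAX_QUERY_CHARS = 1024  # limit stated in the reply above


def truncate_query(query: Optional[str], limit: int = MAX_QUERY_CHARS) -> Optional[str]:
    """Return at most `limit` characters of the SQL text (hypothetical helper)."""
    if query is None or len(query) <= limit:
        return query
    # Hard cut: the stored string never exceeds `limit` characters.
    return query[:limit]


long_sql = "SELECT " + ", ".join(f"col_{i}" for i in range(1000))
stored = truncate_query(long_sql)
print(len(long_sql), len(stored))
```

With a fixed cap, the memory held for query text grows linearly with the number of jobs, at no more than 1024 characters per job, rather than with the size of the generated SQL.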

@sycai sycai left a comment

I have a concern about placing job_history under the bigframes.pandas package. We could instead host it on bigframes or on Session instances, mainly because functionality under bigframes.ml and bigframes.bigquery can also trigger jobs but does not belong to bigframes.pandas.

@shuoweil shuoweil marked this pull request as draft February 5, 2026 19:36
@chalmerlowe

Migration Notice: This library is moving to the google-cloud-python monorepo soon.

We closed this PR due to inactivity to ensure a clean migration. Please re-open this work in the new monorepo once the migration is complete!

@chalmerlowe chalmerlowe closed this Mar 2, 2026
@shuoweil shuoweil reopened this Mar 23, 2026
@shuoweil shuoweil force-pushed the shuowei-job-history branch from 2fbbfa1 to 8d3b0c5 Compare March 23, 2026 22:59
@product-auto-label product-auto-label bot added size: l Pull request size is large. size: xl Pull request size is extra large. and removed size: xl Pull request size is extra large. size: l Pull request size is large. labels Mar 23, 2026
@shuoweil
Contributor Author

I have a concern about placing job_history under the bigframes.pandas package. We could instead host it on bigframes or on Session instances, mainly because functionality under bigframes.ml and bigframes.bigquery can also trigger jobs but does not belong to bigframes.pandas.

I agree with you. I have fully moved it out of bf.pandas. The API is now renamed to execution_history() to better reflect the broadened abstraction and is directly exposed via the root module (bigframes.execution_history()) and on the Session instance.
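To illustrate the placement being described, here is a minimal, self-contained sketch of a root-level function that delegates to a default session. Everything in it (the toy `Session` class, `record_job`, `get_global_session`, and returning a list of dicts instead of a DataFrame) is hypothetical and only models the design; it is not the bigframes implementation.

```python
from typing import Dict, List, Optional


class Session:
    """Toy session that records every job it triggers (hypothetical)."""

    def __init__(self) -> None:
        self._history: List[Dict[str, str]] = []

    def record_job(self, job_id: str, job_type: str) -> None:
        # Any sub-package (pandas-like, ml, bigquery) would report jobs here.
        self._history.append({"job_id": job_id, "job_type": job_type})

    def execution_history(self) -> List[Dict[str, str]]:
        # Return a copy so callers cannot mutate the session's log.
        return list(self._history)


_global_session: Optional[Session] = None


def get_global_session() -> Session:
    """Lazily create the default session (hypothetical helper)."""
    global _global_session
    if _global_session is None:
        _global_session = Session()
    return _global_session


def execution_history() -> List[Dict[str, str]]:
    """Root-level accessor that delegates to the default session."""
    return get_global_session().execution_history()


# Jobs from different sub-packages land in the same session-level log.
get_global_session().record_job("job-1", "query")
get_global_session().record_job("job-2", "load")
print(execution_history())
```

Keeping the log on the session means the history is independent of which sub-package triggered a job, which is the concern raised in the review.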

@shuoweil shuoweil force-pushed the shuowei-job-history branch from 5012dd1 to 570375f Compare March 31, 2026 01:52
@shuoweil shuoweil changed the title feat: Add bigframes.pandas.job_history() API to track BigQuery jobs feat: Add bigframes.execution_history API to track BigQuery jobs Mar 31, 2026
@shuoweil shuoweil force-pushed the shuowei-job-history branch from 211370d to 5c5bdde Compare April 1, 2026 03:18
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: xl Pull request size is extra large. labels Apr 1, 2026
@shuoweil shuoweil requested review from TrevorBergeron and sycai April 1, 2026 03:21
@shuoweil shuoweil marked this pull request as ready for review April 1, 2026 03:21
@shuoweil shuoweil requested review from a team as code owners April 1, 2026 03:21
Labels

api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.
