feat: Add bigframes.execution_history API to track BigQuery jobs #2435
@dataclasses.dataclass
class JobMetadata:
Can we add a static factory method to build this from an SDK query job object?
Done! Added a from_job classmethod (and from_row_iterator) to handle building the metadata object directly from the jobs.
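For readers following the thread, the factory-method pattern discussed here can be sketched roughly as below. This is a minimal illustration, not the PR's actual implementation: the `job_id` field is an assumption, the attribute names read from the job (`job_id`, `query`, `cache_hit`, `error_result`) mirror those of a BigQuery SDK query job, and a `SimpleNamespace` stands in for the real job so the sketch runs without credentials.

```python
import dataclasses
from types import SimpleNamespace
from typing import Any, Mapping, Optional


@dataclasses.dataclass
class JobMetadata:
    # job_id is an assumed field; the other four appear in the reviewed diff.
    job_id: Optional[str] = None
    error_result: Optional[Mapping[str, Any]] = None
    cached: Optional[bool] = None
    job_url: Optional[str] = None
    query: Optional[str] = None

    @classmethod
    def from_job(cls, job: Any) -> "JobMetadata":
        """Build metadata from any object exposing query-job-like attributes."""
        return cls(
            job_id=getattr(job, "job_id", None),
            error_result=getattr(job, "error_result", None),
            cached=getattr(job, "cache_hit", None),
            query=getattr(job, "query", None),
        )


# Hypothetical stand-in for an SDK query job object.
fake_job = SimpleNamespace(
    job_id="job-123", query="SELECT 1", cache_hit=True, error_result=None
)
meta = JobMetadata.from_job(fake_job)
```

Keeping construction in a classmethod means call sites never need to know which job attributes map to which metadata fields.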
    error_result: Optional[Mapping[str, Any]] = None
    cached: Optional[bool] = None
    job_url: Optional[str] = None
    query: Optional[str] = None
I do worry that, at a certain point, storing all the query text generated by the session might clog up memory.
Good point! To prevent memory bloat during long sessions, I have added truncation so we cap the stored query text strings at a maximum of 1024 characters.
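The cap described in this reply could look something like the following sketch. The helper name `truncate_query` and the constant name are hypothetical, and whether the real code appends a truncation marker is not stated in the thread, so this version simply slices.

```python
from typing import Optional

MAX_QUERY_CHARS = 1024  # cap mentioned in the reply; the constant name is assumed


def truncate_query(
    query: Optional[str], limit: int = MAX_QUERY_CHARS
) -> Optional[str]:
    """Return at most `limit` characters of the query text (None passes through)."""
    if query is None:
        return None
    return query[:limit]


short_sql = truncate_query("SELECT 1")
long_sql = truncate_query("SELECT " + "x" * 5000)
```

With a fixed cap, the memory held for query text grows with the number of jobs rather than with the size of the generated SQL.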
sycai left a comment
I have a concern about placing job_history under the bigframes.pandas package. We could instead house it in the root bigframes module or on Session instances, mainly because functionality under bigframes.ml and bigframes.bigquery can also trigger jobs but does not belong to bigframes.pandas.
Migration Notice: This library is moving to the google-cloud-python monorepo soon. We closed this PR due to inactivity to ensure a clean migration. Please re-open this work in the new monorepo once the migration is complete!
I agree with you. I have fully moved it out of bf.pandas. The API has been renamed to execution_history() to better reflect the broadened abstraction, and it is exposed directly via the root module (bigframes.execution_history()) and on the Session instance.
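One way to picture the layout described here is a module-level function that delegates to a Session method, so jobs triggered from any sub-package land in the same history. The sketch below is illustrative only: the `Session` class, `_record_job`, `get_global_session`, and the list-of-dicts return type are all simplified stand-ins, not the library's real internals or return type.

```python
from typing import Dict, List, Optional


class Session:
    """Simplified stand-in for a session that records the jobs it runs."""

    def __init__(self) -> None:
        self._history: List[Dict[str, str]] = []

    def _record_job(self, job_id: str, job_type: str) -> None:
        # Hypothetical hook called whenever any sub-package triggers a job.
        self._history.append({"job_id": job_id, "job_type": job_type})

    def execution_history(self) -> List[Dict[str, str]]:
        # Return a copy so callers cannot mutate the session's record.
        return list(self._history)


_global_session: Optional[Session] = None


def get_global_session() -> Session:
    """Create the shared session lazily on first use."""
    global _global_session
    if _global_session is None:
        _global_session = Session()
    return _global_session


def execution_history() -> List[Dict[str, str]]:
    """Root-level entry point that delegates to the global session."""
    return get_global_session().execution_history()


get_global_session()._record_job("job-1", "query")
get_global_session()._record_job("job-2", "load")
history = execution_history()
```

Because the history lives on the session rather than in any one sub-package, the same record is visible whichever module triggered the job.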
This PR promotes execution_history() to the top-level bigframes namespace and upgrades it to track rich metadata for every BigQuery job executed during your session.
Key User Benefits:
Easier Access: Call bigframes.execution_history() directly instead of digging into sub-namespaces.
Rich Metadata Tracking: Captures structured statistics for both Query Jobs and Load Jobs including:
Clean, Focused Logs: Automatically filters out internal library overhead (like schema validations and index uniqueness checks) so your history only shows the data processing steps you actually care about.
Usage Example:
Verified at:
VS Code notebook: screen/8u2yhaRV9iHbDbF
Colab notebook: screen/9L8VrP5y9DXhnZz
Fixes #<481840739> 🦕