BigQuery: ensure that KeyboardInterrupt during `to_dataframe` no longer hangs. #7698
Merged
tswast merged 2 commits into googleapis:master, Apr 22, 2019
Conversation
tswast
commented
Apr 12, 2019
Contributor
Author
✅ Note to self: Need to update the minimum
force-pushed from 0971781 to 6fc4af2
force-pushed from fa7bb05 to 9016469
changed the title from "KeyboardInterrupt during to_dataframe (with BQ Storage API) no longer hangs" to "KeyboardInterrupt during to_dataframe no longer hangs."
force-pushed from 8593692 to 44186b6
shollyman
reviewed
Apr 19, 2019
    return rowstream.to_dataframe(session, dtypes=dtypes)

    # Use _to_dataframe_finished to notify worker threads when to quit.
    # See: https://stackoverflow.com/a/29237343/101923
    self._to_dataframe_finished = False
Contributor
Mostly I'm trying to reason about scope here. This is a worker pool, but is it possible we're generating multiple independent dataframes that would share the same access to _to_dataframe_finished? Do we need to key the workers to specific invocations, or is the nature of the access always blocking so that this isn't an issue?
Contributor
Author
There are a couple reasons that I don't think it's an issue:
- Yes, `to_dataframe()` is a blocking call, so you wouldn't really have multiple going at once. `RowIterator` isn't really something you'd want to use across threads, or even more than once, anyway. Because of how the pagination state works, once you loop through all the rows once, `to_dataframe()` returns an empty DataFrame, even in the case where the BQ Storage API isn't used.
…no longer hangs

I noticed in manually testing `to_dataframe` that it would stop the current cell when I hit Ctrl-C, but data kept on downloading in the background. Trying to exit the Python shell, I'd notice that it would hang until I pressed Ctrl-C a few more times.

Rather than get the DataFrame for each stream in one big chunk, loop through each block and exit if the function needs to quit early. This follows the pattern at https://stackoverflow.com/a/29237343/101923

Update tests to ensure multiple progress interval loops.
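The block-by-block loop with an early-exit flag that this commit message describes can be sketched roughly as follows. This is a simplified illustration of the pattern, not the actual client code; the class, method, and flag names here are hypothetical stand-ins:

```python
import concurrent.futures


class StreamDownloader:
    """Sketch: download streams block-by-block with cooperative early exit."""

    def __init__(self):
        # Worker threads check this flag between blocks, so a
        # KeyboardInterrupt in the main thread can stop them promptly.
        self._finished = False

    def _download_stream(self, stream):
        blocks = []
        for block in stream:  # iterate block-by-block, not one big chunk
            if self._finished:
                return blocks  # quit early instead of downloading everything
            blocks.append(block)
        return blocks

    def to_frames(self, streams):
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)
        try:
            futures = [pool.submit(self._download_stream, s) for s in streams]
            return [f.result() for f in futures]
        finally:
            # Runs on normal completion *and* on KeyboardInterrupt, so
            # background workers stop instead of hanging the interpreter.
            self._finished = True
            pool.shutdown(wait=True)
```

Because the flag is only consulted between blocks, workers exit at the next block boundary rather than mid-download, which keeps the shutdown path simple.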
force-pushed from 44186b6 to bf73284
shollyman
approved these changes
Apr 22, 2019
I noticed in manually testing `to_dataframe` that it would stop the current cell when I hit Ctrl-C, but data kept on downloading in the background. Trying to exit the Python shell, I'd notice that it would hang until I pressed Ctrl-C a few more times.
Rather than get the DataFrame for each stream in one big chunk, loop
through each block and exit if the function needs to quit early. This
follows the pattern at https://stackoverflow.com/a/29237343/101923
This depends on:
- Add page iterator to ReadRowsStream #7680, the new `.pages` feature in the BQ Storage client
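The stop-flag pattern referenced above (https://stackoverflow.com/a/29237343/101923) can also be expressed with `threading.Event`. This is a generic sketch of the technique, not the library's implementation:

```python
import threading
import time

stop = threading.Event()


def worker(results):
    # Simulate downloading many pages; poll the event between pages
    # so the thread can exit promptly when asked to stop.
    for page in range(100):
        if stop.is_set():
            break
        results.append(page)
        time.sleep(0.001)


results = []
t = threading.Thread(target=worker, args=(results,))
t.start()

time.sleep(0.01)
stop.set()   # e.g. set from a KeyboardInterrupt handler in the main thread
t.join()     # joins quickly because the worker checks the flag each page
```

Without the `stop.is_set()` check, `t.join()` would block until all 100 pages finished, which is exactly the "hang on exit" behavior this PR fixes.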