Bigtable ReadRows doesn't handle empty string column qualifier#4252
Bigtable ReadRows doesn't handle empty string column qualifier#4252dhermes merged 4 commits intogoogleapis:masterfrom
Conversation
| if not cell.family_name: | ||
| cell.family_name = previous.family_name | ||
| if not cell.qualifier: | ||
| if cell.qualifier is None: # Note: qualifier can be empty string |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
dhermes
left a comment
There was a problem hiding this comment.
This mostly looks fine, I'm just trying to figure out how qualifier gets set / when it could ever be None
|
|
||
| These are expected to be updated directly from a | ||
| :class:`._generated.bigtable_service_messages_pb2.ReadRowsResponse` | ||
| :class:`._generated.bigtable_service_messages_pb2.ReadRowsResponse` |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
| if not cell.family_name: | ||
| cell.family_name = previous.family_name | ||
| if not cell.qualifier: | ||
| if cell.qualifier is None: # Note: qualifier can be empty string |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
bigtable/tests/unit/test_row_data.py
Outdated
| def _match_results(self, testcase_name, expected_result=_marker): | ||
| chunks, results = self._load_json_test(testcase_name) | ||
| response = _ReadRowsResponseV2(chunks) | ||
|
|
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
bigtable/tests/unit/test_row_data.py
Outdated
| self.assertEqual(cell.row_key, '') | ||
| self.assertEqual(cell.family_name, u'') | ||
| self.assertEqual(cell.qualifier, b'') | ||
| self.assertEqual(cell.qualifier, None) |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
| row_key = '' | ||
| family_name = u'' | ||
| qualifier = b'' | ||
| qualifier = None |
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
|
@garye I think I get it, since |
|
@garye I just noticed CircleCI is trying to run tests for all the packages for you. This is probably because your branch isn't pointing at the latest |
|
Sounds great @dhermes, please rebase when you can. Thanks for the help. |
|
So there's good news and bad news. 👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there. 😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that they're okay with their commits being contributed to this project. Please have them confirm that here in the pull request. Note to project maintainer: This is a terminal state, meaning the |
|
@garye After rebasing so that the Also, should I change the way we construct a (FWIW, I'd imagine the reason a |
|
Yeah I mentioned in my initial comment that there were failing tests but wanted to get this up for discussion. Yes, please make that change! You're right about the reason for the |
|
I think this latest change fixes the test failures and makes the new test pass. @dhermes, PTAL |
Attempt to fix code to handle both empty string and missing string but tests DO NOT pass.
|
Thanks @garye. I am rebasing and sorting out the docs issue right now. Funny and if I do diff --cc bigtable/google/cloud/bigtable/row_data.py
index ae933a6,8fecb59..13e32c1
--- a/bigtable/google/cloud/bigtable/row_data.py
+++ b/bigtable/google/cloud/bigtable/row_data.py
@@@ -283,10 -283,10 +283,14 @@@ class PartialRowsData(object)
row = self._row = PartialRowData(chunk.row_key)
if cell is None:
++ qual = None
++ if chunk.HasField('qualifier'):
++ qual = chunk.qualifier.value
++
cell = self._cell = PartialCellData(
chunk.row_key,
chunk.family_name.value,
-- chunk.qualifier.value,
++ qual,
chunk.timestamp_micros,
chunk.labels,
chunk.value) |
|
I ran |
One for the record books! Sorry about that :/ |
Luckily it was a 4 line diff 😀 |
|
Thanks for all the help! Any guidance I can give the customer on when this will be released? |
|
@garye Would you like a release for this? |
|
Jinx! I can do one right now, it should be very quick since we just did one 2 days ago so the release notes will be teeny. |
|
Perfect! |
This problem was reported by a customer. As background, the empty string is a valid column qualifier. A missing qualifier, as opposed to the empty string, means the previously seen qualifier should be applied instead.
I added a new test case that has a cell with a column qualifier of empty string arriving after a cell with a non-empty qualifier. The code fails to distinguish between a missing qualifier and an empty qualifier and overwrites the empty string with the first qualifier.
With no code changes the new test case fails. I attempted to fix the bug in
row_data.pybut this causes other failures. I was hoping someone could pick this up and investigate.