Skip to content

Bigtable ReadRows doesn't handle empty string column qualifier#4252

Merged
dhermes merged 4 commits intogoogleapis:masterfrom
garye:master
Nov 2, 2017
Merged

Bigtable ReadRows doesn't handle empty string column qualifier#4252
dhermes merged 4 commits intogoogleapis:masterfrom
garye:master

Conversation

@garye
Copy link
Contributor

@garye garye commented Oct 25, 2017

This problem was reported by a customer. As background, the empty string is a valid column qualifier. A missing qualifier, as opposed to the empty string, means the previously seen qualifier should be applied instead.

I added a new test case that has a cell with a column qualifier of empty string arriving after a cell with a non-empty qualifier. The code fails to distinguish between a missing qualifier and an empty qualifier and overwrites the empty string with the first qualifier.

With no code changes the new test case fails. I attempted to fix the bug in row_data.py but this causes other failures. I was hoping someone could pick this up and investigate.

@garye garye requested a review from lukesneeringer as a code owner October 25, 2017 14:12
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Oct 25, 2017
if not cell.family_name:
cell.family_name = previous.family_name
if not cell.qualifier:
if cell.qualifier is None: # Note: qualifier can be empty string

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

Copy link
Contributor

@dhermes dhermes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mostly looks fine, I'm just trying to figure out how qualifier gets set / when it could ever be None


These are expected to be updated directly from a
:class:`._generated.bigtable_service_messages_pb2.ReadRowsResponse`
:class:`._generated.bigtable_service_messages_pb2.ReadRowsResponse`

This comment was marked as spam.

This comment was marked as spam.

if not cell.family_name:
cell.family_name = previous.family_name
if not cell.qualifier:
if cell.qualifier is None: # Note: qualifier can be empty string

This comment was marked as spam.

def _match_results(self, testcase_name, expected_result=_marker):
chunks, results = self._load_json_test(testcase_name)
response = _ReadRowsResponseV2(chunks)

This comment was marked as spam.

This comment was marked as spam.

self.assertEqual(cell.row_key, '')
self.assertEqual(cell.family_name, u'')
self.assertEqual(cell.qualifier, b'')
self.assertEqual(cell.qualifier, None)

This comment was marked as spam.

row_key = ''
family_name = u''
qualifier = b''
qualifier = None

This comment was marked as spam.

@dhermes
Copy link
Contributor

dhermes commented Oct 25, 2017

@garye I think I get it, since ReadRowsResponse.CellChunk.qualifier is a google.protobuf.BytesValue, it's a message field rather than a scalar field, so None is an acceptable default. (I'm still trying to figure out where a None value gets passed into the PartialCellData constructor.)

@dhermes
Copy link
Contributor

dhermes commented Oct 25, 2017

@garye I just noticed CircleCI is trying to run tests for all the packages for you. This is probably because your branch isn't pointing at the latest HEAD in this repo. Mind if I rebase for you? While I'm doing it I can send in the few cosmetic changes I requested.

@garye
Copy link
Contributor Author

garye commented Oct 25, 2017

Sounds great @dhermes, please rebase when you can. Thanks for the help.

dhermes added a commit to garye/google-cloud-python that referenced this pull request Oct 25, 2017
@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that they're okay with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this State. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

@googlebot googlebot added cla: no This human has *not* signed the Contributor License Agreement. and removed cla: yes This human has signed the Contributor License Agreement. labels Oct 25, 2017
@dhermes
Copy link
Contributor

dhermes commented Oct 25, 2017

@garye After rebasing so that the bigtable tests actually run, there are now 9 failing unit tests. Did you run the tests locally before pushing the change?

Also, should I change the way we construct a PartialCellData so that when chunk.HasField('qualifier') is False we use None instead of chunk.qualifier.value?

(FWIW, I'd imagine the reason a google.protobuf.BytesValue message field is used instead of a scalar bytes field is the reason you mentioned. The backend wants a way to distinguish between unset and empty string, which is impossible with a scalar field.)

@garye
Copy link
Contributor Author

garye commented Oct 25, 2017

Yeah I mentioned in my initial comment that there were failing tests but wanted to get this up for discussion.

Yes, please make that change! You're right about the reason for the BytesValue wrapper, AFAIK.

@garye
Copy link
Contributor Author

garye commented Nov 2, 2017

I think this latest change fixes the test failures and makes the new test pass. @dhermes, PTAL

@dhermes
Copy link
Contributor

dhermes commented Nov 2, 2017

Thanks @garye. I am rebasing and sorting out the docs issue right now.


Funny git trivia, you've created I beast I've never seen before with your last commit. It silently "broke" git rebase (i.e. the rebase dropped it from history). It breaks cherry-pick:

$ git cherry-pick d736e7227c58b3769e5ebedda4155700a1354942
error: Commit d736e7227c58b3769e5ebedda4155700a1354942 is a merge but no -m option was given.
fatal: cherry-pick failed

and if I do git show d736e7227c58b3769e5ebedda4155700a1354942, each of the diff lines has two plus/minus signs versus the typical one:

diff --cc bigtable/google/cloud/bigtable/row_data.py
index ae933a6,8fecb59..13e32c1
--- a/bigtable/google/cloud/bigtable/row_data.py
+++ b/bigtable/google/cloud/bigtable/row_data.py
@@@ -283,10 -283,10 +283,14 @@@ class PartialRowsData(object)
                  row = self._row = PartialRowData(chunk.row_key)
  
              if cell is None:
++                qual = None
++                if chunk.HasField('qualifier'):
++                    qual = chunk.qualifier.value
++
                  cell = self._cell = PartialCellData(
                      chunk.row_key,
                      chunk.family_name.value,
--                    chunk.qualifier.value,
++                    qual,
                      chunk.timestamp_micros,
                      chunk.labels,
                      chunk.value)

@dhermes
Copy link
Contributor

dhermes commented Nov 2, 2017

I ran nox -s docs locally (after rebasing) and it passed, so fingers crossed.

@garye
Copy link
Contributor Author

garye commented Nov 2, 2017

you've created I beast I've never seen before with your last commit.

One for the record books! Sorry about that :/

@dhermes
Copy link
Contributor

dhermes commented Nov 2, 2017

Sorry about that

Luckily it was a 4 line diff 😀

Copy link
Contributor

@dhermes dhermes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dhermes dhermes merged commit 49e744c into googleapis:master Nov 2, 2017
@garye
Copy link
Contributor Author

garye commented Nov 2, 2017

Thanks for all the help! Any guidance I can give the customer on when this will be released?

@dhermes
Copy link
Contributor

dhermes commented Nov 2, 2017

@garye Would you like a release for this? 0.28.1 OK with you (vs. 0.29.0)?

@dhermes
Copy link
Contributor

dhermes commented Nov 2, 2017

Jinx! I can do one right now, it should be very quick since we just did one 2 days ago so the release notes will be teeny.

@garye
Copy link
Contributor Author

garye commented Nov 2, 2017

Perfect! 0.28.1 would be a big help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: no This human has *not* signed the Contributor License Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants