Bigtable ReadRows doesn't handle empty string column qualifier by garye · Pull Request #4252 · googleapis/google-cloud-python

garye · 2017-10-25T14:12:03Z

This problem was reported by a customer. As background, the empty string is a valid column qualifier. A missing qualifier, as opposed to the empty string, means the previously seen qualifier should be applied instead.

I added a new test case that has a cell with a column qualifier of empty string arriving after a cell with a non-empty qualifier. The code fails to distinguish between a missing qualifier and an empty qualifier and overwrites the empty string with the first qualifier.

Without any code changes, the new test case fails. I attempted to fix the bug in row_data.py, but my fix causes other test failures. I was hoping someone could pick this up and investigate.
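To make the failure mode concrete, here is a minimal sketch of the carry-forward rule described above. It is not the code in row_data.py: `resolve_qualifiers` and its `use_truthiness` flag are hypothetical names used only for illustration, and `None` stands in for "qualifier missing from the chunk".

```python
def resolve_qualifiers(qualifiers, use_truthiness):
    """Apply the "missing means reuse the previous qualifier" rule.

    `qualifiers` is a list of bytes values, with None marking a missing
    qualifier. With use_truthiness=True the check mimics `if not qualifier`,
    which also treats b"" as missing. With use_truthiness=False it mimics
    `if qualifier is None`.
    """
    resolved = []
    previous = None
    for qualifier in qualifiers:
        if use_truthiness:
            missing = not qualifier
        else:
            missing = qualifier is None
        if missing:
            qualifier = previous
        resolved.append(qualifier)
        previous = qualifier
    return resolved


# A non-empty qualifier, then an explicit empty one, then a missing one.
incoming = [b"col1", b"", None]

buggy = resolve_qualifiers(incoming, use_truthiness=True)
fixed = resolve_qualifiers(incoming, use_truthiness=False)
```

With the truthiness check, the explicit empty qualifier is overwritten by `b"col1"`. With the `is None` check it survives, and the missing qualifier that follows inherits the empty string.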

bigtable/google/cloud/bigtable/row_data.py

            if not cell.family_name:
                cell.family_name = previous.family_name
-            if not cell.qualifier:
+            if cell.qualifier is None: # Note: qualifier can be empty string


dhermes

This mostly looks fine. I'm just trying to figure out how qualifier gets set and when it could ever be None.

bigtable/google/cloud/bigtable/row_data.py


    These are expected to be updated directly from a
-    :class:`._generated.bigtable_service_messages_pb2.ReadRowsResponse`
+   :class:`._generated.bigtable_service_messages_pb2.ReadRowsResponse`


bigtable/tests/unit/test_row_data.py

    def _match_results(self, testcase_name, expected_result=_marker):
        chunks, results = self._load_json_test(testcase_name)
        response = _ReadRowsResponseV2(chunks)
+


bigtable/tests/unit/test_row_data.py

        self.assertEqual(cell.row_key, '')
        self.assertEqual(cell.family_name, u'')
-        self.assertEqual(cell.qualifier, b'')
+        self.assertEqual(cell.qualifier, None)


bigtable/tests/unit/test_row_data.py

    row_key = ''
    family_name = u''
-    qualifier = b''
+    qualifier = None


dhermes · 2017-10-25T15:13:32Z

@garye I think I get it, since ReadRowsResponse.CellChunk.qualifier is a google.protobuf.BytesValue, it's a message field rather than a scalar field, so None is an acceptable default. (I'm still trying to figure out where a None value gets passed into the PartialCellData constructor.)

dhermes · 2017-10-25T15:15:46Z

@garye I just noticed CircleCI is trying to run tests for all the packages for you. This is probably because your branch isn't pointing at the latest HEAD in this repo. Mind if I rebase for you? While I'm doing it I can send in the few cosmetic changes I requested.

garye · 2017-10-25T15:50:43Z

Sounds great @dhermes, please rebase when you can. Thanks for the help.

googlebot · 2017-10-25T15:56:07Z

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that they're okay with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

dhermes · 2017-10-25T16:19:08Z

@garye After rebasing so that the bigtable tests actually run, there are now 9 failing unit tests. Did you run the tests locally before pushing the change?

Also, should I change the way we construct a PartialCellData so that when chunk.HasField('qualifier') is False we use None instead of chunk.qualifier.value?

(FWIW, I'd imagine the reason a google.protobuf.BytesValue message field is used instead of a scalar bytes field is the reason you mentioned. The backend wants a way to distinguish between unset and empty string, which is impossible with a scalar field.)
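A rough sketch of what that change could look like, using stand-in classes rather than the real protobuf types so it runs without the Bigtable package. `FakeBytesValue`, `FakeChunk`, and `qualifier_or_none` are hypothetical names. The stand-ins only mimic the two behaviors that matter here: `HasField` reports whether the message field was set, and reading an unset message field yields a default instance whose `value` is `b""`.

```python
class FakeBytesValue(object):
    """Stand-in for a google.protobuf.BytesValue wrapper message."""

    def __init__(self, value=b""):
        self.value = value


class FakeChunk(object):
    """Stand-in for a ReadRowsResponse.CellChunk with only a qualifier."""

    def __init__(self, qualifier=None):
        self._qualifier = qualifier

    def HasField(self, name):
        if name != "qualifier":
            raise ValueError("unknown field: %r" % (name,))
        return self._qualifier is not None

    @property
    def qualifier(self):
        # Reading an unset message field returns a default instance, so
        # `.value` alone cannot tell "unset" apart from "empty".
        if self._qualifier is None:
            return FakeBytesValue()
        return self._qualifier


def qualifier_or_none(chunk):
    """Return the qualifier bytes, or None when the field is unset."""
    if chunk.HasField("qualifier"):
        return chunk.qualifier.value
    return None


unset = FakeChunk()
empty = FakeChunk(FakeBytesValue(b""))
named = FakeChunk(FakeBytesValue(b"col1"))
```

Both `unset.qualifier.value` and `empty.qualifier.value` are `b""`, which is why the presence check has to happen before the value is read.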

garye · 2017-10-25T16:30:35Z

Yeah I mentioned in my initial comment that there were failing tests but wanted to get this up for discussion.

Yes, please make that change! You're right about the reason for the BytesValue wrapper, AFAIK.

garye · 2017-11-02T17:21:58Z

I think this latest change fixes the test failures and makes the new test pass. @dhermes, PTAL

dhermes · 2017-11-02T18:32:44Z

Thanks @garye. I am rebasing and sorting out the docs issue right now.

Funny git trivia: you've created a beast I've never seen before with your last commit. It silently "broke" git rebase (i.e. the rebase dropped it from history). It also breaks cherry-pick:

$ git cherry-pick d736e7227c58b3769e5ebedda4155700a1354942
error: Commit d736e7227c58b3769e5ebedda4155700a1354942 is a merge but no -m option was given.
fatal: cherry-pick failed

and if I do git show d736e7227c58b3769e5ebedda4155700a1354942, each of the diff lines has two plus/minus signs versus the typical one:

diff --cc bigtable/google/cloud/bigtable/row_data.py
index ae933a6,8fecb59..13e32c1
--- a/bigtable/google/cloud/bigtable/row_data.py
+++ b/bigtable/google/cloud/bigtable/row_data.py
@@@ -283,10 -283,10 +283,14 @@@ class PartialRowsData(object)
                  row = self._row = PartialRowData(chunk.row_key)
  
              if cell is None:
++                qual = None
++                if chunk.HasField('qualifier'):
++                    qual = chunk.qualifier.value
++
                  cell = self._cell = PartialCellData(
                      chunk.row_key,
                      chunk.family_name.value,
--                    chunk.qualifier.value,
++                    qual,
                      chunk.timestamp_micros,
                      chunk.labels,
                      chunk.value)

dhermes · 2017-11-02T18:34:38Z

I ran nox -s docs locally (after rebasing) and it passed, so fingers crossed.

garye · 2017-11-02T18:48:25Z

> you've created a beast I've never seen before with your last commit.

One for the record books! Sorry about that :/

dhermes · 2017-11-02T19:41:32Z

> Sorry about that

Luckily it was a 4 line diff 😀

dhermes

LGTM

garye · 2017-11-02T19:44:04Z

Thanks for all the help! Any guidance I can give the customer on when this will be released?

dhermes · 2017-11-02T19:44:27Z

@garye Would you like a release for this? 0.28.1 OK with you (vs. 0.29.0)?

dhermes · 2017-11-02T19:47:10Z

Jinx! I can do one right now, it should be very quick since we just did one 2 days ago so the release notes will be teeny.

garye · 2017-11-02T19:53:41Z

Perfect! 0.28.1 would be a big help.

garye requested a review from lukesneeringer as a code owner October 25, 2017 14:12

googlebot added the cla: yes label Oct 25, 2017

dhermes reviewed Oct 25, 2017

dhermes force-pushed the master branch from 20c8c2e to 2eedb5f October 25, 2017 15:56

dhermes added a commit to garye/google-cloud-python that referenced this pull request Oct 25, 2017

Small tweaks during review of googleapis#4252.

2eedb5f

googlebot added the cla: no label and removed the cla: yes label Oct 25, 2017

garye and others added 4 commits November 2, 2017 11:25

Add test case for empty string qualifier.

7226577

Attempt to fix code to handle both empty string and missing string but tests DO NOT pass.

Small tweaks during review of googleapis#4252.

b1b8297

Lint fix (moved a comment).

670fded

Properly set qualifier when encountering new cell.

ff576b6

dhermes force-pushed the master branch from d736e72 to ff576b6 November 2, 2017 18:34

dhermes approved these changes Nov 2, 2017

dhermes merged commit 49e744c into googleapis:master Nov 2, 2017

dhermes mentioned this pull request Jan 2, 2018

BigTable: Adding a row generator on a table. #4679

Merged

parthea pushed a commit that referenced this pull request Nov 22, 2025

Bugfix: Allow Bigtable ReadRows to handle empty string column qualifi…

c7d3996

…er (#4252)
