Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: optimize row merging #628

Merged
merged 25 commits into from
Aug 17, 2022
Merged

Conversation

igorbernstein2
Copy link
Contributor

@igorbernstein2 igorbernstein2 commented Aug 8, 2022

This PR rewrites the row merging logic to be more correct and improve performance:

  • extract row merging logic into its own class to simplify complexity of ReadRows handling
  • Use OrderedDict instead of dict() for {family: { qualifier: [] }} data, this should maintain serverside ordering (family in creation order and qualifier in lexiographical).
  • define an explicit state machine with states implemented as methods
  • add various optimizations like:
    • slots on hot objects to avoid dict lookups
    • avoiding dict lookups for contiguous family and qualifier keys

Overall this improves performance by 20% and in my opinion is a lot more readable

- extract read-rows-acceptance-test.json based tests into own file
- update the json to match the latest available in https://github.com/googleapis/conformance-tests/tree/main/bigtable/v2
- use parameterized pytest test to run all of the scenarios (instead of creating a function for each json blob)
- use json protobufs to parse the file

I left a TODO to allow easy updates of the file, unfortunately its not straight forward as the canonical protos get renamed for python gapic
Next PR will extract row merging functionality from row_data to make it easier to maintain
@igorbernstein2 igorbernstein2 requested review from a team as code owners August 8, 2022 20:33
@snippet-bot
Copy link

snippet-bot bot commented Aug 8, 2022

No region tags are edited in this PR.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: bigtable Issues related to the googleapis/python-bigtable API. labels Aug 8, 2022
google/cloud/bigtable/row_merger.py Outdated Show resolved Hide resolved
google/cloud/bigtable/row_merger.py Outdated Show resolved Hide resolved
google/cloud/bigtable/row_merger.py Outdated Show resolved Hide resolved
google/cloud/bigtable/row_merger.py Show resolved Hide resolved
google/cloud/bigtable/row_merger.py Outdated Show resolved Hide resolved
google/cloud/bigtable/row_merger.py Show resolved Hide resolved
google/cloud/bigtable/row_data.py Show resolved Hide resolved
@kolea2 kolea2 changed the title chore: optimize row merging Aug 10, 2022
@Mariatta
Copy link
Contributor

Can you add the documentation of row_merger to the Sphinx documentation?
First create docs/row-merger.rst (see the example content from row-data.rst)

Then add row-merger to the usage.rst line 18.

We should also update the usages of row_data.Cell / row_data.PartialRowData / row_data.InvalidChunk in the documentation to use the row_merger instead.

igorbernstein2 added a commit to igorbernstein2/python-bigtable that referenced this pull request Aug 11, 2022
This is in preparation for extracting row merging into a separate class. See googleapis#628
Copy link
Member

@dbolduc dbolduc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The row merging logic LGTM. I left some nits that you can fix or not.

google/cloud/bigtable/row_merger.py Outdated Show resolved Hide resolved
google/cloud/bigtable/row_merger.py Outdated Show resolved Hide resolved
google/cloud/bigtable/row_merger.py Outdated Show resolved Hide resolved
google/cloud/bigtable/row_merger.py Show resolved Hide resolved
google/cloud/bigtable/row_merger.py Show resolved Hide resolved
google/cloud/bigtable/row_merger.py Show resolved Hide resolved
Mariatta pushed a commit that referenced this pull request Aug 17, 2022
* chore: move row value classes out of row_data

This is in preparation for extracting row merging into a separate class. See #628

Co-authored-by: Anthonios Partheniou <partheniou@google.com>
# Conflicts:
#	google/cloud/bigtable/row_data.py
# Conflicts:
#	google/cloud/bigtable/row_merger.py
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: xl Pull request size is extra large. labels Aug 17, 2022
@Mariatta Mariatta added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 17, 2022
@igorbernstein2 igorbernstein2 added kokoro:force-run Add this label to force Kokoro to re-run the tests. and removed kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Aug 17, 2022
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 17, 2022
@igorbernstein2 igorbernstein2 merged commit c71ec70 into googleapis:main Aug 17, 2022
@igorbernstein2 igorbernstein2 deleted the row_merger branch August 17, 2022 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigtable Issues related to the googleapis/python-bigtable API. size: l Pull request size is large.
5 participants