perf: improve row merging #619

Merged
merged 4 commits into from
Aug 4, 2022

igorbernstein2
Contributor

The underlying GAPIC client uses proto-plus for all requests and responses. However, the underlying protos for ReadRowsResponse are never exposed to end users directly: the chunks get merged into logical rows. The readability benefits that proto-plus provides for ReadRows do not justify the cost. This change unwraps the proto-plus messages and uses the raw protobuf messages as input for row merging, which improves row merging performance by roughly 10x. For 10k rows, each with 100 cells of 100 bytes, delivered in groups of 100 rows per ReadRowsResponse, cProfile showed about 9.5x fewer function calls and about 8.8x less run time:

old: 124266037 function calls in 68.208 seconds
new: 13042837 function calls in 7.787 seconds

There is still some low-hanging fruit for further optimization; those changes will come in follow-up PRs.
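
As a rough illustration of why removing a wrapper layer cuts the call count, here is a minimal sketch. It does not use proto-plus or the Bigtable client: `RawChunk`, `WrappedChunk`, `_marshal`, and `merge` are hypothetical toy stand-ins, and the wrapper only imitates the general pattern of doing extra work on every attribute read. The measurement uses cProfile's total call count, the same kind of figure quoted above.

```python
import cProfile
import pstats


class RawChunk:
    """Toy stand-in for a raw protobuf message (hypothetical, not the real chunk type)."""
    __slots__ = ("row_key", "value")

    def __init__(self, row_key, value):
        self.row_key = row_key
        self.value = value


def _marshal(value):
    # Placeholder for per-access conversion work that a wrapper layer might do.
    return value


class WrappedChunk:
    """Toy wrapper: every field read is forwarded through __getattr__."""

    def __init__(self, raw):
        self._raw = raw

    def __getattr__(self, name):
        # Only reached for names not found on the instance, i.e. the message fields.
        return _marshal(getattr(self._raw, name))


def merge(chunks):
    """Group chunk values by row key (a heavily simplified form of row merging)."""
    rows = {}
    for chunk in chunks:
        rows.setdefault(chunk.row_key, []).append(chunk.value)
    return rows


def count_calls(func, *args):
    """Run func under cProfile and return (result, total number of function calls)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    return result, pstats.Stats(profiler).total_calls


raw = [RawChunk(f"row-{i // 10}", b"x" * 100) for i in range(1000)]
wrapped = [WrappedChunk(chunk) for chunk in raw]

# Merge through the wrapper: each field read costs several extra calls.
wrapped_rows, wrapped_calls = count_calls(merge, wrapped)

# Unwrap once at the boundary, then merge the raw objects directly.
raw_rows, raw_calls = count_calls(merge, [w._raw for w in wrapped])

print(f"wrapped: {wrapped_calls} calls, raw: {raw_calls} calls")
```

Both paths produce the same merged rows; only the number of calls differs. In proto-plus itself, the raw message behind a wrapper is typically obtained with the message class's `pb()` class method, but whether this PR unwraps that way is not stated here.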

@igorbernstein2 igorbernstein2 requested review from a team as code owners August 4, 2022 17:29
@product-auto-label product-auto-label bot added size: s Pull request size is small. api: bigtable Issues related to the googleapis/python-bigtable API. labels Aug 4, 2022
@kolea2 kolea2 changed the title fix: improve row merging perf by 10x Aug 4, 2022
The previous approach of duck typing the protobuf messages as plain Python objects no longer works, as we now need to shuck proto-plus
Contributor

@Mariatta Mariatta left a comment

Looks good, thanks.

@igorbernstein2 igorbernstein2 added the automerge Merge the pull request once unit tests and other checks pass. label Aug 4, 2022
@gcf-merge-on-green gcf-merge-on-green bot merged commit b4853e5 into googleapis:main Aug 4, 2022
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Aug 4, 2022
@igorbernstein2 igorbernstein2 deleted the protominus branch August 4, 2022 21:17
Labels
api: bigtable Issues related to the googleapis/python-bigtable API. size: s Pull request size is small.
2 participants