Cloud Storage - Implement method to check existence of multiple objects in a single operation #2337
Comments
@dwsupplee, I am working on this. I would like to coordinate the design of the solution.
As for the method of the Bucket class that @bduclaux requested, there is a question about handling the result.
I myself would choose No. 1.
Hi @andrewinc, thanks!
Yes @bduclaux You're right. This operation
As for the large list, you are also right that the list can be shortened by specifying a prefix if desired. Of course, you need to leave this feature provided for in I would like to know what @dwsupplee will write about this. Maybe they will propose a different solution.
@andrewinc thanks so much for taking the time to put some thoughts together on this, and thank you @bduclaux for the feature request. We'd definitely love to add support for something like this.

We've been laying the groundwork for exposing asynchronous network requests for some time now. This should allow us to expose something which looks like the following:

    use GuzzleHttp\Promise;
    use Google\Cloud\Storage\StorageClient;

    $bucket = (new StorageClient())->bucket('my-bucket');
    $promises = [];
    $objectNames = ['a.txt', 'b.txt', 'c.txt'];

    foreach ($objectNames as $objectName) {
        // Collect each promise so that all of them can be awaited together.
        $promises[] = $bucket->object($objectName)
            ->existsAsync()
            ->then(function ($exists) use ($objectName) {
                // var_export prints "true"/"false" rather than "1"/"".
                echo "$objectName: " . var_export($exists, true) . PHP_EOL;
            });
    }

    // Block until every existence check has settled.
    Promise\unwrap($promises);

We've done this as a "one-off" over on StorageObject::downloadAsStreamAsync, with the plan being to expose the rest of the async method counterparts across the Storage library as part of our 2.0 version bump (we don't have a clear ETA for this at the moment).

Another option would be to expose the batch API through our storage client; this would allow combining up to 100 requests into a single API request. It looks like there is some work in progress to define a plan for how we can expose this across languages. I'll check in and see where this progress is at, but will note it could require breaking changes to the library as well.

I prefer these approaches over the list objects implementation because I'm apprehensive of edge case scenarios like the following: I have 100,000 objects in my bucket. I want to check that objects "a.txt" and "z.txt" exist. "a.txt" happens to be object 1/100,000 returned, while "z.txt" is object 100,000. The max results returned from a single RPC to list objects is 1,000, meaning I'd have to page through 100 times to get to "z.txt". The end cost is ~100 RPCs to check for two objects.
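To make the paging arithmetic in that edge case concrete, here is a small self-contained simulation. It does not call Cloud Storage at all: the `pagesNeededToFind` helper and the generated object names are hypothetical, and it assumes names come back in sorted order with at most 1,000 names per page, as described above.

```php
<?php
// Hypothetical helper: counts how many list pages must be fetched before every
// target name has been seen, assuming names arrive in sorted order and each
// page holds at most $pageSize names.
function pagesNeededToFind(array $sortedNames, array $targets, int $pageSize): int
{
    $remaining = array_flip($targets);
    $pages = 0;
    foreach (array_chunk($sortedNames, $pageSize) as $page) {
        $pages++; // one list RPC per page
        foreach ($page as $name) {
            unset($remaining[$name]);
        }
        if ($remaining === []) {
            break; // every target has been seen, stop paging
        }
    }
    return $pages;
}

// Simulated bucket of 100,000 names: "a.txt" sorts first, "z.txt" sorts last.
$names = ['a.txt'];
for ($i = 1; $i <= 99998; $i++) {
    $names[] = sprintf('m%05d.txt', $i);
}
$names[] = 'z.txt';

$listRpcs = pagesNeededToFind($names, ['a.txt', 'z.txt'], 1000);
$existsRpcs = 2; // one metadata request per object name

echo "list approach: $listRpcs RPCs, per-object approach: $existsRpcs RPCs" . PHP_EOL;
```

Under these assumptions the list approach needs 100 pages while two direct existence checks need two requests, which is why a list-based implementation only pays off when a prefix narrows the listing considerably.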
Hello
We are using the PHP Cloud Storage library, and we are facing a performance issue when checking the existence of multiple objects in a storage bucket.
Currently, the only way to implement such check is by using a loop such as:
This triggers a REST API call for each of the objects, which is slow. We usually have around 10 object names per loop, so this takes around 0.4s.
Because we run this check many times, it becomes a performance bottleneck.
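As a rough sketch of the one-request-per-object pattern described above, the snippet below checks each name in turn and counts the calls made. The `checkEach` helper and the fake backend are hypothetical; in real code the `$exists` callable would stand in for a per-object call such as `$bucket->object($name)->exists()`, each of which costs one round trip.

```php
<?php
// Hypothetical helper: runs one existence check per name and returns
// a map of name => bool. $exists stands in for a real per-object call.
function checkEach(array $objectNames, callable $exists): array
{
    $results = [];
    foreach ($objectNames as $name) {
        $results[$name] = (bool) $exists($name);
    }
    return $results;
}

// Fake backend used only for this sketch: it counts how many calls are made.
$stored = ['a.txt' => true, 'c.txt' => true];
$calls = 0;
$fakeExists = function (string $name) use ($stored, &$calls): bool {
    $calls++; // each call represents one request to the storage API
    return isset($stored[$name]);
};

$results = checkEach(['a.txt', 'b.txt', 'c.txt'], $fakeExists);
var_export($results);
echo PHP_EOL . "requests made: $calls" . PHP_EOL;
```

The number of requests grows linearly with the number of names, so with sequential calls the total latency does too.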
It would be great to have a method in the Bucket class that checks multiple object names at once, with a single request to the Cloud Storage back-end APIs (I am not sure whether such a method exists in the back-end API).
Thanks!