src: avoid allocation in the Size method for BASE64URL and BASE64 #53550
Conversation
For people who care about history, Size has been expensive since the very beginning (we trace it back to the initial commit by @isaacs in 2013):

```cpp
case BASE64: {
  String::AsciiValue value(str);
  data_size = base64_decoded_size(*value, value.length());
  break;
}
```
lgtm
For the record, Size() wasn't just doing an unnecessary allocation, it was also returning the wrong size, causing us to allocate a much larger buffer than necessary.

```cpp
str->Length() % 4 <= 1 ? str->Length() / 4 * 3
                       : str->Length() / 4 * 3 + (str->Length() % 4) - 1
```

This is only an upper bound, as it doesn't look at the actual data, and so it assumes the worst case (all characters are data characters, i.e. no padding or whitespace). The math looks correct to me. Bear in mind that while Size() is allowed to return an upper bound, it is expected to return an exact prediction most of the time. If this is not the case, we do another allocation + memcpy to a new backing store with the actual size. For base64url the prediction is usually correct, as it rarely has padding, but for base64 it will miss 2/3 of the time (or always, if the input contains whitespace).
As far as I can tell, Node never returned the exact size. Doing so requires scanning the entire input, checking for characters to discard.
Even if it's not worth it to provide an exact prediction (like you said, it would require scanning the full input), we can still make the prediction exact for "pure" base64 (i.e. containing padding but no whitespace), which is pretty common.
ah, i see, they had the same idea...
80-90% on my i7 Linux laptop:
Thanks, that was exactly what I had in mind. If we can efficiently handle a 'normal' input (as in, produced by toString('base64') or the equivalent in most languages) I'm happy :) I don't think it needs to be any fancier, but I don't see a need to specialize it to extern 8-bit strings. As said, Write() takes a start offset, so we can check the last 2 chars without copying the whole string, i.e.

```cpp
if (str->Length() >= 2) {
  char16_t trailing[2];
  int written = str->Write(isolate, trailing, str->Length() - 2, 2);
  // ...
}
```

Are you concerned it may still be too heavy of an operation to be worth it?
@mildsunrise I kept it simple deliberately. The use case we care about in the benchmark I provided is the external one-byte case. It is highly likely that other cases could be optimized. It would be good to have corresponding benchmarks. I submit to you that it can be done in distinct PRs.
Looks good to me, I'll open another PR after this one lands and we can continue the discussion there :)
```cpp
case BASE64: {
  String::Value value(isolate, str);
  return Just(simdutf::base64_length_from_binary(value.length()));
  size_t data_size = str->Length() % 4 <= 1
```
Given the duplication here, any chance of moving this into a separate helper function?
❤️
This is, of course, a very valid remark. It is clear that @mildsunrise is planning to do another pass over this code as soon as it lands. My suggestion is to leave it as it is now and just wait for @mildsunrise to improve it.
I'm fine with both options.
Also, we can discuss this in the next PR, but I wanted to point it out now just in case: we are currently only creating external strings if their length is above ~1MB. So, for the test cases of 512k and below, the string is actually internal and thus we do 2 extra copies + allocs.
Not fully on-topic, but also not entirely off-topic: For those not active on Twitter but watching this issue, Leszek Swirski who works on V8 made a thread, mentioning this issue and what can be improved from V8 side too: https://x.com/leszekswirski/status/1805170432753934644. |
I bet that @mildsunrise can help boost the speed further without any update to v8. :-) But... yeah... help from v8 is probably needed to get the best possible performance. |
Yes, I don't think we need any update to V8 to get on par with Bun in this specific benchmark, because the strings are guaranteed to be flat. Indeed, I was able to get these results by lowering that threshold, but having V8 expose an iterator over non-flat strings (like the one they already use internally) would allow us to get the best performance in all cases (except when there is whitespace present). edit: not that I'm actually planning on touching that constant... it was chosen 11 years ago through a microbenchmark very similar to this one, and I don't feel knowledgeable enough to recommend tweaking it just based off another microbenchmark. I can only confirm that GC activity shoots up at 512kB otherwise, but this might be a microbenchmark deficiency.
Indeed. I've landed a non-copying v8::String::ValueView API in V8 now, so you should be able to use this once you roll V8 past 12.8.218. At that point this simplifies to something like:

```cpp
String::ValueView view(isolate, str);
data_size = view.length() % 4 <= 1
                ? view.length() / 4 * 3
                : view.length() / 4 * 3 + (view.length() % 4) - 1;
// Check if the string ends with one or two padding characters and adjust the
// size accordingly. Note that the input can contain non-base64 characters, so,
// at best, we can provide an upper bound. A correct size would require
// scanning the entire input. We try to keep the function as fast as possible,
// so we only check the last two characters when the string is one-byte.
if (view.is_one_byte()) {
  if (view.length() > 1) {
    if (view.data8()[view.length() - 1] == '=') {
      data_size--;
      if (view.data8()[view.length() - 2] == '=') {
        data_size--;
      }
    }
  }
}
```

(I don't include a padding character check for a two-byte string, since the length estimate will likely be wrong anyway, due to the almost guaranteed presence of non-base64 characters in the string.)
For large base64 strings, this PR multiplies the performance of
Buffer.from(..., "base64")
by making the Size function simpler and non-allocating. This will reduce the gap with Bun from over 3x slower to only about 40% slower for large inputs. For modest inputs, Node.js is still about 2x slower than Bun. Note that both Bun and Node.js use the same underlying library (simdutf) for base64 decoding, so any difference is entirely due to the runtime (not to the base64 decoding per se).
Benchmark (from bun):
Node.js 22:
Node.js with this PR:
Bun canary (upcoming release):
To get the Bun results, you need the canary, which you may get with `bun upgrade --canary`.