-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retain base languages for compiled data #5195
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks like it's roughly +50% for the lookup trie, in some cases it's much more, in few cases less. Highlighted some cases where it by far exceeds the previous size, or the size of the actual data.
@@ -4,7 +4,7 @@ | |||
/// `icu`'s `_unstable` constructors. | |||
/// | |||
/// Using this implementation will embed the following data in the binary's data segment: | |||
/// * 41B for the lookup data structure (2 data identifiers) | |||
/// * 606B for the lookup data structure (151 data identifiers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not great
@@ -4,7 +4,7 @@ | |||
/// `icu`'s `_unstable` constructors. | |||
/// | |||
/// Using this implementation will embed the following data in the binary's data segment: | |||
/// * 650B for the lookup data structure (117 data identifiers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not great
@@ -4,7 +4,7 @@ | |||
/// `icu`'s `_unstable` constructors. | |||
/// | |||
/// Using this implementation will embed the following data in the binary's data segment: | |||
/// * 130B for the lookup data structure (25 data identifiers) | |||
/// * 631B for the lookup data structure (156 data identifiers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not great
Now that |
You mean having separate lookup tries for |
Yeah, but I also was thinking that the Rust compiler cannot deduplicate between About the differing behaviour, I think it's fine. The trait just guarantees that if |
We had a hypothesis at some point that retain-base-languages might be more efficient since it saves a hop during locale fallbacking. Do we have figures for how much it saves on average? Maybe based on one of our examples? |
I think @jedel1043's idea is worth investigating. Maybe a hybrid approach where there are 2 tries, one that has maximal deduplication lookup and the other that holds the diff between retain-base-languages and maximal. |
Or actually, all of the DryDataProvider impls could just use the same trie, which always contains only the base langauges, and all of the impls generated from the same datagen invocation can use the exact same trie. |
Some discussion:
|
This PR doesn't add DryDataProvider impls yet, right? It's just changing the deduplication mode? |
#58