-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add is_normalized_up_to
to Normalizer
#4334
Conversation
cdd4ffc
to
553413e
Compare
Added more UTF8 tests. |
16c2a4b
to
8240b4c
Compare
Added UTF8 variant to FFI as |
1681ae9
to
289a27a
Compare
Added UTF16 variant to FFI coverage allowlist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great for the implementation of the &str
case. Thanks! The docs need improvement.
The exact semantics of the &[u8]
and &[u16]
case still need thinking.
components/normalizer/src/lib.rs
Outdated
@@ -1457,6 +1457,13 @@ macro_rules! normalizer_methods { | |||
ret | |||
} | |||
|
|||
/// Return the index a string slice is normalized up to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For docs, please, frame this explicitly as a building block of copy-on-write normalization. That the return value is the largest index at which the input can be split to head and tail such that the head can be copied to output and only tail needs to go through the normalization process.
It would be the clearest to provide the invariant-stating code from the issue (with stuff filled in so that it compiles as a doc test) as an explanation of what this is for.
assert!(nfkd.is_normalized_utf8_up_to(fraction_utf8) == 1); | ||
assert!(nfc.is_normalized_utf8_up_to(fraction_utf8) == 4); | ||
assert!(nfkc.is_normalized_utf8_up_to(fraction_utf8) == 1); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To test a lot of input, short of fuzzing, it would be useful to copypaste test_conformance()
into a new test where instead of the test's innermost testable being normalize_to
, you'd do normalize_up_to
, spit_at
, copy the head, and use normalize_to
for the tail.
Conclusion:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CanadaHonk Are you planning to follow up on this PR?
This should also land behind an experimental feature. |
Why? I thought we resolved in the November 30 discussion minuted above that this can land unconditionally. |
That's not clear to me from the minutes, and I don't remember details of this discussion. That said, "stable" in 1.5 doesn't mean much anyway, as we can make breaking changes in 2.0, so it's not that important. |
tools/ffi_coverage/src/allowlist.rs
Outdated
@@ -264,9 +264,11 @@ lazy_static::lazy_static! { | |||
"icu::normalizer::ComposingNormalizer::normalize_utf16", | |||
"icu::normalizer::ComposingNormalizer::normalize_utf16_to", | |||
"icu::normalizer::ComposingNormalizer::is_normalized_utf16", | |||
"icu::normalizer::ComposingNormalizer::is_normalized_utf16_up_to", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please expose these, using &DiplomatStr16
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the latest changeset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, don't forget to run cargo make diplomat-gen
Closes unicode-org#4256. Added UTF8 variant to FFI as `is_normalized_up_to`. No UTF16 tests or fuzzing yet.
f06461e
to
1665408
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving the remaining tests to follow-ups.
Closes #4256. Added UTF8 variant to FFI as
is_normalized_up_to
. No UTF16 tests or fuzzing yet.