Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add icu4x_shared with some helpers #5101

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from
Draft

Conversation

Comment on lines +102 to +103
where
S: serde::ser::Serializer,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This impl is not bound on Transparent because (1) it isn't needed for safety reasons, (2) Transparent is a private trait, and (3) the generated functions use cast_ref_unchecked which in effect enforces transparency since those functions require transparency

@sffc sffc marked this pull request as ready for review June 22, 2024 15:39
@sffc sffc requested a review from a team as a code owner June 22, 2024 15:39
@Manishearth
Copy link
Member

I'm not really in favor of introducing a Transparent trait, but I'll have a closer look later. I don't think such a trait is necessary to achieve what we need here.

@sffc
Copy link
Member Author

sffc commented Jun 24, 2024

I'm not really in favor of introducing a Transparent trait, but I'll have a closer look later. I don't think such a trait is necessary to achieve what we need here.

I introduced it as a crate-private trait in order to make the macros safe. Happy for suggestions on how to do this better.

@Manishearth
Copy link
Member

Ah, I see what you've done here.

I think what I was envisioning was a way to not have any crate-specific unsafe code at use sites. A macro that can be invoked as

impl_helpers!(
 /// some docs
 #[derive(...)]
 #[repr(transparent)]
 pub struct Foo(Bar);
)

Thinking about it more it probably could still a trait so that it can be split into different cfg()able macros. I'm fine with that. But ideally the actual transparent impl can be done without writing unsafe.

Also FWIW while I don't consider it a blocking opinion, if we're putting unsafe stuff in icu4x_shared my opinion on #4467 does shift a bit: I kind of think unsafe stuff is ideally placed in a separate crate so projects can properly evaluate it. It is nice that the bulk of our crates currently have zero unsafe code, I'd love to retain that property. "No unsafe code" is a much easier slam-dunk evaluation than "unsafe code but it's copypaste unsafe code".

@Manishearth
Copy link
Member

Actually, no, thinking about it more, I do kind of consider this somewhat blocking. This goes back to the point I made about being very careful what we sacrifice in favor of the goals of serving cases like that of rust-url: I don't think "a lot more ICU4X crates contain unsafe code" is an acceptable cost. I'd rather this just lived in icu_provider, even though that's counter to #4467. Or a separate crate, but icu_provider seems fine. Probably doc(hidden) with the guarantee that we won't break its usage in ICU4X across minor versions (the policy agreed upon in #4343 (comment)).

@sffc
Copy link
Member Author

sffc commented Jun 24, 2024

I would use the ref-cast crate, but it doesn't support everything we need (box casting), and dtolnay says it's unlikely to support that.

Adding yet another crate to do this, in addition to the docs macros noted in #4467, has downsides:

  • It's a crate that doesn't actually do anything and is 100% ICU4X internal
  • ICU4X-internal APIs in core crates are costly due to the additional requirement that they can't change across minor versions, meaning they are effectively public stable APIs and we have less flexibility to change them
  • It increases our crate count even more, which we know is a problem, highlighted by rust-url

Some options:

  1. Copy these repr(transparent) helper macros only to the crates that need them. Put the other macros proposed in Decide and implement a solution for internal-facing APIs #4467 in a different file that doesn't contain unsafe code.
  2. Don't use macros and just copy and paste these unsafe impls everywhere. This is what we currently do. Pro: the unsafe code is close to where it is used. Con: the unsafe code is spread out which means it's more likely to have bugs; centralized unsafe code, even if it's in a file that we copy around to many places, is better from a safety and maintenance POV.
  3. Use a crate and eat the cost. We already have a bazillion crates so what's one more?
@Manishearth
Copy link
Member

ICU4X-internal APIs in core crates are costly due to the additional requirement that they can't change across minor versions, meaning they are effectively public stable APIs and we have less flexibility to change them

I think this specific API is fine to put in a core crate, for two reasons:

  • It's primarily a macro. It is possible (though trickier) to write this in a way that is entirely a macro with no trait dependencies. This gives us a lot of flexibility when it comes to breakages.
  • I don't think we'll need to break how this works
  • As mentioned before, we do not need to avoid changing this in breaking ways, only in ways that break ICU4X. That's far easier to manage.

I was extremely in favor of copy-pasting for the docs macros, because there is almost zero cost to that. Making every ICU4X crate (or even "every ICU4X crate using these macros") contain unsafe code is a major cost and affects ecosystem perception as well as the practicalities of unsafe review.

Put differently: doc macros copy pasted across crates are effectively one bit of code that is seen in multiple places. The only impact "duplicate code" has is on maintainers, and we'll have tooling making it transparent. unsafe macros copy pasted across crates are not effectively the same bit of code seen in multiple places, each copy adds a cost, and it has impacts outside of maintainers, including in ways that affect the perception of our project.

Personally, I am in favor of option 4: these macros (and only these macros) live in icu_provider, doc(hidden) with a comment about stability across minor versions.

Preferences are 4 > 3 >> 1 > 2, I consider "every crate that needs this needs unsafe code" to be an unacceptable cost for what I have previously highlighted as a very nebulous benefit.

@Manishearth
Copy link
Member

Actually, hold on, I'm also comfortable with the unsafe macros living inside the varule crate. That would work just fine, and they could be public if we want, they're clearly useful.

Happy to write them in a way that doesn't involve Transparent if desired.

@sffc
Copy link
Member Author

sffc commented Jun 24, 2024

This functionality is sufficiently similar to what's in zerovec that I'd be okay adding them there

In case it got lost, I want to point out that these are not only VarULE helper macros. They also encapsulate functionality from ref-cast, which is code we are already currently required to have sprinkled in various places. This particular crate already has unsafe code; the PR at hand does not change that. We're not regressing. We're not "sacrificing" anything in the more narrow context of this PR; that's only a hypothetical situation if a future migration changes a safe crate to an unsafe create. If we can move this crate's unsafe code elsewhere, that's a win.

@Manishearth
Copy link
Member

We're not regressing. We're not "sacrificing" anything in the more narrow context of this PR

Okay, but the "narrow context of the PR" is not the context in which we were planning this change: we were planning to use these helpers to clean up a large chunk of our derive(ULE) and VarULE uses. This is not some hypothetical future migration, this is a migration we tentatively agreed upon less than a week ago. If I have these objections, it would be quite counterproductive of me to wait for one of the bigger migrations to state these objections.

I also do think that from a reviewability perspective, this PR in its narrow context still makes things worse: it moves a single callsite over to a more indirect macro situation. The repr(transparent) unsafe we have currently that replaces ref-cast is duplicated boilerplate, but it's super super easy to review. Macros will obfuscate it, if we were just replacing manual-ref-cast (even with multiple callsites) with this macro I'd still be somewhat skeptical.

@sffc
Copy link
Member Author

sffc commented Jun 24, 2024

I haven't done a full audit but I would make the claim that most variable length repr(transparent) things need both ref-cast and VarULE impls, and since all ref-cast impls currently involve unsafe, there might not be many cases where we would be turning a safe crate to an unsafe crate (which I agree is a big cost).

This PR definitely introduces indirection; my position is that it is an overall improvement, but it's a reasonable debate and one I was hoping to have. I think it is clearer and safer for the safety invariants to live as trait bounds in macros that all live in one file whose only job is to make it exceedingly obvious that the safety invariants are upheld rather than having random unsafe blocks sprinkled around the crate.

@sffc
Copy link
Member Author

sffc commented Jun 24, 2024

Can you explain or demonstrate or cite how to do this safely without the trait bound? That might help move the conversation forward. Ultimately I'm mostly onboard with putting this in the zerovec crate.

@Manishearth
Copy link
Member

variable length

operative term here, I think we'll be applying these to most fixed length ones too

and I'm not opposed to the VarULE macros exposing these methods too, it's not a hard toggle to add

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't Transparent live in zerovec? There doesn't seem to be a use for it outside of an easier VarULE implementation.

@sffc
Copy link
Member Author

sffc commented Jun 25, 2024

Can the invariant be enforced without the trait by generating a static assertion that the sizes are equal?

@Manishearth
Copy link
Member

@sffc I don't think we need to test the invariant at all, as long as the macro controls generating the struct.

@sffc
Copy link
Member Author

sffc commented Jun 25, 2024

I would be quite sad but not block if the solution is wrapping the struct definition in a macro. I prefer standalone macros with traits or assertions

@Manishearth
Copy link
Member

Going to move discussion to #5127

@sffc sffc marked this pull request as draft June 27, 2024 21:39
@sffc
Copy link
Member Author

sffc commented Jun 27, 2024

Converting to draft pending discussions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants