Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow hosts to reuse an existing Realm Intrinsics record #1420

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

bmeck
Copy link
Member

@bmeck bmeck commented Jan 18, 2019

This PR builds upon previous talks we have had about Realms and Frozen Realms in JS. After having several discussions in meetings about these topics, it seems like there is still division on the direction that could be taken regarding Realms. While working and attend both the Frozen Realms calls and Node.js Security WG issues, it seems like investigation into this area would be good for Node.js to get a better view of what it would be like to use Realms.

This allows a host to reuse an existing Realm's Intrinsics while exposing a different set of globals. This PR would enable hosts to create Realms with the same set of intrinsics such as Frozen Realms and Compartments that have been talked about in the past. This is intended only to be exposed at the host level and not be a required behavior for hosts not wishing to expose the behavior. Node.js has a specific interest in allowing this at the host level while investigating security policies of modules and the ability to run code in isolated spaces. This PR may be built upon after further investigation, but it intended to be minimal while that investigation continues.

Prior to this PR hosts are already able to supply their own global object and global this value; this PR expands the options of what a host may supply to include intrinsics from another existing realm. If there is need for having modified intrinsics such as having differing values for Function or Function.prototype.constructor, I feel that can be addressed in a separate PR.

We could refactor Realm creation a bit to make the whole process much more centralized and easier to read, but I left the PR as minimal as possible for now to allow easier reading and expression of the host determining this choice. If it is desirable to refactor the Realm creation text, we can do so hopefully separately from this PR, but it seems like it should be done sometime given some odd wording while reading things.

@bmeck bmeck added needs consensus This needs committee consensus before it can be eligible to be merged. proposal This is related to a specific proposal, and will be closed/merged when the proposal reaches stage 4. labels Jan 18, 2019
@ljharb ljharb added normative change Affects behavior required to correctly evaluate some ECMAScript source text and removed proposal This is related to a specific proposal, and will be closed/merged when the proposal reaches stage 4. labels Jan 18, 2019
@annevk
Copy link
Member

annevk commented Jan 21, 2019

What kind of security guarantees are you looking for? Putting things in the same agent has a number of vulnerabilities by definition given current hardware.

@bmeck
Copy link
Member Author

bmeck commented Jan 21, 2019

@annevk The read vulnerability that keeps showing up due to timing does continue to occur in process, but was also cross process for some time before patches went out. It is not just Node that has in-process concerns when you look at other environments like CloudFlare Workers which also are doing things in-process it seems. My main investigation is into limiting references to powerful primitives. I have several calls with various people advocating for multi-process architecture due to the read vulnerability after releasing one path forward for Node. Due to the nature of and size of the Node.js ecosystem the requirements such as C++ addons running in the same process, as JS, multi-process mitigations are not necessarily safer.

This leads to being concerns with another type of attack that recurs just like the in-process reads on current architectures with the confused deputy that applies even to multi-process architectures. The ability to limit the communication of powerful primitives and even to isolate them is beneficial even if it does not save us from arbitrary reads. However, if someone has knowledge of arbitrary execution any attempt to have communication with powerful primitives, isolated or not, would be vulnerable; I do not know of any currently though.

Mitigations of speculative timing attacks are not likely to roll out into hardware soon, and timers are somewhat difficult to prevent from being obtained; however, that does not mean it is not fruitful to look into the discussions of isolating primitives entirely. @erights and @dtribble have done some work towards this mode of thinking that does follow my own desires; however, they are seeking to put code within an in-process sandbox rather than the root of the process which Node.js would be restricted to.

My investigations is towards the benefits and faults of these approaches rather than using Spectre as just what its name implies. The hope is that this PR can help with that investigation, and that of other companies and projects that also are doing in-process multi-tenancy. Right now the vast majority of Node.js applications do have multiple (HTTP) users per process and that is unlikely to change to per process per HTTP authorization, would have some new and interesting concerns for process management, and would break some usage of C++ addons. I am hopeful to get people to allow these investigations to continue and this PR would help.

@littledan
Copy link
Member

How does this PR relate to implementations? Do the JavaScript implementations used in Node.js support this usage mode, or is this PR a way to make the feature request to them? Are they interested in implementing this feature if the PR lands?

@@ -6163,7 +6163,7 @@ <h1>CreateRealm ( )</h1>
<p>The abstract operation CreateRealm with no arguments performs the following steps:</p>
<emu-alg>
1. Let _realmRec_ be a new Realm Record.
1. Perform CreateIntrinsics(_realmRec_).
1. If the host requires use of an existing Realm Record's intrinsics to serve as _realmRec_'s intrinsics, let _realmRec_.[[Intrinsics]] be assigned in an implementation-defined manner to an existing Realm Record's intrinsics record. Otherwise, perform CreateIntrinsics(_realmRec_).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let foo be assigned should probably be set foo.

And actually, I don't think the imp-defness applies to the manner in which the set happens, but rather (presumably) to the host's option.

So I suggest something like:

1. If the host, for implementation-defined reasons, requires that _realmRec_ reuse the intrinsics of an existing Realm Record _rr_, then
    1. Set _realmRec_.[[Intrinsics]] to _rr_.[[Intrinsics]].
1. Else
    1. Perform CreateIntrinsics(_realmRec_).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this change is not necessary since we do not plan to reuse intrinsics between realm records. Mostly because this was meant to solve the identity discontinuity issue that can be solved via a membrane, or the upcoming parameterized evaluator (which we don't know yet how that will work).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caridy The identity discontinuity issue is not solved by the membrane. It is hidden well enough for some practical purposes but not others.

The parameterized evaluator is exactly where this pull request is relevant. We all agreed to move that portion of the realms proposal into the SES / parameterized evaluator proposal. But the new understanding needs new names. The thing we're talking about should no longer be called a Realm record, because it is no longer per Realm. It is per evaluation context.

Attn @jfparadis @michaelfig

@bmeck
Copy link
Member Author

bmeck commented Jan 29, 2019

@littledan

How does this PR relate to implementations? Do the JavaScript implementations used in Node.js support this usage mode, or is this PR a way to make the feature request to them? Are they interested in implementing this feature if the PR lands?

To my knowledge VMs do not support sharing the intrinsics record amongst Realms. V8 does allow reusing a global if it is not in use with v8::Context::New (however, it resets a bunch of things instead of persisting the intrinsics), but I am not familiar with other VMs supporting something similar nor of the ability to keep the Realm's intrinsics alive in a way that can be shared. This PR is in a way a feature request and an attempt to move some discussion that we see repeated in meetings be explored by the spec.

We see multiple talks about Realms either for JS APIs for creating virtual hosts, relating to potential isolation of JS references, and frozen Realms. From the perspective of Node.js ongoing work is being looked at isolation and freezing of intrinsics that do related to those talks. A common part of all those discussions is the ability to create Realms and intricacies of sharing intrinsics between Realms (often due to what we call the "Identity Discontinuity"). I would like to attempt implementing features using the ideas of those talks, but doing so requires us to make changes to allow reusing intrinsics.

@littledan
Copy link
Member

OK, I think it's good to have this PR written to be concrete, but I would encourage us to hold back on landing it until it we get implementation interest/experience (rather than just interest/consensus in committee).

@guybedford
Copy link

To explain an example use case here, the new loader work in Node.js has the loaders running in their own Realm, kind of like service workers. These threaded loaders spin up with not-insignificant startup time partially due to spinning up the intrinsics. If we could use the existing intrinsics securely (ie if they were frozen), then this would provide a great way to get the necessary isolation with memory and CPU benefits.

Node.js is kind of exploring this space ahead of browsers at this point due to these use cases being more specific to this platform. @littledan would you suggest working more closely with v8 on these kinds of investigations then first?

@devsnek
Copy link
Member

devsnek commented Mar 2, 2019

in V8 at least (relevent for node), using the same intrinsics will provide negligible cpu/memory benefits because it already uses the same ones internally: https://v8.dev/blog/embedded-builtins

@bmeck
Copy link
Member Author

bmeck commented Mar 2, 2019

@devsnek that does give memory savings, but isn't improving startup time https://v8.dev/blog/embedded-builtins#optimizing-for-performance ?

@devsnek
Copy link
Member

devsnek commented Mar 2, 2019

@bmeck in the section you link they mention reworking most of the indirect calls into single lookups relative to the root

I'm sure this still has good security reasons to exist, I'm just saying this isn't a blocker performance wise for node's use cases

@bmeck
Copy link
Member Author

bmeck commented Jul 23, 2019

XS has implemented something akin to this PR via "Compartments", you can read about it in their blog https://blog.moddable.com/blog/secureprivate/

@jfparadis
Copy link

@bmeck, this change is clever, thanks.

Questions:

  1. Thinking about other implementers and adoption, would it be desirable to be more specific around the host defined operations?
  2. Using a realm record to support a compartments means a compartment is a form of realm. I'm concerned that this would limit the capacity for realms to evolve (always need to think about the impact on compartments). Instead, would think it's possible to make compartments backed by a global environment record instead of a realm record?
@bmeck
Copy link
Member Author

bmeck commented Dec 13, 2019

@jfparadis

  1. I think it would be desirable, but hard to define clearly. Things like replacement of %Function% is a bit tricky. This PR would not allow that since it must come from an existing set of Intrinsics.For mutable compartments they would likely need to extend this PR a little allow piecemeal replacement of intrinsics so that things like %Function% get replaced.
  2. I am unclear in how compartments being a kind of Realm is problematic however, and don't think evolving one would be hampering the other currently. Is there a clear reason why compartments wouldn't be able to be recreated via a Realm API without a Comparment API?
@jfparadis
Copy link

  1. I believe that clear specifications will lead to easier and more predictable implementations, something desirable both for quick adoption and predictable security. It means of course more upfront work as you hint.

I think we can start from Moddable's implementation in XS as their compartment API already handles a module loader and a module map. CC @phoddie

  1. Here is why: it's only a hunch, but describing the compartment as a realm except for new intrinsics means everything added to the realm will be added to compartments. Doing so means conceptually that both "root realm" and "compartment" inherit from some abstract representation (a "realm").

On the other hand, in the original compartment shim, we shim compartments using a "with" statement inside a function, which acts as a function who's lexical environment is the global environment. In the new compartment shim, there is no relation to multiple realm records, a choice we made because XS has no support for multiple realms, and that simplified everything.

@allenwb
Copy link
Member

allenwb commented Dec 18, 2019

I fear that both the title of this PR and some of the recent discussion concerning it reflects a fundamental misunderstanding of the nature and mechanisms of the ES specification and how they are used to define the semantics of ECMAScript programs.

Apparently by "Realm Intrinsics record" what is meant is the Record that is the value of the [[Intrinsics]] fields of a Realm Record. That Record (like all Records in the specification) is a value of an ECMAScript Specification Type:

A specification type corresponds to meta-values that are used within algorithms to describe the semantics of ECMAScript language constructs ... Specification type values are specification artefacts that do not necessarily correspond to any specific entity within an ECMAScript implementation. [Emphasis added]

In other words, the ES specification and its specification types exist to define the intended (by TC39) static and runtime semantics of ECMAScript language elements. They are not intended to describe how those semantics should or could be implemented. Specification type values are not required runtime data structures. It is completely up to an implementation to decide on the internal data structures (if any) that are used to actually implement the required semantics.

In the case of a Realm's [[Intrinsics] Record. It primarily exists to support a specification shorthand. It provides an indirection mechanism that allows %Foo% to be be used, without additional verbiage, within ES pseudo-code to reference intrinsics objects that are specific to the current realm. In most cases (given the current ES semantics), such an indirection will be unnecessary at runtime within an implementation because the current realm (and its intrinsics) can be determined prior to evaluation of (most?) ECMAScript code.

From that perspective, "Allow hosts to reuse an existing Realm intrinsics record" seems like a nonsense statement. A "host" presumably corresponds to an ECMAScript implementation and such implementation are allowed to represent the semantics of the %Foo% mechanism in any manner that correctly implements the language level semantics described by the specification. It may or may not have a per realm data structure for accessing the realm intrinsics. If it does have such a data structure it may or may not be shared (in whole or part) by multiple realms. The only requirement is observable conformance to the specified language semantics.

The current specification does not talk about concepts such as shared [[Intrinsic]] Records because there are no current language semantics that require such sharing. Presumably what this PR is really about is "Define a semantics for multiple realms that share (some) common intrinsics". In that case, the proper starting point is to actually state informally the full semantics of such sharing. What, at the observable language level, will it mean if, for example, two realms map %Object% to the same object value. Note that isn't just about the [[Instrinsic]] Record? There may be deep invariants in the current specification that will be invalidated by that. Has anybody analyzed that possibility? After there is documented agreement on the semantics of shared intrinsics, only then it will be appropriate to talk about how to tweak the specification abstractions to express that semantics.

@jfparadis
Copy link

@allenwb We're simply evaluating this PR, not defining the specs for realms and compartments.

In other words, Bradley found a clever way to express what we need with appears to be a minimal change to the specs, so we are using this PR as a probe to explore how far we can go with it. Its one set of lenses to explore and understand the situation.

As you say, the realm record is a concept of the specs. It combines everything related to a realm: [[intrinsics]], [[GlobalObject]], etc. We're all in agreement here, implementers might or might not have the concept of a realm record.

However, the realm record implies more than that: it assumes a one-to-one relationship, that there is only one [[GlobalObject]] per [[intrinsics]]. That assumptions is visible in many aspects of the specs like GetGlobalObject(). The compartment API breaks that assumption. Something needs to change.

Bradley's PR basically mean that a compartment is like a "child realm": it has everything a realm has, except that it delegates its intrinsic record. It allows specifications like abstract operation GetGlobalObject() to go without any change. However, it means that the definition of realm does change to include compartments.

Regarding you question about two realms shearing the same intrinsics, yes it's been explored in the realm-shim, and even implemented in XS where the realm record has no intrinsics slot (there is only one set of intrinsics, and they are all global defines).

@ljharb ljharb force-pushed the master branch 3 times, most recently from 3d0c24c to 7a79833 Compare June 29, 2021 02:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs consensus This needs committee consensus before it can be eligible to be merged. normative change Affects behavior required to correctly evaluate some ECMAScript source text