In Defense of Virtualization #18

Open
bakkot opened this issue Aug 27, 2018 · 12 comments

@bakkot

bakkot commented Aug 27, 2018

(Disclosure: I am writing on behalf of my employer, Shape Security, which makes heavy use of virtualization as discussed below in its product.)

The web platform has long allowed early-running code to constrain or modify the behavior of code which runs after it, generally by replacing (virtualizing) some platform-provided objects or methods. This is widely used. As a simple example, Facebook, Airbnb, AliExpress, The Washington Post, LinkedIn, SoundCloud, and Tumblr are all virtualizing XMLHttpRequest, fetch, and/or some of their prototype methods. Even among this small list, virtualization is used for a variety of purposes: Tumblr is ensuring analytics are correctly triggered, AliExpress is attaching user identification headers, The Washington Post is analyzing network performance and errors, and Facebook is doing something more obscure, or at least more obscured.

With the getOriginals proposal as it stands, this sort of virtualization is rendered impractical. Any code can get access to the original, unmodified versions of XMLHttpRequest and its prototype methods and submit requests which bypass these mechanisms. And because widely-used libraries could (indeed, likely would) start depending on the original rather than virtualized feature, consumers of these libraries could end up bypassing virtualization strictly by accident in a way which is difficult for them to undo without major rewrites.

The ability for early-running code to virtualize platform-provided objects has existed in the web platform since its inception. It is widely relied upon, and as such it has been carefully maintained by TC39 and, until now, the rest of the web platform (see also the "extremely virtualizable" section below). I don't believe it is a good idea to destroy it.

Use cases

Analytics

The analytics use case as seen on The Washington Post is particularly common. Companies including Akamai, Instart Logic, Rollbar, Bugsnag, New Relic, Elastic, and Sentry provide products which rely on virtualization to meet it. The general strategy of analytics scripts along these lines is to add functionality to methods such as XMLHttpRequest.prototype.open or window.addEventListener which records when and how they are triggered and how long they take to complete, or which allows errors to be traced through callbacks and event listeners.
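A minimal sketch of this kind of instrumentation follows. It is illustrative only: `FakeXHR` is a hypothetical stand-in for XMLHttpRequest so that the snippet runs outside a browser, and `instrumentMethod` and `log` are made-up names, not taken from any of the products listed above.

```javascript
// Hypothetical stand-in for a platform class such as XMLHttpRequest.
class FakeXHR {
  open(method, url) {
    this.request = { method, url };
  }
}

// Illustrative analytics sink: one record per instrumented call.
const log = [];

// Replace proto[name] with a wrapper that records the call and its
// duration, then delegates to the original implementation.
function instrumentMethod(proto, name) {
  const original = proto[name];
  proto[name] = function (...args) {
    const start = Date.now();
    try {
      return Reflect.apply(original, this, args);
    } finally {
      log.push({ name, args, ms: Date.now() - start });
    }
  };
}

// Early-running code installs the wrapper once...
instrumentMethod(FakeXHR.prototype, 'open');

// ...and later code, unaware of it, is recorded automatically.
const xhr = new FakeXHR();
xhr.open('GET', '/api/items');
console.log(log.length, log[0].args);
```

The later code is not modified at all; it is recorded only because it looks the method up on the shared prototype at call time.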

Given how much code already exists on a typical JS-rich page, often with much of it inside of third-party libraries, it is not practical to rewrite code which is already present to make it perform these sorts of analytics. These analytics libraries only work because it is possible to add them to a page above other, existing code and immediately get useful data without rewriting any other code.

Testing and verification

A developer wishing to make assertions about the behavior of a page - for example, that a request is triggered under particular conditions - needs to be able to stub out platform functions with versions containing the assertions in question.
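As a rough sketch of such a stub, assuming a hypothetical `platform.send` function standing in for a real platform API and a hypothetical `submitForm` function standing in for page code under test:

```javascript
// Hypothetical stand-in for an object holding a platform function.
const platform = {
  send(url, body) {
    throw new Error('real network access is not available in a test');
  },
};

// Hypothetical page code under test: it should send only non-empty input.
function submitForm(input) {
  if (input.trim() !== '') {
    platform.send('/submit', input.trim());
  }
}

// Temporarily replace platform.send with a recording stub, run the
// callback, restore the original, and return what was recorded.
function withStubbedSend(callback) {
  const original = platform.send;
  const calls = [];
  platform.send = (url, body) => { calls.push({ url, body }); };
  try {
    callback();
  } finally {
    platform.send = original;
  }
  return calls;
}

const calls = withStubbedSend(() => {
  submitForm('   ');      // should trigger no request
  submitForm(' hello ');  // should trigger exactly one request
});

// The assertion about the page's behavior.
if (calls.length !== 1 || calls[0].body !== 'hello') {
  throw new Error('expected exactly one request carrying "hello"');
}
```

The stub only sees the request because `submitForm` reaches the function through the replaceable property rather than through a privately held original.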

Polyfills

It's common practice for websites to include polyfills for modern APIs which are not available in older browsers, so that modern libraries can work in those browsers as long as they run after the polyfill. If a library depends on the version of an API provided by getOriginals, however, no such polyfill is possible. On the browser that does not natively include the polyfilled API, the original value will be missing or broken. This does not just apply to completely new APIs, but to APIs that have been expanded with new parameters or behavior.
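The failure mode can be sketched as follows. Everything here is hypothetical: `oldGlobal` stands in for the global object of an older browser, `clampNumber` is an invented API name, and `originalClampNumber` merely simulates a reference to the platform's own version, since this sketch does not use the proposal's actual API.

```javascript
// Hypothetical stand-in for the global object of an older browser that
// lacks a newer API (an invented function called clampNumber).
const oldGlobal = { max: Math.max, min: Math.min };

// Simulates a reference to the platform's original: on this engine the
// API never existed, so there is nothing to capture.
const originalClampNumber = oldGlobal.clampNumber;

// A polyfill, run early, fills the gap only when the API is missing.
if (typeof oldGlobal.clampNumber !== 'function') {
  oldGlobal.clampNumber = (value, lo, hi) =>
    oldGlobal.min(hi, oldGlobal.max(lo, value));
}

// A library that looks the API up through the (polyfilled) global works.
const viaGlobal = oldGlobal.clampNumber(15, 0, 10);

// A library that insists on the original finds nothing to call.
const originalUsable = typeof originalClampNumber === 'function';

console.log(viaGlobal, originalUsable);
```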

See primary discussion at #17.

Security

Although it's difficult to do correctly due to the large number of details required, there have been ongoing efforts to achieve certain security guarantees by introducing early-running code which virtualizes powerful APIs for later-running code. Google's Caja project is perhaps the longest-running effort, attempting to allow mutually untrusting modules to share a realm and be confined to a region of the document.

Considerations

Realms are not sufficient

The common case for virtualization is to virtualize the environment seen by all code on a page. Realms are not designed or suitable for this purpose; there is no effective way to use a realm to constrain the behavior of scripts which need to run in the context of a page and which are loaded through script tags in HTML.

Also, first-class realms aren't actually part of the web platform as it stands.

Service workers are not sufficient

Service workers can be used to intercept requests for the scripts on a page, thereby "virtualizing" them in some sense. But service workers are very limited: they can't intercept foreign fetches, there cannot be more than one on a page, a fetch can be intercepted at most once, Request objects cannot be mutated, etc. And in any case rewriting arbitrary JS in this manner is, in general, an undecidable problem.

Equivalent to making all properties non-configurable

The scope of this change is much larger than it appears - it is effectively equivalent to making each of the hundreds or thousands of built-in APIs non-configurable. A proposal which made any one of them non-configurable would be controversial on its own.
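For comparison, this is what a literally non-configurable, non-writable property looks like to code that tries to virtualize it. The sketch uses a hypothetical `platform.fetchLike` property rather than a real built-in, and it illustrates the analogy only, not how the proposal itself is specified.

```javascript
// Hypothetical stand-in for a platform object exposing a built-in.
const platform = {};
Object.defineProperty(platform, 'fetchLike', {
  value: () => 'original',
  writable: false,
  configurable: false, // cannot be redefined or deleted afterwards
  enumerable: true,
});

// An early-running script tries to virtualize the built-in...
let replaced = true;
try {
  Object.defineProperty(platform, 'fetchLike', { value: () => 'wrapped' });
} catch (err) {
  // ...and fails with a TypeError, because the property is locked.
  replaced = !(err instanceof TypeError);
}

console.log(replaced, platform.fetchLike());
```

From the virtualizing script's point of view, an original that later code can always reach has the same effect: its replacement is no longer guaranteed to be what that code actually calls.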

Speculative execution

Even though the threat model for security-minded applications is changed significantly by speculative execution attacks along the lines of Spectre and Meltdown, I don't believe this sort of virtualization is significantly affected, since I believe such attacks do not allow JavaScript code to call built-in functions which are not otherwise accessible.

Possible vs practical

There are workarounds to this sort of simple virtualization, and ways of preventing those workarounds, and so on. I want to emphasize, however, that there is a significant difference between it being possible to work around virtualization and it being practical. As it stands, no widely-used library is going to work around the effects of an early-running script's virtualizations, and so in practice it is straightforward to virtualize a typical page. If this proposal ships, that will change.

The web platform is extremely virtualizable

A complete list of unvirtualizable objects and properties in the web platform:

  • the global object and certain of its properties:
      • Infinity
      • NaN
      • undefined
      • window
      • top
      • location
      • document (not any of its prototype properties - document's [[Prototype]] is writable)
  • all 16 properties of location
  • document.location (an unforgeable getter/setter pair yielding window.location)

All other existing objects and properties can be replaced, either by overwriting them directly or overwriting the objects on which they exist or methods which would provide access to them.
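Part of this can be checked from script by inspecting property descriptors on the global object. The sketch below covers only the names that also exist outside a browser (Infinity, NaN, undefined) and contrasts them with two ordinary built-ins; `isReplaceable` is an illustrative helper, and the browser-only entries (window, top, location, document) would need to be checked in a page.

```javascript
// Report whether a global property can be replaced by ordinary code:
// it can if it is configurable, or if it is a writable data property.
function isReplaceable(name) {
  const desc = Object.getOwnPropertyDescriptor(globalThis, name);
  if (desc === undefined) return undefined; // not an own property here
  return desc.configurable || desc.writable === true;
}

for (const name of ['Infinity', 'NaN', 'undefined', 'parseInt', 'JSON']) {
  console.log(name, isReplaceable(name));
}
```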

@benjamingr

Note that it is already possible to get an original value in a lot of cases, and indeed when I worked on SDKs we did this pretty much always. The trick is to create another realm, grab a copy from that realm, and then close it. For example:

  • create an iframe
  • grab XMLHttpRequest/fetch/whatever from it
  • close it
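The steps above can be sketched outside a browser by using Node's built-in `vm` module in place of an iframe. That substitution is an assumption made so the snippet is runnable; in a page, the second realm would come from an iframe, and `JSON.stringify` here stands in for XMLHttpRequest or fetch.

```javascript
// Node's built-in vm module creates a fresh realm, standing in for the
// iframe a browser script would create.
const vm = require('vm');

// Early-running code virtualizes a built-in in the main realm.
const realStringify = JSON.stringify;
JSON.stringify = (value) => `wrapped:${realStringify(value)}`;

// Later code creates another realm and takes that realm's untouched copy.
const pristineStringify = vm.runInNewContext('JSON.stringify');

const viaMainRealm = JSON.stringify({ a: 1 });
const viaNewRealm = pristineStringify({ a: 1 });

// Undo the virtualization so the rest of the program is unaffected.
JSON.stringify = realStringify;

console.log(viaMainRealm, viaNewRealm);
```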

This proposal just provides a better-designed way to get originals (which I believe platforms, libraries and SDKs would love to consume). For example, at Node.js we have quite a bit of code to deal with "what if the user changed some built-in".

@bakkot
Author

bakkot commented Apr 13, 2019

I'm aware; see the section on "possible vs practical".

@benjamingr

> no widely-used library is going to work around the effects of an early-running script's virtualizations, and so in practice it is straightforward to virtualize a typical page. If this proposal ships, that will change.

Every widely used library I worked on either discussed doing it or did it though. You just have to do it when the environment is adversarial enough...

@bakkot
Author

bakkot commented Apr 14, 2019

I'm curious which libraries are doing this, if you'd be OK naming them. On rare occasions I've encountered pages where some code was e.g. pulling XHR out of an iframe (in which case virtualization is still generally possible, as I say - you can virtualize the API they're using to make the iframe), but to my knowledge it's never been library code. And my virtualization code runs on a decent number of webpages.

The existence of libraries which discussed doing this and then did not kind of proves my point.

@benjamingr

Here are some random examples I can name that are basically people writing an SDK:

  • At Peer5 we had to do this in our SDK in a few places.
  • At TipRanks I had to do this in an SDK I did for a client that was using Angular (that used Zones to override XMLHttpRequest).
  • Analytics companies (like Taboola) do this.
  • Node.js itself does this in its own internals, so that user changes to built-ins do not break internal code.

That is not to say there isn't a lot of merit in "virtualization", there are a few obvious examples, for example overriding creation of closed shadow roots to provide accessibility :)

@bakkot
Author

bakkot commented Apr 14, 2019

Interesting, thanks! I'm very surprised to see an analytics company in there; I've seen a lot of analytics code, and while it's quite common for such code to be virtualizing XMLHttpRequest itself (that is in fact my primary use case for the ability to virtualize, and one which a huge fraction of the top websites are relying on), I've never seen any example of such code in the wild intentionally making a new realm to work around existing virtualization.

(I dislike the example of Node - Node is strictly more powerful than first-run code, since it controls the platform. It is extremely well positioned to virtualize and store references to whatever it wants already; it's not something which is looking to defeat virtualizing code which runs before it. We do not need to break the virtualizability of the web platform to accommodate Node's use case. In any case, it's not a library, at least not in the sense relevant to this topic.)
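The "store references" pattern available to first-run code can be sketched as follows. This is a generic illustration using `Math.max`, with `safeMax` as a made-up helper name; it is not a description of how Node's internals are actually written.

```javascript
// First-run code: capture the built-ins it will need, before any other
// code has had a chance to replace them.
const originalMax = Math.max;
const apply = Reflect.apply;

function safeMax(...values) {
  // Call the captured function directly instead of looking up Math.max.
  return apply(originalMax, undefined, values);
}

// Later-running code replaces the built-in.
Math.max = () => 0;

const viaLookup = Math.max(1, 5); // goes through the replacement
const viaCaptured = safeMax(1, 5); // goes through the captured original

// Restore the built-in so the rest of the program is unaffected.
Math.max = originalMax;

console.log(viaLookup, viaCaptured);
```

The pattern only works for code that runs before the replacement is installed, which is the distinction drawn above between first-run code and ordinary libraries.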

@benjamingr

> I dislike the example of Node - Node is strictly more powerful than first-run code, since it controls the platform

This is precisely the capability this proposal would like to bring to userland :)

Node.js does really ugly stuff for this like all of the code that uses this.

@bakkot
Author

bakkot commented Apr 15, 2019

Yes, I know. My point is that there is a lot of code which relies on this capability not being available to userland. (Or rather, to be precise, being available only to first-run code, as Node already is.)

@benjamingr

> Yes, I know. My point is that there is a lot of code which relies on this capability not being available to userland. (Or rather, to be precise, being available only to first-run code, as Node already is.)

Understood. I think the way forward would be to enumerate these use cases and see how we can address them.

For example overriding XMLHttpRequest is used for proxying requests - we have a much stronger model than that now (service workers) that enforces security a lot better (requires same domain and https).

It would be great to go over some use cases and figure out what we can do about it.

@bakkot
Author

bakkot commented Apr 16, 2019

Service workers don't really compose, can't generally be provided as part of a library, and only even see requests to the same origin, so they are not a solution to even the very specific use case of proxying requests. But sure, we can enumerate some other use cases. Besides polyfills and security safeguards, I've seen code instrument addEventListener for analytics purposes, IIRC to figure out which if any event listeners were taking noticeable amounts of time, and I know a lot of code hooks DOM APIs for one reason or another. I know my company has at least one other use case which I don't know if I can talk about. And we could go ask all the companies I listed above if they'd be willing to share what they're doing, though of course that's going to be biased towards ones which I happen to know about, and some of them (especially Facebook) probably aren't willing to tell us.
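A rough sketch of the addEventListener instrumentation mentioned above. `TinyTarget` is a hypothetical, minimal stand-in for EventTarget so the snippet runs outside a browser, and `timings` is an illustrative sink; real instrumentation would also have to handle removeEventListener, listener objects, and options.

```javascript
// Hypothetical minimal stand-in for EventTarget.
class TinyTarget {
  constructor() {
    this.listeners = new Map();
  }
  addEventListener(type, listener) {
    if (!this.listeners.has(type)) this.listeners.set(type, []);
    this.listeners.get(type).push(listener);
  }
  dispatch(type, event) {
    for (const listener of this.listeners.get(type) || []) listener(event);
  }
}

// Illustrative sink: one entry per listener invocation.
const timings = [];

// Wrap addEventListener so that every registered listener is itself
// wrapped in a function that measures how long the listener runs.
const originalAdd = TinyTarget.prototype.addEventListener;
TinyTarget.prototype.addEventListener = function (type, listener) {
  const timed = function (event) {
    const start = Date.now();
    try {
      return Reflect.apply(listener, this, [event]);
    } finally {
      timings.push({ type, ms: Date.now() - start });
    }
  };
  return Reflect.apply(originalAdd, this, [type, timed]);
};

// Later code registers and triggers listeners as usual.
const target = new TinyTarget();
let clicks = 0;
target.addEventListener('click', () => { clicks += 1; });
target.dispatch('click', {});
target.dispatch('click', {});

console.log(clicks, timings.length);
```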

The thing is, there is a lot of code relying on virtualization working, written by a lot of people, many of whom would not necessarily even consciously recognize that this is what they were relying upon. We can't possibly cover all of their use cases. We can't even know all of their use cases. I don't think we should be willing to break a previously reliable assumption around which people have built a lot of logic even if we provide alternatives for a handful of the many things for which people were relying on it.

@benjamingr

> Service workers don't really compose, can't generally be provided as part of a library, and only even see requests to the same origin, so they are not a solution to even the very specific use case of proxying requests.

Sure though to be fair proxying requests only worked for XHR/fetch before and not for all APIs that can make requests (for example WebRTC, native HLS playback in browsers that have it, and other places).

> Besides polyfills and security safeguards

Would love more concrete examples of this :)

> I've seen code instrument addEventListener for analytics purposes, IIRC to figure out which if any event listeners were taking noticeable amounts of time

We do very intensive events work at Testim so I can sympathise - this approach is problematic anyway, since frameworks and tools typically add a global event listener and delegate, and may listen to a different event (like React listening to the toElement of a mouseout instead of a mouseenter).

That said, there is a lot of merit in a more general approach for figuring out what code dealt with what event to help figure out causality or performance issues. I wonder if Zones override addEventListener for this, or needed to (cc @domenic if you happen to know since you worked on zones).

I think there is merit in systematically enumerating the use cases of "virtualisation" to see what (if anything) this proposal might break.


To clarify - even with this proposal you can still do AoP ("virtualisation") for all code except for platform code that needs to be able to opt out.

This proposal is a non-default more-work-than-none way to opt-out, basically.

@bakkot
Author

bakkot commented Apr 16, 2019

> Sure though to be fair proxying requests only worked for XHR/fetch

Yes, but it does work for those. And that capability is very widely relied upon.

> Would love more concrete examples of this :)

Well, babel-polyfill is the obvious one. @ljharb can give other concrete examples, I'm sure. That's more of #17 though.

> I think there is merit in systematically enumerating the use cases of "virtualisation"

There is no way to do this. I can list ones I'm familiar with. I can point to some other cases where I can observe that it's being used. But there really isn't a way to systematically enumerate all use cases.

> to see what (if anything) this proposal might break.

It definitely breaks instrumentation of XHRs and fetches. I don't know that "if anything" is a necessary qualifier at this point.

> To clarify - even with this proposal you can still do AoP ("virtualisation") for all code except for platform code that needs to be able to opt out.

I'm not sure what this means. You said above (and the readme says) that the point was to bring this capability to userland, not just to platform code.

2 participants