Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling Partial Content / 206 #144

Open
mnot opened this issue Oct 26, 2015 · 35 comments
Open

Handling Partial Content / 206 #144

mnot opened this issue Oct 26, 2015 · 35 comments

Comments

@mnot
Copy link
Member

mnot commented Oct 26, 2015

Discussed in #97 - breaking out into a separate issue.

@mnot
Copy link
Member Author

mnot commented Oct 27, 2015

Some discussion at TPAC. There are many use cases for range requests, initial approach might be to identify bare minimum and only support those. E.g.,

  1. Pass through range requests to the network
  2. Serve range requests from a complete cache entry
  3. Cache partial content when requested
  4. Synthesise range requests to complete partial content (for full or partial requests)
  5. ...

(0) and (1) are a straw-man target for now. Thoughts?

@annevk
Copy link
Member

annevk commented Nov 11, 2015

So from those:

  1. Already happens.
  2. Is something for the Cache API to decide.
  3. Already happens in step 8 of https://fetch.spec.whatwg.org/#concept-http-network-fetch as far as I can tell.
  4. This does not currently happen.

Here are some things I'm wondering about:

  1. Should we change https://fetch.spec.whatwg.org/#concept-http-network-or-cache-fetch step 15 to talk about one or more partial responses?
  2. Should step 17.1 explicitly talk about a complete response?
  3. Is there any freshening of headers when you get a 206 response? https://tools.ietf.org/html/rfc7234#section-4.3.4 seems to suggest there isn't. HTTP Range Requests also does not talk about this as far as I can tell.
  4. Is therefore the only thing we need to introduce a 206 clause (similar to step 17 for 304) when cache mode is either "default" or "no-cache" that does recombining?
@mnot
Copy link
Member Author

mnot commented Nov 12, 2015

  1. You can take the approach that every 206 is combined into one cache entry as soon as it occurs, or you can alternatively store multiple 206s and combine them at some later time. HTTP doesn't care which, as long as the effect is the same; see http://httpwg.github.io/specs/rfc7234.html#combining.responses
  2. Step 14 already requires a complete response.
  3. see http://httpwg.github.io/specs/rfc7233.html#combining.byte.ranges
  4. maybe, yeah...
@mnot
Copy link
Member Author

mnot commented Nov 12, 2015

Is something for the Cache API to decide.

I mean the HTTP cache, not the SW cache. Or do you mean it's effectively an implementation decision?

@annevk
Copy link
Member

annevk commented Nov 12, 2015

I mean the HTTP cache, not the SW cache.

Ah I see. Maybe we should talk about that too.

@code-tree
Copy link

code-tree commented Jun 3, 2016

Serve range requests from a complete cache entry

Any hope that this will be implemented, in terms of being able to do it from a SW cache? It certainly would help with a problem I currently have with the Audio element and range headers

@jakearchibald
Copy link
Collaborator

I need to do some research to see what browsers do today vs the spec & decide if the service worker spec needs changes, or browsers need to fix bugs.

In the mean time, check out https://samdutton.github.io/samples/service-worker/prefetch-video/ - it's a bit hacky, but it works. It constructs ranged responses on the fly.

@code-tree
Copy link

Thanks for the link, the workaround worked for my audio problem too.

@jakearchibald
Copy link
Collaborator

So, I did the research on range requests. It's a mess. https://docs.google.com/document/d/1SphP-WNxqzZrSv_6ApC9_FpM-m_tLzm57oL3SNGg-40/edit#heading=h.1k8r6xdc6vfo

I think fetch should say something about the minimum browsers are required to support here.

@jakearchibald
Copy link
Collaborator

Also, unless I'm missing something, fetch should also create a network error if a 206 is returned when no range was requested.

@annevk
Copy link
Member

annevk commented Jun 20, 2016

@jakearchibald if you have time to work on this (or can find someone else) that would be great. I'm kinda swamped.

@jakearchibald
Copy link
Collaborator

I'll talk to our media teams and see what their expectations are and get as far as I can.

@mnot
Copy link
Member Author

mnot commented Mar 14, 2017

FWIW, I've written some basic tests to see if the HTTP (!SW) cache that Fetch uses supports partial content (i.e., range requests).

AFAICT, the only browser that does anything is Safari TP, which will store a partial response (i.e., one whose request had a Range header) and serve that response from cache if a request with the same Range header is made.

It will not serve a subset of an already-cached response to a request containing Range; nor will it complete an existing stored partial response by creating its own Range headers when it goes forward.

Firefox and Chrome don't even handle the simple case (again, according to my test; I talked to @mcmanus about this this AM, and he was a bit surprised to hear that). Edge doesn't support the HTTP cache from Fetch, AFAICT.

annevk pushed a commit that referenced this issue Mar 23, 2017
In particular:

* Be more specific about terminology
* Detail more clearly how requests are to be modified

Tests: web-platform-tests/wpt#5137.

During review we decided to postpone #144 (poorly implemented if at all) and #307 (also poorly implemented despite security implications).

Fixes #336 and fixes #373.
@jakearchibald
Copy link
Collaborator

I've been thinking about this again, here's a brain dump:

  • I should run my tests again, but using a wav file, which would eliminate weirdness due to metadata being at the end of the file. (thanks @foolip!)
  • Research which APIs follow the validation rules for combining ranges https://tools.ietf.org/html/rfc7233#section-4.3 - @foolip suggests that media elements don't care, but what about other APIs? It would be pretty bad security-wise if some APIs didn't validate.
  • It's down to individual APIs to say if they can accept ranged requests, what type of validation they require, and what URL they're prepared to accept a response from. Eg, for the first part of a response, they may accept any url, but for further parts they may only accept responses from that url.
@annevk
Copy link
Member

annevk commented Mar 31, 2017

I don't think we do range requests for anything but media elements, though that might have changed recently for image elements now that I think of it?

@jakearchibald
Copy link
Collaborator

There's downloads, which I guess are outside of spec land. We want to use them in background fetch.

@annevk
Copy link
Member

annevk commented Mar 31, 2017

Downloads are somewhat defined, as they interact with navigation and <a download> and such. Also, whatwg/html#954.

@jakearchibald
Copy link
Collaborator

I pitched the following to our security team:


  • Add "Range" to safelist headers, allowing it to be sent to any server. Perhaps restrict value to "bytes=[number]-[optional number]" if that offers any protection.
  • Return a network error if any of the following is true:
    • The response has status 206 and request does not have a range header.
    • The response has status 206, the response is opaque, and the request url is not the first entry in the response url list.

My assumption is that APIs that range requests are only used by media elements and downloads (which I'll need to verify). This means you can't interpret a portion of a resource as script/css/etc.

Additionally, APIs consuming ranged responses should ensure all parts of a range have the same first entry in the response url list for a given resource.


However, there's still a worry that this new capability carries significant risk, and that we should look for another way forward.

The alternative solution is to find a way to mark a request as "allowed privileged headers", and allow the Range header in that case. Modifying the request in any way would remove the "allowed privileged headers" flag, meaning you couldn't take an internally-created Range request & change the URL & make the request, but you could do fetch(fetchEvent.request) if it had a Range header.

This means new Request(request) would copy requests "allowed privileged headers" flag, although modifying request.headers will unset it. new Request(request, init) would return a request with "allowed privileged headers" unset.

In addition to this, we should still:

  • Return a network error if any of the following is true:
    • The response has status 206 and request does not have a range header.
    • The response has status 206, the response is opaque, and the request url is not the first entry in the response url list.

Additionally, APIs consuming ranged responses should ensure all parts of a range have the same first entry in the response url list for a given resource.

@DanielBaulig
Copy link

Just a note here on

Return a network error if any of the following is true:

  • The response has status 206 and request does not have a range header.

We (Facebook) are actually using this currently on XHR as a feature to circumvent CORS preflight requests requirements for some cross origin requests. We add the range as query string parameters and have the cross origin server interpret them equivalent to range headers.

We are currently in transitioning some of these things to fetch and breaking this would be a big problem for us. Adding Range to the header safe-list however, would probably meet our requirements and in fact would likely be the superior solution anyway, since it will allow us to fully remain within HTTP semantics and allow for better caching at various levels.

@jakearchibald
Copy link
Collaborator

@DanielBaulig thanks! If we shipped what I proposed it'd have broken XHR too, since XHR uses fetch.

Is the server sending a 206 status in this case? I guess you're using this in a super-safe way that doesn't allow the original content to be interpreted as script?

@DanielBaulig
Copy link

DanielBaulig commented Aug 21, 2017

@jakearchibald I thought it did, but I just double checked and it actually doesn't return a 206, but a 200, so there shouldn't be any acute breakage.

We are only applying this to media content fetched through XHR/fetch and currently always have well aligned requests to the same byte ranges, meaning the byte ranges of two requests will always either be equal or excluding each other, there will never be partial overlaps, so we shouldn't have any cacheability regressions.

That said, the reason I stumbled on this issue thread in the first place is that we are looking into changing this and using less fixed and well aligned byte ranges that could end up (partially) overlapping. Since that would break caching (the query string parameters are included in the browsers cache keys, if they are not identical, there won't be a cache hit), we were looking into implementing caching for these requests in SW by breaking the byte range query parameters out of the cache key for these requests, which then lead me to this thread and also to my reply on this W3C SW thread regarding Cache API supporting Range requests and responses.

Since we are not actually returning 206 though, we should not be seeing any problems from this proposal being implemented. My bad.

@annevk
Copy link
Member

annevk commented Aug 22, 2017

Still, I think we should test what browsers actually do when the server returns a 206. If browsers just return it as-is I don't think we should change the behavior into returning a network error. It's not worth the risk.

@sirdarckcat
Copy link

@DanielBaulig would you mind to show an example of such artificial range parameter? How could I test the behavior you described?

@jakearchibald
Copy link
Collaborator

Seconded, it'd be interesting to know which Facebook URLs support this.

@jakearchibald
Copy link
Collaborator

All browsers seem to allow a 206 partial response for script elements. As in, it will execute the body of the response, as if it was a 200.

Chrome security were a little worried about this, and were keen on making this an error.

But yeah, we'd need to do a lot of testing.

@DanielBaulig
Copy link

@sirdarckcat We use bytestart, byteend query string parameters for video playback on Facebook to deal with CORS restrictions. If we added normal Range headers we would have to do preflight requests to resources served from our CDN origin.

@jakearchibald
Copy link
Collaborator

jakearchibald commented Feb 23, 2018

I'm going to take another swing at this.

These cases should work:

  • A media element makes a series of ranged requests. Each request arrives in the service worker, is fetched, and the response returned. This works without a service worker, so it should work in a pass-through case.
  • A media element makes a series of ranged requests. The service worker uses a mixture of custom responses, and non-opaque responses. All of these responses are non-opaque, so there's no issue.
  • A script element makes a request. The request arrives in the service worker, it's fetched, and the response is partial. The script element accepts and executes the partial response. It's super weird, but browsers already allow this without service worker. As @annevk said, we shouldn't break that as part of this effort. If we want to break this, we'll look at it separately.

Since the service worker can rewrite fetches, it opens up the following attacks:

Attack 1

A media element makes two requests:

Request: Resource A. No-cors. Cross-origin. Byte range 0-5000.
Response: JS-constructed. Valid media container data, stopping before the bit that defines the duration of the media.

Request: Resource A. No-cors. Cross-origin. Byte range 200-5000.
Response: fetch(event.request). Response is opaque.

In this case, resource A isn't a valid media resource, but its 200th byte is now leaked as mediaElement.duration.

Solution: The media element must not allow a mixture of opaque and non-opaque responses for a given piece of media.

Attack 2

A media element makes two requests:

Request: Resource A. No-cors. Cross-origin. Byte range 0-5000.
Response: A fetch to resource B, which responds with opaque valid media container data, stopping before the bit that defines the duration of the media.

Request: Resource A. No-cors. Cross-origin. Byte range 200-5000.
Response: fetch(event.request). Response is opaque.

Again, the 200th byte is leaked.

Solution: If the media element receives opaque data, the last URL in each response's URL list must be identical.

Looking at #145, it seems Chrome is fine as long as the responses are all the same origin. However, I'm worried about origin A redirecting/rewriting to lots of different places in origin B.

I need to spec where a media element goes for the second part of some media, if the first part results in a redirect. Browsers behave differently here.

Attack 3

A media element makes two requests:

Request: Resource A. No-cors. Cross-origin. Byte range 0-5000.
Response: A previously cached response, part of resource A, byte range 8000-8200, which just happens to be opaque valid media container data, stopping before the bit that defines the duration of the media.

Request: Resource A. No-cors. Cross-origin. Byte range 200-5000.
Response: fetch(event.request). Response is opaque.

Again, the 200th byte is leaked.

Solution: In the first fetch, the start of the byte range returned does not match the start of the byte range requested. This should be rejected.

We could reject this in the fetch spec, but I don't think we should block it for manual fetch() calls.

This should be blocked by the media element, but we could have a helper in the fetch spec to make this easier.

Attack 4

A script element makes a request:

Request: Resource A. No-cors. Cross-origin.
Response: An opaque partial response of resource A that just happens to contain gender = 'female', which is private user data.

In this case resource A is an html resource like:

<p>Foo</p>
<script>const gender = 'female';</script>
<p>Bar</p>

…and the browser has been previously tricked (perhaps using a media element) into making a request for the range that contains gender = 'female'.

Given that the script element accepts partial responses, this is a tricky one. It seems that we want to continue to support the case where the server has, unprompted by the range header, returned a partial response. The difference in this case is the server was promted.

Solution: The response needs to know if its associated request had a Range header. Fetch should reject if the original request did not have a range header, but the service worker provides a response that is opaque, partial, and was requested with a range header.

@annevk I'm interested in your thoughts on the solution for attack 4.

@jakearchibald
Copy link
Collaborator

(I chatted about this with @annevk on IRC & he's happy with the attack 4 solution)

@annevk
Copy link
Member

annevk commented Feb 26, 2018

I'm a little concerned with

If the media element consumes opaque data, the first URL in each response's URL list must be identical.

from Attack 2. In particular if that's what we want to do in the face of redirects. If anything we probably want to compare the last URL. (Safer would be the whole list, but unfortunately no-cors cross-origin to same-origin is already non-opaque.)

@jakearchibald
Copy link
Collaborator

My intent is to ensure that all the requests have gone to the same place, as in, the service worker hasn't tried to combine multiple sources.

If the server wants to redirect each request, that seems weird but fine. Although I'll test what browsers actually do in this case.

@jakearchibald
Copy link
Collaborator

@annevk ahh yes, I get it now. I've updated the solution to Attack 2 to be last url in the list.

@ricea
Copy link
Collaborator

ricea commented Apr 5, 2018

@jakearchibald Would it be possible to copy-and-paste your security analysis in #144 (comment) to a more official-sounding URL? It would be nice if it was linked from the Fetch standard to make it discoverable.

It would be nice if it said something about the threat model, but that might be infeasible if it came down to "the threat model of the whole web platform".

@jakearchibald
Copy link
Collaborator

@ricea I was going to leave notes in the spec next to the solutions. Does that work?

The solution for Attack 4 will be in the fetch spec, but the others will be in the HTML spec.

Is this what you meant? If not, I'm happy to put the text somewhere else, I'm just not sure where.

@ricea
Copy link
Collaborator

ricea commented Apr 5, 2018

@jakearchibald SGTM. My concern is just to make sure it is discoverable and gets the widest possible review.

annevk pushed a commit that referenced this issue May 28, 2018
This is part of #144.

The aim is to allow APIs to use the range header for no-cors
requests, and allow them to pass through a service worker, but
disallow modification of these requests, and disallow developers
creating their own no-cors ranged requests.

Tests: web-platform-tests/wpt#10348.
jakearchibald added a commit that referenced this issue May 29, 2018
This is part of #144.

The aim is to allow APIs to use the range header for no-cors
requests, and allow them to pass through a service worker, but
disallow modification of these requests, and disallow developers
creating their own no-cors ranged requests.

Tests: web-platform-tests/wpt#10348.
annevk pushed a commit that referenced this issue May 29, 2018
This is part of #144.

The aim is to allow APIs to use the Range header for "no-cors"
requests, and allow them to pass through a service worker, but
disallow modification of these requests, and disallow developers
creating their own "no-cors" ranged requests.

Tests: web-platform-tests/wpt#10348.
@noamr
Copy link
Contributor

noamr commented Feb 23, 2022

See new PR: whatwg/html#7655. Needs some more work and lots of WPT but ready for initial feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
8 participants