Handling Partial Content / 206 #144

mnot · 2015-10-26T06:34:05Z

Discussed in #97 - breaking out into a separate issue.

mnot · 2015-10-27T06:17:08Z

Some discussion at TPAC. There are many use cases for range requests, initial approach might be to identify bare minimum and only support those. E.g.,

Pass through range requests to the network
Serve range requests from a complete cache entry
Cache partial content when requested
Synthesise range requests to complete partial content (for full or partial requests)
...

(0) and (1) are a straw-man target for now. Thoughts?

annevk · 2015-11-11T09:21:59Z

So from those:

Already happens.
Is something for the Cache API to decide.
Already happens in step 8 of https://fetch.spec.whatwg.org/#concept-http-network-fetch as far as I can tell.
This does not currently happen.

Here are some things I'm wondering about:

Should we change https://fetch.spec.whatwg.org/#concept-http-network-or-cache-fetch step 15 to talk about one or more partial responses?
Should step 17.1 explicitly talk about a complete response?
Is there any freshening of headers when you get a 206 response? https://tools.ietf.org/html/rfc7234#section-4.3.4 seems to suggest there isn't. HTTP Range Requests also does not talk about this as far as I can tell.
Is therefore the only thing we need to introduce a 206 clause (similar to step 17 for 304) when cache mode is either "default" or "no-cache" that does recombining?

mnot · 2015-11-12T02:18:52Z

You can take the approach that every 206 is combined into one cache entry as soon as it occurs, or you can alternatively store multiple 206s and combine them at some later time. HTTP doesn't care which, as long as the effect is the same; see http://httpwg.github.io/specs/rfc7234.html#combining.responses
Step 14 already requires a complete response.
see http://httpwg.github.io/specs/rfc7233.html#combining.byte.ranges
maybe, yeah...

mnot · 2015-11-12T04:26:10Z

Is something for the Cache API to decide.

I mean the HTTP cache, not the SW cache. Or do you mean it's effectively an implementation decision?

annevk · 2015-11-12T08:36:58Z

I mean the HTTP cache, not the SW cache.

Ah I see. Maybe we should talk about that too.

code-tree · 2016-06-03T10:56:23Z

Serve range requests from a complete cache entry

Any hope that this will be implemented, in terms of being able to do it from a SW cache? It certainly would help with a problem I currently have with the Audio element and range headers

jakearchibald · 2016-06-03T12:21:07Z

I need to do some research to see what browsers do today vs the spec & decide if the service worker spec needs changes, or browsers need to fix bugs.

In the mean time, check out https://samdutton.github.io/samples/service-worker/prefetch-video/ - it's a bit hacky, but it works. It constructs ranged responses on the fly.

code-tree · 2016-06-04T10:13:07Z

Thanks for the link, the workaround worked for my audio problem too.

jakearchibald · 2016-06-15T17:43:25Z

So, I did the research on range requests. It's a mess. https://docs.google.com/document/d/1SphP-WNxqzZrSv_6ApC9_FpM-m_tLzm57oL3SNGg-40/edit#heading=h.1k8r6xdc6vfo

I think fetch should say something about the minimum browsers are required to support here.

jakearchibald · 2016-06-15T17:50:31Z

Also, unless I'm missing something, fetch should also create a network error if a 206 is returned when no range was requested.

annevk · 2016-06-20T09:01:45Z

@jakearchibald if you have time to work on this (or can find someone else) that would be great. I'm kinda swamped.

jakearchibald · 2016-06-20T10:44:49Z

I'll talk to our media teams and see what their expectations are and get as far as I can.

mnot · 2017-03-14T07:05:28Z

FWIW, I've written some basic tests to see if the HTTP (!SW) cache that Fetch uses supports partial content (i.e., range requests).

AFAICT, the only browser that does anything is Safari TP, which will store a partial response (i.e., one whose request had a Range header) and serve that response from cache if a request with the same Range header is made.

It will not serve a subset of an already-cached response to a request containing Range; nor will it complete an existing stored partial response by creating its own Range headers when it goes forward.

Firefox and Chrome don't even handle the simple case (again, according to my test; I talked to @mcmanus about this this AM, and he was a bit surprised to hear that). Edge doesn't support the HTTP cache from Fetch, AFAICT.

In particular: * Be more specific about terminology * Detail more clearly how requests are to be modified Tests: web-platform-tests/wpt#5137. During review we decided to postpone #144 (poorly implemented if at all) and #307 (also poorly implemented despite security implications). Fixes #336 and fixes #373.

jakearchibald · 2017-03-31T01:13:05Z

I've been thinking about this again, here's a brain dump:

I should run my tests again, but using a wav file, which would eliminate weirdness due to metadata being at the end of the file. (thanks @foolip!)
Research which APIs follow the validation rules for combining ranges https://tools.ietf.org/html/rfc7233#section-4.3 - @foolip suggests that media elements don't care, but what about other APIs? It would be pretty bad security-wise if some APIs didn't validate.
It's down to individual APIs to say if they can accept ranged requests, what type of validation they require, and what URL they're prepared to accept a response from. Eg, for the first part of a response, they may accept any url, but for further parts they may only accept responses from that url.

annevk · 2017-03-31T06:33:35Z

I don't think we do range requests for anything but media elements, though that might have changed recently for image elements now that I think of it?

jakearchibald · 2017-03-31T08:59:35Z

There's downloads, which I guess are outside of spec land. We want to use them in background fetch.

annevk · 2017-03-31T10:30:04Z

Downloads are somewhat defined, as they interact with navigation and <a download> and such. Also, whatwg/html#954.

jakearchibald · 2017-06-29T10:50:11Z

I pitched the following to our security team:

Add "Range" to safelist headers, allowing it to be sent to any server. Perhaps restrict value to "bytes=[number]-[optional number]" if that offers any protection.
Return a network error if any of the following is true:
- The response has status 206 and request does not have a range header.
- The response has status 206, the response is opaque, and the request url is not the first entry in the response url list.

My assumption is that APIs that range requests are only used by media elements and downloads (which I'll need to verify). This means you can't interpret a portion of a resource as script/css/etc.

Additionally, APIs consuming ranged responses should ensure all parts of a range have the same first entry in the response url list for a given resource.

However, there's still a worry that this new capability carries significant risk, and that we should look for another way forward.

The alternative solution is to find a way to mark a request as "allowed privileged headers", and allow the Range header in that case. Modifying the request in any way would remove the "allowed privileged headers" flag, meaning you couldn't take an internally-created Range request & change the URL & make the request, but you could do fetch(fetchEvent.request) if it had a Range header.

This means new Request(request) would copy requests "allowed privileged headers" flag, although modifying request.headers will unset it. new Request(request, init) would return a request with "allowed privileged headers" unset.

In addition to this, we should still:

Return a network error if any of the following is true:
- The response has status 206 and request does not have a range header.
- The response has status 206, the response is opaque, and the request url is not the first entry in the response url list.

Additionally, APIs consuming ranged responses should ensure all parts of a range have the same first entry in the response url list for a given resource.

DanielBaulig · 2017-08-18T20:31:44Z

Just a note here on

Return a network error if any of the following is true:

The response has status 206 and request does not have a range header.

We (Facebook) are actually using this currently on XHR as a feature to circumvent CORS preflight requests requirements for some cross origin requests. We add the range as query string parameters and have the cross origin server interpret them equivalent to range headers.

We are currently in transitioning some of these things to fetch and breaking this would be a big problem for us. Adding Range to the header safe-list however, would probably meet our requirements and in fact would likely be the superior solution anyway, since it will allow us to fully remain within HTTP semantics and allow for better caching at various levels.

jakearchibald · 2017-08-21T12:38:38Z

@DanielBaulig thanks! If we shipped what I proposed it'd have broken XHR too, since XHR uses fetch.

Is the server sending a 206 status in this case? I guess you're using this in a super-safe way that doesn't allow the original content to be interpreted as script?

DanielBaulig · 2017-08-21T19:53:50Z

@jakearchibald I thought it did, but I just double checked and it actually doesn't return a 206, but a 200, so there shouldn't be any acute breakage.

We are only applying this to media content fetched through XHR/fetch and currently always have well aligned requests to the same byte ranges, meaning the byte ranges of two requests will always either be equal or excluding each other, there will never be partial overlaps, so we shouldn't have any cacheability regressions.

That said, the reason I stumbled on this issue thread in the first place is that we are looking into changing this and using less fixed and well aligned byte ranges that could end up (partially) overlapping. Since that would break caching (the query string parameters are included in the browsers cache keys, if they are not identical, there won't be a cache hit), we were looking into implementing caching for these requests in SW by breaking the byte range query parameters out of the cache key for these requests, which then lead me to this thread and also to my reply on this W3C SW thread regarding Cache API supporting Range requests and responses.

Since we are not actually returning 206 though, we should not be seeing any problems from this proposal being implemented. My bad.

annevk · 2017-08-22T07:06:30Z

Still, I think we should test what browsers actually do when the server returns a 206. If browsers just return it as-is I don't think we should change the behavior into returning a network error. It's not worth the risk.

sirdarckcat · 2017-08-22T07:09:33Z

@DanielBaulig would you mind to show an example of such artificial range parameter? How could I test the behavior you described?

jakearchibald · 2017-08-22T08:17:34Z

Seconded, it'd be interesting to know which Facebook URLs support this.

jakearchibald · 2017-08-22T08:33:47Z

All browsers seem to allow a 206 partial response for script elements. As in, it will execute the body of the response, as if it was a 200.

Chrome security were a little worried about this, and were keen on making this an error.

But yeah, we'd need to do a lot of testing.

DanielBaulig · 2017-10-30T23:34:18Z

@sirdarckcat We use bytestart, byteend query string parameters for video playback on Facebook to deal with CORS restrictions. If we added normal Range headers we would have to do preflight requests to resources served from our CDN origin.

jakearchibald · 2018-02-23T15:28:54Z

I'm going to take another swing at this.

These cases should work:

A media element makes a series of ranged requests. Each request arrives in the service worker, is fetched, and the response returned. This works without a service worker, so it should work in a pass-through case.
A media element makes a series of ranged requests. The service worker uses a mixture of custom responses, and non-opaque responses. All of these responses are non-opaque, so there's no issue.
A script element makes a request. The request arrives in the service worker, it's fetched, and the response is partial. The script element accepts and executes the partial response. It's super weird, but browsers already allow this without service worker. As @annevk said, we shouldn't break that as part of this effort. If we want to break this, we'll look at it separately.

Since the service worker can rewrite fetches, it opens up the following attacks:

Attack 1

A media element makes two requests:

Request: Resource A. No-cors. Cross-origin. Byte range 0-5000.
Response: JS-constructed. Valid media container data, stopping before the bit that defines the duration of the media.

Request: Resource A. No-cors. Cross-origin. Byte range 200-5000.
Response: fetch(event.request). Response is opaque.

In this case, resource A isn't a valid media resource, but its 200th byte is now leaked as mediaElement.duration.

Solution: The media element must not allow a mixture of opaque and non-opaque responses for a given piece of media.

Attack 2

A media element makes two requests:

Request: Resource A. No-cors. Cross-origin. Byte range 0-5000.
Response: A fetch to resource B, which responds with opaque valid media container data, stopping before the bit that defines the duration of the media.

Request: Resource A. No-cors. Cross-origin. Byte range 200-5000.
Response: fetch(event.request). Response is opaque.

Again, the 200th byte is leaked.

Solution: If the media element receives opaque data, the last URL in each response's URL list must be identical.

Looking at #145, it seems Chrome is fine as long as the responses are all the same origin. However, I'm worried about origin A redirecting/rewriting to lots of different places in origin B.

I need to spec where a media element goes for the second part of some media, if the first part results in a redirect. Browsers behave differently here.

Attack 3

A media element makes two requests:

Request: Resource A. No-cors. Cross-origin. Byte range 0-5000.
Response: A previously cached response, part of resource A, byte range 8000-8200, which just happens to be opaque valid media container data, stopping before the bit that defines the duration of the media.

Request: Resource A. No-cors. Cross-origin. Byte range 200-5000.
Response: fetch(event.request). Response is opaque.

Again, the 200th byte is leaked.

Solution: In the first fetch, the start of the byte range returned does not match the start of the byte range requested. This should be rejected.

We could reject this in the fetch spec, but I don't think we should block it for manual fetch() calls.

This should be blocked by the media element, but we could have a helper in the fetch spec to make this easier.

Attack 4

A script element makes a request:

Request: Resource A. No-cors. Cross-origin.
Response: An opaque partial response of resource A that just happens to contain gender = 'female', which is private user data.

In this case resource A is an html resource like:

<p>Foo</p>
<script>const gender = 'female';</script>
<p>Bar</p>

…and the browser has been previously tricked (perhaps using a media element) into making a request for the range that contains gender = 'female'.

Given that the script element accepts partial responses, this is a tricky one. It seems that we want to continue to support the case where the server has, unprompted by the range header, returned a partial response. The difference in this case is the server was promted.

Solution: The response needs to know if its associated request had a Range header. Fetch should reject if the original request did not have a range header, but the service worker provides a response that is opaque, partial, and was requested with a range header.

@annevk I'm interested in your thoughts on the solution for attack 4.

jakearchibald · 2018-02-26T08:14:44Z

(I chatted about this with @annevk on IRC & he's happy with the attack 4 solution)

annevk · 2018-02-26T08:37:11Z

I'm a little concerned with

If the media element consumes opaque data, the first URL in each response's URL list must be identical.

from Attack 2. In particular if that's what we want to do in the face of redirects. If anything we probably want to compare the last URL. (Safer would be the whole list, but unfortunately no-cors cross-origin to same-origin is already non-opaque.)

jakearchibald · 2018-02-26T08:47:44Z

My intent is to ensure that all the requests have gone to the same place, as in, the service worker hasn't tried to combine multiple sources.

If the server wants to redirect each request, that seems weird but fine. Although I'll test what browsers actually do in this case.

jakearchibald · 2018-02-28T08:56:51Z

@annevk ahh yes, I get it now. I've updated the solution to Attack 2 to be last url in the list.

ricea · 2018-04-05T06:27:26Z

@jakearchibald Would it be possible to copy-and-paste your security analysis in #144 (comment) to a more official-sounding URL? It would be nice if it was linked from the Fetch standard to make it discoverable.

It would be nice if it said something about the threat model, but that might be infeasible if it came down to "the threat model of the whole web platform".

jakearchibald · 2018-04-05T06:33:14Z

@ricea I was going to leave notes in the spec next to the solutions. Does that work?

The solution for Attack 4 will be in the fetch spec, but the others will be in the HTML spec.

Is this what you meant? If not, I'm happy to put the text somewhere else, I'm just not sure where.

ricea · 2018-04-05T06:36:41Z

@jakearchibald SGTM. My concern is just to make sure it is discoverable and gets the widest possible review.

This is part of #144. The aim is to allow APIs to use the range header for no-cors requests, and allow them to pass through a service worker, but disallow modification of these requests, and disallow developers creating their own no-cors ranged requests. Tests: web-platform-tests/wpt#10348.

This is part of #144. The aim is to allow APIs to use the Range header for "no-cors" requests, and allow them to pass through a service worker, but disallow modification of these requests, and disallow developers creating their own "no-cors" ranged requests. Tests: web-platform-tests/wpt#10348.

noamr · 2022-02-23T18:05:04Z

See new PR: whatwg/html#7655. Needs some more work and lots of WPT but ready for initial feedback.

annevk mentioned this issue Nov 11, 2015

Cache state: partial content #38

Closed

matt-cook mentioned this issue Jun 27, 2016

handle 206 partial requests zebradog/alone.js#8

Open

HadrienGardeur mentioned this issue Oct 26, 2016

Support for Range header in Service Worker HadrienGardeur/audiobook-player#1

Open

mnot mentioned this issue Mar 1, 2017

Http cache #496

Merged

jakearchibald mentioned this issue Mar 31, 2017

Create F2F agenda - 3-4 April 2017 w3c/ServiceWorker#1053

Closed

jakearchibald mentioned this issue Jun 22, 2017

Range header is not send w3c/ServiceWorker#1164

Closed

vardius mentioned this issue Jun 23, 2017

Handle range requests vardius/peer-cdn#7

Open

jakearchibald mentioned this issue Jun 29, 2017

Allow range header to be set by APIs #560

Merged

jakearchibald mentioned this issue Nov 3, 2017

Create F2F agenda - 7 November 2017 w3c/ServiceWorker#1206

Open

jakearchibald mentioned this issue Apr 4, 2018

Cross-Origin Read Blocking (CORB) #681

Closed

annevk mentioned this issue Apr 5, 2018

Document same-origin policy more centrally #691

Open

zcorpan mentioned this issue Dec 17, 2018

Video fails to work in Safari after reload SuperPWA/Super-Progressive-Web-Apps#77

Open

annevk mentioned this issue Mar 5, 2019

More CORB-protected MIME types - safelist-based approach #721

Open

sleevi mentioned this issue Jul 17, 2019

Specifying Range in Link preload header for HTTP/2 Push w3c/preload#139

Closed

This was referenced Jan 15, 2021

Specify how media elements make and validate range requests whatwg/html#2814

Closed

Improve logic for media elements annevk/orb#16

Merged

annevk added the topic: http label Jan 30, 2021

annevk mentioned this issue Feb 16, 2022

Report EventSource & track ResourceTiming entries whatwg/html#7553

Merged

3 tasks

Handling Partial Content / 206 #144

Handling Partial Content / 206 #144

Comments

mnot commented Oct 26, 2015

mnot commented Oct 27, 2015

annevk commented Nov 11, 2015

mnot commented Nov 12, 2015

mnot commented Nov 12, 2015

annevk commented Nov 12, 2015

code-tree commented Jun 3, 2016 • edited Loading

jakearchibald commented Jun 3, 2016

code-tree commented Jun 4, 2016

jakearchibald commented Jun 15, 2016

jakearchibald commented Jun 15, 2016

annevk commented Jun 20, 2016

jakearchibald commented Jun 20, 2016

mnot commented Mar 14, 2017

jakearchibald commented Mar 31, 2017

annevk commented Mar 31, 2017

jakearchibald commented Mar 31, 2017

annevk commented Mar 31, 2017

jakearchibald commented Jun 29, 2017

DanielBaulig commented Aug 18, 2017

jakearchibald commented Aug 21, 2017

DanielBaulig commented Aug 21, 2017 • edited Loading

annevk commented Aug 22, 2017

sirdarckcat commented Aug 22, 2017

jakearchibald commented Aug 22, 2017

jakearchibald commented Aug 22, 2017

DanielBaulig commented Oct 30, 2017

jakearchibald commented Feb 23, 2018 • edited Loading

Attack 1

Attack 2

Attack 3

Attack 4

jakearchibald commented Feb 26, 2018

annevk commented Feb 26, 2018 • edited Loading

jakearchibald commented Feb 26, 2018

jakearchibald commented Feb 28, 2018

ricea commented Apr 5, 2018

jakearchibald commented Apr 5, 2018

ricea commented Apr 5, 2018

noamr commented Feb 23, 2022

code-tree commented Jun 3, 2016 •

edited

Loading

DanielBaulig commented Aug 21, 2017 •

edited

Loading

jakearchibald commented Feb 23, 2018 •

edited

Loading

annevk commented Feb 26, 2018 •

edited

Loading