Implement canonical URLs and redirects (if possible) #440

Open
zkamvar opened this issue Apr 13, 2023 · 3 comments

@zkamvar
Contributor

zkamvar commented Apr 13, 2023

The idea of canonical URLs was initially brought up in #43, but it never moved beyond discussion.

Basically, if someone wants to visit https://carpentries.github.io/sandpaper-docs/episodes.html, they can do so with two links:

https://carpentries.github.io/sandpaper-docs/episodes.html
https://carpentries.github.io/sandpaper-docs/episodes

But if they use https://carpentries.github.io/sandpaper-docs/episodes/ or https://carpentries.github.io/sandpaper-docs/episodes/index.html, they get a 404.

The reason is that the first two links point to a file, while the last two point to a folder. On top of that, analytics will see all of them as different pages unless we establish a canonical URL.
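To make the idea concrete, here is a minimal Python sketch of what "one canonical URL per page" could mean: every variant of a page address is mapped onto its `.html` form. The function name `canonical_url`, the `site_root` parameter, and the choice to keep `index.html` for the lesson home page are assumptions for illustration, not anything {sandpaper} provides.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, site_root="/sandpaper-docs"):
    """Map the known variants of a page URL onto its .html form.

    Hypothetical helper: query strings and fragments are dropped,
    which is an assumption that suits analytics but not deep links.
    """
    parts = urlsplit(url)
    path = parts.path
    root = site_root.rstrip("/")

    # /episodes/index.html -> /episodes
    if path.endswith("/index.html"):
        path = path[: -len("/index.html")]
    # /episodes/ -> /episodes
    path = path.rstrip("/")

    if path == root:
        # Assumption: the lesson home page stays at index.html.
        path = root + "/index.html"
    elif not path.endswith(".html"):
        # /episodes -> /episodes.html
        path += ".html"

    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    base = "https://carpentries.github.io/sandpaper-docs/"
    for variant in ("episodes.html", "episodes", "episodes/", "episodes/index.html"):
        print(variant, "->", canonical_url(base + variant))
```

With this mapping, all four variants from above resolve to the same `episodes.html` address, which is the property analytics would need.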

{pkgdown} has implemented redirects, but I am not sure how they will work for this because we want a redirect that exists inside of a folder with the same name as the file.
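For discussion, a redirect that lives inside a folder with the same name as the file could be sketched as a small stub page, written here in Python. Everything in it is an assumption: the helper name `write_redirect_stub`, the use of a meta refresh, and the absolute target built from a `base_url` argument. It is not how {pkgdown} or {sandpaper} do it, and how GitHub Pages resolves the extensionless URL once such a folder exists is untested here.

```python
from html import escape
from pathlib import Path


def write_redirect_stub(site_dir, page, base_url):
    """Write <site_dir>/<stem>/index.html that forwards to <base_url>/<stem>.html.

    Hypothetical helper for illustration only.
    """
    stem = Path(page).stem  # "episodes.html" -> "episodes"
    # An absolute target avoids ambiguity about trailing slashes.
    target = f"{base_url.rstrip('/')}/{stem}.html"
    href = escape(target, quote=True)

    stub = Path(site_dir) / stem / "index.html"
    stub.parent.mkdir(parents=True, exist_ok=True)
    stub.write_text(
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f'<meta http-equiv="refresh" content="0; url={href}">\n'
        "</head>\n<body>\n"
        f'<p>This page lives at <a href="{href}">{href}</a>.</p>\n'
        "</body>\n</html>\n",
        encoding="utf-8",
    )
    return stub
```

On a normal file system `episodes.html` and `episodes/index.html` can sit side by side, so the stub itself is easy to produce; the open question is only how each host chooses between them.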

@bencomp
Contributor

bencomp commented Jun 28, 2023

It is interesting that the URLs with .html and without both resolve, because I don't see two files when I build a lesson. Is that a GitHub thing?

Just noting that canonical URLs for the whole lesson came up in #481 as a building block to link episodes/chapters to the lesson in the metadata.

@zkamvar
Contributor Author

zkamvar commented Jun 29, 2023

> It is interesting that the URLs with .html and without both resolve, because I don't see two files when I build a lesson. Is that a GitHub thing?

I did not think about this, but yes, this is absolutely a GitHub thing and it runs into the boundaries of my knowledge of networking -_-

Take for example the beta phase preview of the lessons (deployed on AWS):

https://preview.carpentries.org/instructor-training/02-practice-learning.html (works)
https://preview.carpentries.org/instructor-training/02-practice-learning (fails)

$ curl -I https://preview.carpentries.org/instructor-training/02-practice-learning.html
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 62331
Connection: keep-alive
Date: Thu, 29 Jun 2023 13:27:47 GMT
Last-Modified: Tue, 27 Jun 2023 00:16:47 GMT
ETag: "2fab9dad8bdfa9df0a1753d25a4bb2cf"
Server: AmazonS3
Vary: Accept-Encoding
X-Cache: Miss from cloudfront
Via: 1.1 d6cbeccd9a6d25b691d204399bf8b728.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: SFO5-P2
X-Amz-Cf-Id: ftzMBTuQ4JvGacNZZ70dw3ZYTFJHJ0wyhmSI49x5uIiMKkhtgTi1ZQ==
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Referrer-Policy: strict-origin-when-cross-origin
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000
Vary: Origin

$ curl -I https://preview.carpentries.org/instructor-training/02-practice-learning
HTTP/1.1 403 Forbidden
Connection: keep-alive
x-amz-error-code: AccessDenied
x-amz-error-message: Access Denied
Date: Thu, 29 Jun 2023 13:27:49 GMT
Server: AmazonS3
X-Cache: Error from cloudfront
Via: 1.1 94be61e339880d0097634de6934f7710.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: SFO5-P2
X-Amz-Cf-Id: zaSAKzoVYtmJPsIR2xmodwiUhDMtAhDlC5bzSMo8ixBR6iiWLPaDaA==
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Referrer-Policy: strict-origin-when-cross-origin
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=31536000
Vary: Origin

When I look at the same pair of URLs on GitHub, there is no difference between the two responses, not even a redirect:

$ curl -I https://carpentries.github.io/sandpaper-docs/episodes.html
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 90902
Server: GitHub.com
Content-Type: text/html; charset=utf-8
permissions-policy: interest-cohort=()
Last-Modified: Tue, 27 Jun 2023 00:26:06 GMT
Access-Control-Allow-Origin: *
ETag: "649a2c9e-16316"
expires: Thu, 29 Jun 2023 13:36:04 GMT
Cache-Control: max-age=600
x-proxy-cache: MISS
X-GitHub-Request-Id: 6B52:9B94:500FB1:5EF38F:649D866C
Accept-Ranges: bytes
Date: Thu, 29 Jun 2023 13:28:41 GMT
Via: 1.1 varnish
Age: 157
X-Served-By: cache-pdx12332-PDX
X-Cache: HIT
X-Cache-Hits: 1
X-Timer: S1688045321.324659,VS0,VE1
Vary: Accept-Encoding
X-Fastly-Request-ID: 9a03fb665121bdd4a2d53f703447089dce0becdc

$ curl -I https://carpentries.github.io/sandpaper-docs/episodes
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 90902
Server: GitHub.com
Content-Type: text/html; charset=utf-8
permissions-policy: interest-cohort=()
Last-Modified: Tue, 27 Jun 2023 00:26:06 GMT
Access-Control-Allow-Origin: *
ETag: "649a2c9e-16316"
expires: Thu, 29 Jun 2023 13:35:58 GMT
Cache-Control: max-age=600
x-proxy-cache: MISS
X-GitHub-Request-Id: 5FE4:84EB:50548F:5F37A8:649D8665
Accept-Ranges: bytes
Date: Thu, 29 Jun 2023 13:28:44 GMT
Via: 1.1 varnish
Age: 166
X-Served-By: cache-pdx12331-PDX
X-Cache: HIT
X-Cache-Hits: 1
X-Timer: S1688045324.075636,VS0,VE1
Vary: Accept-Encoding
X-Fastly-Request-ID: ff6c4cda35fb5b4299f55d7f949a79ecaad846f3
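
The comparison of the `curl -I` transcripts can also be done mechanically. The Python sketch below parses two header dumps and reports whether both are plain 200 responses for the same document (same `ETag` and `Content-Length`, no `Location` header). The function names are hypothetical, and the sample strings are abbreviated copies of the transcripts above.

```python
def parse_headers(raw):
    """Split a `curl -I` transcript into (status_code, headers)."""
    lines = [line for line in raw.strip().splitlines() if line.strip()]
    status = int(lines[0].split()[1])  # "HTTP/1.1 200 OK" -> 200
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive; a repeated header keeps its last value.
        headers[name.strip().lower()] = value.strip()
    return status, headers


def same_document(raw_a, raw_b):
    """True if both responses are direct 200s serving identical content."""
    status_a, a = parse_headers(raw_a)
    status_b, b = parse_headers(raw_b)
    return (
        status_a == status_b == 200
        and "location" not in a
        and "location" not in b
        and a.get("etag") is not None
        and a.get("etag") == b.get("etag")
        and a.get("content-length") == b.get("content-length")
    )


# Abbreviated from the transcripts in this comment.
GITHUB_WITH_HTML = 'HTTP/1.1 200 OK\nContent-Length: 90902\nETag: "649a2c9e-16316"\n'
GITHUB_WITHOUT_HTML = 'HTTP/1.1 200 OK\nContent-Length: 90902\nETag: "649a2c9e-16316"\n'
AWS_WITHOUT_HTML = "HTTP/1.1 403 Forbidden\nx-amz-error-code: AccessDenied\n"

if __name__ == "__main__":
    print(same_document(GITHUB_WITH_HTML, GITHUB_WITHOUT_HTML))
    print(same_document(GITHUB_WITH_HTML, AWS_WITHOUT_HTML))
```
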

@bencomp
Contributor

bencomp commented Jul 4, 2023

Thanks for doing this research. I feel that The Workbench should not rely on this GitHub feature and should instead use the .html URLs as canonical. I noticed the variants while working on carpentries/lesson-development-training#209.

To signal which URL is canonical, you could (or perhaps should) use the canonical link relation (rel="canonical") defined in RFC 6596.
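
As a rough illustration of what that could look like, the Python sketch below builds the two forms in which RFC 6596 allows the relation to be expressed: an HTML `<link>` element for the page head and an HTTP `Link` header. The function names are hypothetical, and where The Workbench would emit either form is left open.

```python
from html import escape


def canonical_link_element(url):
    """HTML form, to be placed inside <head> of every variant of the page."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


def canonical_link_header(url):
    """HTTP header form, for hosts that allow custom response headers."""
    return f'Link: <{url}>; rel="canonical"'


if __name__ == "__main__":
    page = "https://carpentries.github.io/sandpaper-docs/episodes.html"
    print(canonical_link_element(page))
    print(canonical_link_header(page))
```

The HTML form is probably the only practical one on GitHub Pages, since a static site there does not control its response headers.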
