A few months ago I wanted to revisit some of my old GitHub Pages sites, so I made a thing to do that: github.com/kashav/foo. The goal was to make it possible to view any revision of a static site without having to leave the browser.
There are three main steps to handling each request: checking out a revision, building the code at that revision, and then serving the site that was built. Here’s how it works in its current state:
- On startup, the program creates a sparse clone of the repo, and starts listening for HTTP requests.
- When a request comes in, it creates a new git worktree using the provided revision (passed as a standard URL query parameter), and builds that checkout.
- It then responds to the request with the assets that were just built.
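Here's a rough sketch of the checkout and build steps in Python. It isn't foo's actual code: the function names, the `work_root` directory, the `build_cmd` argument, and the rule that a revision must be alphanumeric are all assumptions made for illustration. The only real tool it relies on is `git worktree add`.

```python
import re
import subprocess
from pathlib import Path
from urllib.parse import parse_qs, urlsplit


def rev_from_url(url: str, default: str = "HEAD") -> str:
    """Read the revision from the `rev` query parameter, e.g. /?rev=eieio1."""
    values = parse_qs(urlsplit(url).query).get("rev")
    return values[0] if values else default


def worktree_command(repo: Path, rev: str, dest: Path) -> list[str]:
    """Build the `git worktree add` invocation that checks out `rev` at `dest`."""
    return ["git", "-C", str(repo), "worktree", "add", "--detach", str(dest), rev]


def checkout_and_build(repo: Path, rev: str, work_root: Path, build_cmd: list[str]) -> Path:
    """Check out `rev` into its own worktree, build it, and return the directory to serve."""
    # Assumption: only plain alphanumeric revisions are accepted, so that `rev`
    # is safe to use as a directory name and cannot be mistaken for a git flag.
    if not re.fullmatch(r"[A-Za-z0-9]+", rev):
        raise ValueError(f"unsupported revision: {rev!r}")
    dest = work_root / rev
    if not dest.exists():  # skip the checkout and build if this worktree already exists
        subprocess.run(worktree_command(repo, rev, dest), check=True)
        # `build_cmd` is whatever builds the site; it is hypothetical here.
        subprocess.run(build_cmd, cwd=dest, check=True)
    return dest
```

The serving step would then point a static file handler at the returned directory. Keeping one worktree per revision means a repeat request for the same revision can skip both the checkout and the build.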
This brings us to problem #1: the request URIs for a page’s resources don’t include the page’s query parameters.
Consider the following example, assuming that foo is hosted at foo.com:
<!DOCTYPE html> <!-- git SHA: eieio1.... -->
<head>
...
<link rel="stylesheet" href="/farm.css">
</head>
When a request comes in at foo.com?rev=eieio1, foo correctly responds with the HTML above, but the browser then attempts to load the CSS from foo.com/farm.css, which fails because foo can’t find farm.css without the rev parameter.
Fortunately, in this case, the Referer header is set to the URI of the initial request (since that’s the document that’s making this request!), so foo can get query parameters from that instead, and then figure out which resource to respond with:
GET /farm.css HTTP/1.1
Referer: http://foo.com/?rev=eieio1
...
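A minimal sketch of that fallback in Python, assuming the handler has the request target and a plain mapping of headers available. `resolve_rev` and `_rev_in` are hypothetical names, not foo's actual functions.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit


def _rev_in(url: str) -> Optional[str]:
    """Return the value of the `rev` query parameter in `url`, if there is one."""
    values = parse_qs(urlsplit(url).query).get("rev")
    return values[0] if values else None


def resolve_rev(request_target: str, headers: Mapping[str, str]) -> Optional[str]:
    """Prefer `rev` on the request itself, then fall back to the Referer header."""
    # Assumption: `headers` stores the header under the exact key "Referer".
    # Real HTTP header names are case-insensitive, so a server would normalize first.
    return _rev_in(request_target) or _rev_in(headers.get("Referer", ""))
```

For the request above, `resolve_rev("/farm.css", {"Referer": "http://foo.com/?rev=eieio1"})` finds no `rev` on `/farm.css` itself and picks it up from the Referer instead. When neither the request nor the Referer carries a `rev`, it returns `None`.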
But that brings us to problem #2: what happens when the request is being made by a subdocument?
For example, when an image is loaded via CSS:
GET /cow.png HTTP/1.1
Referer: http://foo.com/farm.css
...
I’m not sure yet. Continued soon.