
domenicd, today at 10:41 AM

As some others have indirectly pointed out, this article conflates two things:

- URL parsing/normalization; and

- Mapping URLs to resources (e.g. file paths or database entries) to be served from the server, and whether you ever map two distinct URLs to the same resource (either via redirects or just serving the same content).

The former has a good spec these days: https://url.spec.whatwg.org/ tells you precisely how to turn a string (e.g., one sent over the network in an HTTP request) into a normalized data structure [1] of (scheme, username, password, host, port, path, query, fragment). The article is correct insofar as the spec's path (which is a list of strings, for HTTP URLs) can contain empty string segments.
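To make that concrete, here is a rough sketch using the `URL` class that browsers and Node expose, which implements the URL Standard's parser. `describeUrl` is a hypothetical helper name of my own, and it only approximates the spec's data structure (for instance, it does not distinguish a missing query from an empty one):

```typescript
// Hypothetical helper: split a URL string into roughly the components
// listed in the URL Standard's "URL representation" section.
function describeUrl(input: string) {
  const url = new URL(input); // throws a TypeError if parsing fails
  return {
    scheme: url.protocol.replace(/:$/, ""),
    username: url.username,
    password: url.password,
    host: url.hostname,
    port: url.port,
    // pathname is the serialized path; splitting on "/" recovers the
    // segment list, including any empty-string segments.
    pathSegments: url.pathname.split("/").slice(1),
    query: url.search.replace(/^\?/, ""),
    fragment: url.hash.replace(/^#/, ""),
  };
}

const parts = describeUrl("https://user:pw@example.com:8080/foo//bar?x=1#top");
console.log(parts.pathSegments); // ["foo", "", "bar"]
```

The empty string in the middle of `pathSegments` is the empty path segment the article is talking about.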

But the latter is much more wild-west, and I don't know of any attempt being made to standardize it. There are tons of possible choices you can make here:

- Should `https://example.com/foo//bar` serve the same resource as `https://example.com/foo/bar`? (What the article focuses on.)

- `https://example.com/foo/` vs. `https://example.com/foo`

- `https://example.com/foo` vs. `https://example.com/FOO`

- `https://example.com/foo` vs. `https://example.com/fo%6f` vs. `https://example.com/fo%6F`

- `https://example.com/foo%2Fbar` vs. `https://example.com/foo/bar`

- `https://example.com/foo/` vs. `https://example.com/foo.html`
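Since none of this is standardized, every server ends up encoding its own policy. A minimal sketch of what such a policy might look like follows; `PathPolicy` and `canonicalizePath` are made-up names for illustration, not anything from a spec or framework, and the three switches cover only some of the choices above:

```typescript
// Hypothetical application-level policy: which distinct paths should be
// treated as naming the same resource. Nothing here comes from a standard.
interface PathPolicy {
  collapseEmptySegments: boolean; // "/foo//bar" -> "/foo/bar"
  stripTrailingSlash: boolean;    // "/foo/"     -> "/foo"
  lowercase: boolean;             // "/FOO"      -> "/foo"
}

function canonicalizePath(pathname: string, policy: PathPolicy): string {
  let path = pathname;
  if (policy.collapseEmptySegments) {
    path = path.replace(/\/{2,}/g, "/");
  }
  if (policy.stripTrailingSlash && path.length > 1) {
    // Fall back to "/" so that an all-slash path never becomes empty.
    path = path.replace(/\/+$/, "") || "/";
  }
  if (policy.lowercase) {
    // Note: this also lowercases hex digits in percent-escapes.
    path = path.toLowerCase();
  }
  return path;
}

const lenient: PathPolicy = {
  collapseEmptySegments: true,
  stripTrailingSlash: true,
  lowercase: true,
};
console.log(canonicalizePath("/foo//bar/", lenient)); // "/foo/bar"
```

Whether you then redirect to the canonical form or just serve the same content from both URLs is yet another choice.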

Note that a few things are normalized during parsing, e.g. `/foo\bar` -> `/foo/bar`, and `/foo/baz/../bar` -> `/foo/bar`. But for paths, very little else is.
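You can check where the parser draws the line with the `URL` class. This is just a quick sketch; `pathOf` is a throwaway helper name:

```typescript
// Throwaway helper: parse with the URL Standard parser, return the path.
const pathOf = (input: string): string => new URL(input).pathname;

// Normalized by the parser:
console.log(pathOf("https://example.com/foo\\bar"));       // "/foo/bar"
console.log(pathOf("https://example.com/foo/baz/../bar")); // "/foo/bar"

// Left alone by the parser, so any equivalence is up to the server:
console.log(pathOf("https://example.com/foo//bar"));   // "/foo//bar"
console.log(pathOf("https://example.com/foo/"));       // "/foo/"
console.log(pathOf("https://example.com/FOO"));        // "/FOO"
console.log(pathOf("https://example.com/fo%6f"));      // "/fo%6f"
console.log(pathOf("https://example.com/foo%2Fbar"));  // "/foo%2Fbar"
```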

Relatedly:

- For hosts, many more things are normalized during parsing. (This makes some sense, for security reasons.)

- For query, very little is normalized during parsing. But unlike for pathname, there is a standardized format and parser, application/x-www-form-urlencoded [2], that can be used to go further and canonicalize from the raw query string into a list of (name, value) string pairs.
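A small sketch of both points, again with the `URL` class (the example URLs and the `hostOf` helper are made up for illustration):

```typescript
const hostOf = (input: string): string => new URL(input).hostname;

// Hosts: case, percent-escapes, and odd IPv4 spellings are all normalized.
console.log(hostOf("https://EXAMPLE.com/"));   // "example.com"
console.log(hostOf("https://ex%61mple.com/")); // "example.com"
console.log(hostOf("http://0x7f.1/"));         // "127.0.0.1"

// Query: the raw string is kept essentially as written...
const withQuery = new URL("https://example.com/?a=1&b=hello+world&a=%32");
console.log(withQuery.search); // "?a=1&b=hello+world&a=%32"

// ...but the application/x-www-form-urlencoded parser, exposed as
// searchParams, decodes it into a list of (name, value) pairs.
const pairs: Array<[string, string]> = [];
withQuery.searchParams.forEach((value, name) => {
  pairs.push([name, value]);
});
console.log(pairs); // [["a", "1"], ["b", "hello world"], ["a", "2"]]
```

Note that the result is a list, not a map: the repeated `a` shows up twice, in order.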

Some discussions on the topic of path normalization, especially in terms of mapping to the filesystem, in the URL Standard repo:

- https://github.com/whatwg/url/issues/552

- https://github.com/whatwg/url/issues/606

- https://github.com/whatwg/url/issues/565

- https://github.com/whatwg/url/issues/729

-----

[1]: https://url.spec.whatwg.org/#url-representation

[2]: https://url.spec.whatwg.org/#application/x-www-form-urlencod...