Hacker News

TazeTSchnitzel · today at 12:50 PM · 6 replies

> The specification must contain a non-ambiguous formal grammar that can be parsed easily. A page can then be tested against the standard and reject or accept as compliant. Pages that don't conform with the specification won't be rendered. It is explicitly forbidden for clients to accept any page that doesn't conform with the specification.

This is what XHTML was, and it was a complete disaster. There's a reason almost nobody serves XHTML with the application/xhtml+xml MIME type, and that reason is that getting a “parser error” (this is what browsers still do! try it!) is always worse than getting a page that 99% works.[0] I strongly believe that rejecting the robustness principle is a fatal mistake for a web-replacement project. The fact that horribly broken old sites can stay online and stay readable is a huge part of the web's value. Without that, it's not really “the web”, spiritually or otherwise.

[0] It's particularly “cool” how such pages simply do not work in the Internet Archive's Wayback Machine. The page can be retrieved, but nobody can read it.
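The failure mode is easy to reproduce with any strict XML parser: a single unclosed tag and the entire document is rejected, exactly as browsers do for application/xhtml+xml. A minimal sketch in Python (the markup here is a made-up example):

```python
import xml.etree.ElementTree as ET

# A page with unclosed <p> and <br> tags -- the kind of mistake that is
# invisible in tag-soup HTML but fatal under XML well-formedness rules.
broken = "<html><body><p>99% fine<br></body></html>"

try:
    ET.fromstring(broken)
except ET.ParseError as e:
    # The whole page is unreadable, not just the broken part.
    print("parser error:", e)
```

Nothing after the first error is recoverable: the parser gives you an exception, not a partial tree.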


Replies

fooqux · today at 1:06 PM

Agreed. There may be situations where I want to ensure 100% correctness; I'm thinking of life-or-death scenarios (which, if so, should maybe use a different protocol anyway). However, checking the sports score or looking at cat memes isn't that.

singpolyma3 · today at 1:21 PM

To be fair, HTML5 also has a defined parsing algorithm. It just happens to accept any input and always produce a webpage.
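Python's stdlib html.parser illustrates that never-fail property. It is a lenient tag-soup parser, not a full implementation of the HTML5 algorithm (html5lib is closer), but like HTML5 it turns any input into events rather than an error; the markup below is the same kind of broken example a strict XML parser would reject:

```python
from html.parser import HTMLParser

# Collect start tags from arbitrary input; no input causes a parse error.
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
# Unclosed <p> and <br>: a strict XML parser would reject this outright.
collector.feed("<html><body><p>still renders<br></body></html>")
print(collector.tags)  # ['html', 'body', 'p', 'br']
```

Every tag is still recovered; the "error" simply never surfaces to the reader.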

maxerickson · today at 1:01 PM

No scripting is a tell: it's about wanting other people to accommodate their concerns about running a complex browser, not about solving a real problem.

If it did somehow happen that a good deal of interesting content was published using the standard, the most popular client would probably be nonconforming, ignoring the rule not to render ambiguous content.

rodarima · today at 2:18 PM

Author here. I agree that you cannot go from HTML to XHTML because users and UA devs will always go towards "it mostly works".

However, it's not so clear to me that this cannot be done from the start, so that expectations are set correctly from the beginning. For example, I don't see the same problem in other formats like JPEG or PNG, where you expect the image either to work perfectly or to fail with a decoding error.

Other than implementing it and seeing how it goes, can you propose a feasible experiment to show how a new strict spec would measurably fail?

TFNA · today at 1:00 PM

XHTML failed in an era when writers (even normies) were writing some HTML of their own, and they couldn't be trusted to close their tags properly. XHTML also assumed writers would be personally invested in semantic markup, such as distinguishing the italics of book titles from the italics of emphasis.

Today, when writers are using visual editors (or Markdown), few are writing their own HTML any more. A web standard requiring compliance would work differently today.

chrismorgan · today at 5:19 PM

> There's a reason almost nobody serves XHTML with the application/xhtml+xml MIME type, and that reason is that getting a “parser error” (this is what browsers still do! try it!) is always worse than getting a page that 99% works.

That’s not the reason almost nobody serves XHTML.

The real reason is Internet Explorer. Okay, it’s a little more nuanced than that, but I think it’s accurate enough. Microsoft killed XHTML by inaction.

It’s 2004. XHTML is now a few years old, and all the rage. You decide to use it for the new project you’re developing. At the start, you serve pages as application/xhtml+xml, and that works well in Firefox; but you know that won’t work, because Internet Explorer still doesn’t support XHTML, and 90% of your viewers will be using that. So, a little frustrated, you serve your nice XHTML as text/html. You still validate it manually for a while, but then that habit disappears. Eventually you make one or two small mistakes that would have been caught easily if it were parsed as XML—but it’s not, because of Internet Explorer. Over time this disparity grows.

People have been complaining of the inefficacy of XHTML for this exact reason for two or three years by this point.

It’s 2006. XHTML is acknowledged to have failed. Everything else supports it, but as long as IE doesn’t, you can’t serve as application/xhtml+xml, and so you can’t get the advantages of XML syntax.

Seriously, early failure is good—so long as you’re working with it from the start. The problems only occur when you try to add strictness later.

Just look at typing in code bases. Adding strictness to existing JavaScript or Python or Ruby? Nightmare. Starting with static types? Somewhere between fine and extremely desirable.

(I might be overselling strictness’s popularity at the time—people don’t always like what’s good for them. We’ve largely realised now that unfettered dynamic typing is a bad idea, but ten years ago that was not settled. People get used to things. If IE had permitted XHTML early on, people would have got used to the idea of XHTML’s strictness and, I think, got to mostly like it.)

XHTML did not fail because of XML’s catastrophic parse failure mode. It failed because HTML already worked, and Internet Explorer took way too long to accept XHTML. If you’re forking the web and compatibility with existing documents is not a goal, you can’t use XHTML’s failure as an argument: it failed because of compatibility issues.

Well, Internet Explorer did eventually support application/xhtml+xml: in 2011, with IE9. Way too late to matter. Only by around 2015 or 2016, once the pre-IE9 versions had finally died off, could you serve with XML syntax. And by then, why would you? Your system is big and has tiny errors here and there, your CMS just drops markup in and never got round to validating it, and so on. By that time, HTML had given up on the XML path, and although XHTML still worked, the momentum was entirely gone, so you’d run into difficulties with inadequate documentation, inferior tooling (ironically), and more.