> HTML parsing is not stable and a line of HTML being parsed and serialized and parsed again may ...

nayuki • last Wednesday at 5:39 PM • 3 replies

> HTML parsing is not stable and a line of HTML being parsed and serialized and parsed again may turn into something rather different

This is why people should really use XHTML, the strict XML dialect of HTML, in order to avoid these nasty parsing surprises. It has the predictable behavior that you want.

In XHTML, the code does exactly what it says it does. If you write <table><a></a></table> like the example on the mXSS page, then you get a table element with an anchor child. As another example, if you write <table><td>xyz</td></table>, that's exactly what you get, and there are no implicit <tbody> or <tr> elements inserted inside.
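
A minimal sketch of the XML side of this claim. It uses Python's standard-library xml.etree.ElementTree as a stand-in for an XHTML parser, which is an assumption: a browser's XML parser is a different implementation, and the XHTML namespace declaration is left out for brevity. The helper name `shape` is made up for illustration. The HTML side is not shown, since the standard library does not implement HTML tree construction.

```python
import xml.etree.ElementTree as ET

def shape(elem):
    """Return the element tree as nested (tag, text, children) tuples."""
    return (elem.tag, elem.text, [shape(child) for child in elem])

source = "<table><td>xyz</td></table>"

# Parse, serialize, and parse again with a generic XML parser.
first = ET.fromstring(source)
serialized = ET.tostring(first, encoding="unicode")
second = ET.fromstring(serialized)

first_shape = shape(first)
second_shape = shape(second)

print(first_shape)                  # <td> is a direct child of <table>
print(serialized)                   # markup after one round trip
print(first_shape == second_shape)  # whether the round trip kept the tree
```

Under these assumptions, the parser adds no <tbody> or <tr> wrapper, and the second parse produces the same tree as the first.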

It's wild to watch the world keep doubling down, for decades, on HTML and all its surprising parsing behavior. Furthermore, HTML's syntax is a unique snowflake, whereas XML is a standardized language that is also used in SVG, MathML, Atom, and other standards, so there is no need to relearn syntax every single time.

Replies

bayesnet • last Wednesday at 6:09 PM

I don’t think this is right. XHTML guarantees well-formedness (matched closing tags and so on) but doesn’t do anything for validity. It’s not semantically valid for <td> to be a direct child of <table>, so the user agent has to make the call as to what to display regardless of the (X)HTML flavor. The alternative is a parsing failure on improperly nested HTML, which I don’t think is desirable.
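
The well-formedness versus validity distinction can be sketched with a non-validating XML parser. This assumes Python's standard-library xml.etree.ElementTree as a stand-in for an XHTML parser; the helper name `is_well_formed` is made up for illustration, and checking validity against the XHTML DTD or a schema is not shown.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if a generic, non-validating XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Well-formed, but not valid XHTML: <td> sits directly under <table>.
invalid_but_well_formed = is_well_formed("<table><td>xyz</td></table>")

# Not well-formed: <td> is never closed, so parsing fails outright.
mismatched = is_well_formed("<table><td>xyz</table>")

print(invalid_but_well_formed, mismatched)
```

The parser only rejects the second string. The first one parses cleanly even though its content model is wrong, so what to do with it is still left to whatever consumes the tree.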

favorited • last Wednesday at 9:38 PM

You might as well complain about Betamax. XHTML is not the future.

recursive • last Wednesday at 9:48 PM

HTML is also a standardized language.
