The author really needs to start with that. They say "the API that we are building" and as...

mubou2 • last Wednesday at 5:32 PM • 5 replies • view on HN

The author really needs to start with that. They say "the API that we are building" and assume I know who they are and what they're working on, all the way until the very bottom. I just assumed it's some open source library.

> HTML parsing is not stable and a line of HTML being parsed and serialized and parsed again may turn into something rather different

Are there any examples where the first approach (sanitize to string and set inner html) is actually dangerous? Because it's pretty much the only thing you can do when sanitizing server-side, which we do a lot.

Edit: I also wonder how one would add for example rel="nofollow noreferrer" to links using this. Some sanitizers have a "post process node" visitor function for this purpose (it already has to traverse the dom tree anyway).

Replies

crote • last Wednesday at 5:55 PM

> Are there any examples where the first approach (sanitize to string and set inner html) is actually dangerous?

The article links to [0], which has some examples of instances in which HTML parsing is context-sensitive. The exact same string being put into a <div> might be totally fine, while putting it inside a <style> results in XSS.

[0]: https://www.sonarsource.com/blog/mxss-the-vulnerability-hidi...

tobr • last Wednesday at 5:43 PM

> They say "the API that we are building" and assume I know who they are and what they're working on, all the way until the very bottom.

This is a common and rather tiresome critique of all kinds of blog posts. I think it is fair to assume the reader has a bit of contextual awareness when you publish on your personal blog. Yes, you were linked to it from a place without that context, but it’s readily available on the page, not a secret.

➕ show 1 reply

LegionMammal978 • last Wednesday at 5:49 PM

They had a link in their post [0]: it seems like most of the examples are with HTML elements with wacky contextual parsing semantics such as <svg> or <noscript>. Their recommendation for server-side sanitization is "don't, lol", and they don't offer much advice regarding it.

Personally, my recommendation in most cases would be "maintain a strict list of common elements/attributes to allow in the serialized form, and don't put anything weird in that list: if a serialize-parse roundtrip has the remote possibility of breaking something, then you're allowing too much". Also, "if you want to mutate something, then do it in the object tree, not in the serialized version".

[0] https://www.sonarsource.com/blog/mxss-the-vulnerability-hidi...

➕ show 2 replies

rebane2001 • last Thursday at 7:35 AM

> Because it's pretty much the only thing you can do when sanitizing server-side

I'd suggest not sanitizing user-provided HTML on the server. It's totally fine to do if you're fully sanitizing it, but gets a little sketchy when you want to keep certain elements and attributes.

masklinn • last Wednesday at 6:29 PM

> Are there any examples where the first approach (sanitize to string and set inner html) is actually dangerous?

The term to look for is “mutation xss” (or mxss).

alt Hacker News

Replies