Hacker News

in_a_society today at 4:45 AM, 9 replies

Smells like an article from someone that didn’t really USE the XML ecosystem.

First, there is modeling ambiguity, too many ways to represent the same data structure. Which means you can’t parse into native structs but instead into a heavy DOM object and it sucks to interact with it.
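To make the ambiguity concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The two documents and the `to_record` helper are hypothetical illustrations, not taken from any real schema: the same record can arrive as attributes or as child elements, and the consumer has to normalize both.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that encode the same record.
AS_ATTRIBUTES = '<user id="7" name="Ada"/>'
AS_ELEMENTS = "<user><id>7</id><name>Ada</name></user>"


def to_record(xml_text: str) -> dict:
    """Normalize either encoding into one plain dict of strings."""
    root = ET.fromstring(xml_text)
    record = dict(root.attrib)            # fields stored as attributes
    for child in root:                    # fields stored as child elements
        record[child.tag] = (child.text or "").strip()
    return record


if __name__ == "__main__":
    print(to_record(AS_ATTRIBUTES))
    print(to_record(AS_ELEMENTS))
```

Both calls produce the same dict, but only because the helper knows about both shapes. A third producer could nest the fields one level deeper and the helper would need to grow again.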

Then, schemas sound great, until you run into DTD, XSD, and RelaxNG. Relax only exists because XSD is pretty much incomprehensible.

Then let’s talk about entity escaping and CDATA. And how you break entire parsers because CDATA is a separate incantation on the DOM.
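A small sketch of the CDATA point, assuming Python's standard `xml.dom.minidom` as the DOM implementation (other DOM libraries may behave differently). The sample documents and helper names are made up for illustration. The same character data shows up as a different node type depending on how the producer wrote it, so code that only looks for text nodes can miss it.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom
from xml.dom.minidom import Node

# The same payload, written with entity escaping and with a CDATA section.
ESCAPED = "<msg>a &lt; b</msg>"
CDATA = "<msg><![CDATA[a < b]]></msg>"


def dom_child_types(xml_text: str) -> list:
    """Return the DOM node types of the root element's children."""
    root = minidom.parseString(xml_text).documentElement
    return [child.nodeType for child in root.childNodes]


def dom_text(xml_text: str) -> str:
    """Collect character data from both text and CDATA child nodes."""
    root = minidom.parseString(xml_text).documentElement
    wanted = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    return "".join(c.data for c in root.childNodes if c.nodeType in wanted)


if __name__ == "__main__":
    print(dom_child_types(ESCAPED) == [Node.TEXT_NODE])
    print(dom_child_types(CDATA) == [Node.CDATA_SECTION_NODE])
    print(dom_text(ESCAPED) == dom_text(CDATA))
    # ElementTree hides the distinction and just exposes the text.
    print(ET.fromstring(CDATA).text)
```

A consumer that filters on `Node.TEXT_NODE` alone would see an empty message for the CDATA variant, which is roughly the kind of breakage described above.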

And in practice, XML is always over-engineered. It’s the AbstractFactoryProxyBuilder of data formats. SOAP and WSDL are great examples of this, vs looking at a JSON response and simply understanding what it is.

I worked with XML and all the tooling around it for a long time. Zero interest in going back. It’s not the angle brackets or the serialization efficiency. It’s all of the above brain damage.


Replies

nine_k today at 6:01 AM

XML grew from SGML (like HTML did), and it brought from it a bunch of things that are useless outside a markup language. Attributes were a bad idea. Entities were a so-so idea, which became unapologetically terrible when URLs and file references were allowed. CDATA was an interesting idea but an error-prone one, and likely it just did not belong.

OTOH namespaces, XSD, XSLT were great, modulo the noisy tags. XSLT was the first purely functional language that enjoyed mass adoption in the industry. (It was also homoiconic, like Lisp, amenable to metaprogramming.) Namespaces were a lifesaver when multiple XML documents from different sources had to be combined. XPath was also quite nice for querying.
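As a rough illustration of the namespace and XPath point, here is a sketch with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. The document, the `http://example.com/...` URIs, and the `titles` helper are all invented for the example. Two vocabularies both define `title`, and the namespace keeps them apart once they are combined in one document.

```python
import xml.etree.ElementTree as ET

# Hypothetical combined document: two vocabularies that both define <title>.
DOC = """
<order xmlns:book="http://example.com/book"
       xmlns:person="http://example.com/person">
  <book:title>Dune</book:title>
  <person:title>Dr.</person:title>
</order>
""".strip()

# Prefix-to-URI map used when querying; the prefixes are local to this code.
NS = {
    "book": "http://example.com/book",
    "person": "http://example.com/person",
}


def titles(xml_text: str) -> tuple:
    """Look up each <title> by namespace-qualified path."""
    root = ET.fromstring(xml_text)
    return (
        root.findtext("book:title", namespaces=NS),
        root.findtext("person:title", namespaces=NS),
    )


if __name__ == "__main__":
    print(titles(DOC))
    # The parsed tag carries the full URI, which is where the noise comes from.
    print(ET.fromstring(DOC)[0].tag)
```

The second print shows the expanded `{uri}title` form that every lookup has to account for, which may be part of why opinions on namespaces differ so sharply.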

XML is noisy because of the closing tags, but it also guarantees a level of integrity, and LZ-type compressors, even gzip, are excellent at compacting repeated strings.
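The compression claim is easy to check with a quick sketch. The record layout below is invented, and the exact ratio will vary with real data, so treat this as a sanity check and not a benchmark.

```python
import gzip


def build_xml(n: int) -> bytes:
    """Build a hypothetical document with n records and repetitive tags."""
    rows = "".join(
        f"<record><id>{i}</id><name>user{i}</name></record>" for i in range(n)
    )
    return f"<records>{rows}</records>".encode("utf-8")


def sizes(n: int = 1000) -> tuple:
    """Return (raw size, gzip-compressed size) in bytes."""
    raw = build_xml(n)
    packed = gzip.compress(raw)
    return len(raw), len(packed)


if __name__ == "__main__":
    raw_len, packed_len = sizes()
    print(f"raw={raw_len} bytes, gzip={packed_len} bytes")
```

The repeated opening and closing tags become back-references in the LZ77 stage, so the tag overhead mostly disappears on the wire, though it is still there for whoever has to read the uncompressed file.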

Importantly, XML is a relatively human-friendly format. It has comments, requires no quoting, no commas between list items, etc.

Complexity killed XML. JSON was stupid simple, and thus contained far fewer footguns, which was a very welcome change. It was devised as a serialization format, a bit human-hostile, but mapped ideally to bag-of-named-values structures found in basically any modern language.

Now we see XML tools adapted to JSON: JSONSchema, JSONPath, etc. JSON5 (as used in e.g. VSCode) allows for comments, trailing commas and other creature comforts. With tools like that, and dovetailing tools like Pydantic, XML lost any practical edge over JSON it might ever have had.

What's missing is a widespread replacement for XSLT. Could be a fun project.

tolciho today at 6:32 AM

And of course XML libraries haven't had any security issues (oh look CVE-2025-49796) and certainly would not need to make random network requests for a DTD of "reasonable" complexity. I also dropped XML, and that's after having a website that used XML, XSLT rendering to different output forms, etc. There were discussions at the time (early to mid 2000s) of moving all the config files on unix over to XML. A lot of software probably still carries scars from that era, and therefore an XML dependency. Is that an embiggened attack surface? Also namespaces are super annoying, pretty sure I documented the ughsauce necessary to deal with them somewhere. Thankfully, crickets serenade the faint cries of "Bueller".

The contrast with only JSON is far too simplistic; XML got dropped from places where JSON is uninvolved, like why use a relational database when you can have an XML database??? Or those config files on unix are for the most part still not-XML and not-JSON. Or there's various flavors of markdown which do not give you the semi-mythical semantic web but can be banged out easily enough in vi or whatever and don't require schemas and validation or libraries with far too many security problems and I wouldn't write my documentation (these days) using S-expressions anyhow.

This being said there probably are places where something that validates strictly is optimal, maybe financial transactions (EDIFACT and XML are different hells, I guess), at least until some cheeky git points out that data can be leaked by encoding with tabs and spaces between the elements. Hopefully your fancy and expensive XML security layer normalizes or removes that whitespace?

pjmlp today at 7:51 AM

I used it, and agree 100% with the author.

Hence why, in 2026, I still hang around programming stacks like Java and .NET, where XML tooling is great, instead of having to fight with YAML format errors, the Norway problem, or JSON without basic stuff like comments.

ivan_gammel today at 6:34 AM

>First, there is modeling ambiguity, too many ways to represent the same data structure. Which means you can’t parse into native structs but instead into a heavy DOM object and it sucks to interact with it.

I don’t get this argument. There exist streaming APIs with convenient mapping. Yes, there can exist schemas with weird structure, but in practice they are uncommon. I have seen a lot of integration formats in XML, never had the need to parse to DOM first.
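For readers who have not used a streaming API, here is a minimal sketch of the idea with `xml.etree.ElementTree.iterparse` from the Python standard library. The `Item` dataclass, the feed layout, and `stream_items` are assumptions made up for the example (the Java and .NET equivalents would be StAX or `XmlReader` with a binding layer). Each element is mapped to a native object as it completes, without building a full DOM first.

```python
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Item:
    sku: str
    qty: int


# Hypothetical integration feed; a real one would be read from a file or socket.
FEED = (
    b"<items>"
    b"<item><sku>A1</sku><qty>2</qty></item>"
    b"<item><sku>B2</sku><qty>5</qty></item>"
    b"</items>"
)


def stream_items(source) -> Iterator[Item]:
    """Yield one Item per <item> element as soon as it has been fully parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            yield Item(sku=elem.findtext("sku"), qty=int(elem.findtext("qty")))
            elem.clear()  # drop the element's children to keep memory flat


if __name__ == "__main__":
    for item in stream_items(io.BytesIO(FEED)):
        print(item)
```

This works well when the schema is regular, which matches the experience described above. It does not remove the modeling ambiguity the parent comment raises, since the mapping code still has to know which shape the producer chose.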

mkozlows today at 4:56 AM

The part where it favorably mentioned namespaces also blew my mind. Namespaces were a constant pain point!

wvenable today at 6:05 AM

I read the article and my first thought was it was entirely missing the complexity of XML. It started out relatively simple and easy to understand and most people/programs wrote simple XML that looked a lot like HTML still does.

But it didn't take long before XML might as well have been a binary format, for all it mattered to us humans looking at it, parsing it, dealing with it.

JSON came along and its simplicity was baked in. Anyone can argue it's not a great format, but it forcefully maintains the simplicity that XML lost quite quickly.

bornfreddy today at 5:56 AM

You managed to convey my thoughts exactly, and you only used the term "SOAP" once. Kudos!

SOAP was terrible everywhere, not just in Nigeria as OP insinuates. And while the idea of XML sounds good, the tools that developed on top of it were mostly atrocious. Good riddance.

locknitpicker today at 5:58 AM

> I worked with XML and all the tooling around it for a long time. Zero interest in going back. It’s not the angle brackets or the serialization efficiency. It’s all of the above brain damage.

I remember seeing job ads a decade ago that explicitly requested XML skills. The fact that being able to do something with XML was considered a full-time job requiring a specialist says everything there is to be said about XML.
