Software developers have spent decades discounting and ignoring almost every objective metric for software quality, and the industry as a whole has developed a general disregard for any metric that isn't time-to-ship (and even there, teams will ignore faster alternatives in favor of hyped choices).
(Edit: Yes, I'm aware a lot of people care about FP, "Clean Code", etc., but these are all red herrings that have nothing to do with actual quality. At best they're guidelines for less experienced programmers; at worst, a massive waste of time if you adopt more than one or two suggestions from their collection of ideas.)
Most of the industry couldn't adopt objective metrics for code quality and the quality of the artifacts they produce without also abandoning their entire software stack, because of what the results would show. They're using the only metric they've ever cared about: time-to-ship. The results are just a sped-up version of what we've had for more than two decades now: software is getting slower, buggier and less usable.
If you don't have a good regulating function for what constitutes real quality, you can't expect systems that just pump out code to iterate well on anything. There are very few forcing functions you can use to produce high-quality results through iteration.
This doesn't pass the sniff test. We have plenty of ways to verify good software, or you wouldn't be making this post. You know what bad software is and what it looks like. We want something fast that doesn't throw an error every three page navigations.
You can ask an LLM to write code in whatever language you want, and it can be pretty good at writing efficient code, too. Nothing about npm bloat is keeping you from building a lean website. And AI could theoretically be great at testing every part of a website: benchmarking load speeds, trying different viewports, etc.
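The benchmarking part, at least, doesn't need anything exotic. Here's a minimal sketch of the idea in Python, using only the standard library; the throwaway in-process server and the 500 ms budget are made-up stand-ins for a real site and a real performance target:

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical performance budget: flag any page slower than this.
BUDGET_SECONDS = 0.5

class Page(BaseHTTPRequestHandler):
    """Stand-in for the site under test; serves one trivial page."""
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(b"<html><body>ok</body></html>")

    def log_message(self, *args):
        pass  # keep the request log quiet

def benchmark(url, runs=5):
    """Fetch a URL several times and return the fastest wall-clock time."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as resp:
            resp.read()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Spin up the stand-in server on a free port and measure it.
server = HTTPServer(("127.0.0.1", 0), Page)
threading.Thread(target=server.serve_forever, daemon=True).start()

fastest = benchmark(f"http://127.0.0.1:{server.server_port}/")
print(f"fastest load: {fastest * 1000:.1f} ms, "
      f"within budget: {fastest < BUDGET_SECONDS}")
server.shutdown()
```

Point an agent at something like this in CI and "is the site still fast" becomes a pass/fail signal instead of a vibe; a real setup would drive an actual browser (Playwright or similar) to also cover viewports and rendering.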
But unfortunately we are still on the LLM train. An LLM just doesn't have anything built in to do what we do, which is use an app and intuitively understand "oh, this is shit." And even if you could let it click through the site, it would be shit at matching visual problems to the actual code. You can forget about LLMs for true frontend work for a few years.
And they just get worse as context grows, so any non-trivial application is going to produce a lot of strange broken artifacts, because text prediction isn't great when your application has numerous hidden rules.
So as much as I like a good laugh at failing software, I don't think you can blame the shippers for this one. LLMs aren't struggling in software development because they're averaging a lot of crap code; it's because we haven't gotten them past unit tests and verifying output in the terminal yet.
But we don't even seem to be getting faster time-to-ship in any way that anybody can actually measure; it's always some vague sense of "we're so much more productive".