Isn't this in contradiction to your blog post from yesterday though? It's impossible to prove a complex project made in 4.5 hours works. It might have passed 9,000 tests, but surely there are always going to be edge cases. I personally wouldn't be comfortable claiming I'd proved it works and saying the job is done, even if the LLM did the whole thing and all existing tests passed, until I'd played with it for several months. And even then I'd assume I would need to rely on bug reports coming in, because it's running on lots of different systems. I honestly don't know if software is ever really finished.
My takeaway from your blog post yesterday was that with a robust enough testing system the LLM can do the entire thing while I do Christmas with the family.
(Before all the AI fans come in here. I'm not criticizing AI.)
Consider that this isn't just a random AI-slopped assortment of 9,000 tests, but instead a robust suite of tests that covers 100% of the HTML5 spec.
Does this guarantee that it functions completely, with no errors whatsoever? Certainly not. You'd need formal verification for that. I don't think that contradicts what Simon was advocating for in this post, though.
That's why I don't consider my blog post from yesterday to be about production-quality code. I'd need to invest a lot more work in reviewing it before I staked my reputation on it.