Hacker News

WhyOhWhyQ yesterday at 5:51 PM

Isn't this in contradiction to your blog post from yesterday, though? It's impossible to prove that a complex project made in 4.5 hours works. It might have passed 9,000 tests, but surely there are always going to be edge cases. I personally wouldn't be comfortable claiming I've proved it works and saying the job is done, even if the LLM did the whole thing and all existing tests passed, until I had played with it for several months. And even then I assume I would need to rely on bug reports coming in, because it's running on lots of different systems. I honestly don't know if software is ever really finished.

My takeaway from your blog post yesterday was that, with a robust enough testing system, the LLM can do the entire thing while I do Christmas with the family.

(Before all the AI fans come in here: I'm not criticizing AI.)


Replies

simonw yesterday at 6:22 PM

That's why I don't consider the code from my blog post yesterday to be production quality. I'd need to invest a lot more work in reviewing it before I staked my reputation on it.

BeefySwain yesterday at 6:01 PM

Consider that this isn't just a random, AI-slopped assortment of 9,000 tests, but a robust suite of tests that covers 100% of the HTML5 spec.

Does this guarantee that it functions completely, with no errors whatsoever? Certainly not. You need formal verification for that. I don't think that contradicts what Simon was advocating for in this post, though.
