Hacker News

fer, yesterday at 8:08 PM

Or we simply use AI and see on the ground what it can and can't do. I can generally trust an agent with solved problems, but the more something deviates from established industry standards (i.e. what was relentlessly scraped), the harder I find it to avoid constantly overseeing what it's doing, no matter what specs I put in the md.

Personally, I feel that most of the improvement in the last year has come from tooling/integration (MCPs, realtime documentation access, treesitter support, orchestration) rather than from the models themselves. And frontier models will still routinely come up with bs until you tell them to actually use those tools.


Replies

TobyTheCamel, yesterday at 8:17 PM

You're talking as if this is a static thing, though. It's the God of the gaps [1], but for humanity's special sauce.

Two years ago, I couldn't trust an LLM to do anything that wasn't straightforward boilerplate.

One year ago, it was pretty solid at writing algorithms that were combinations of existing ideas.

Now, Fable is outputting stuff that I would genuinely consider to be creative and original if a colleague had presented it to me.

Yes, maybe the code style still isn't great, but given the pattern of the last few years, it feels correct (a priori) to assume that this gap is going to keep closing.

[1] https://en.wikipedia.org/wiki/God_of_the_gaps