Hacker News

nullc, yesterday at 9:05 PM

Sorry to be a Debbie Downer, but this reads like LLM slop rather than engineering work. I don't just mean the language on the page -- although that too ("not an X, it's a Y", over and over again) -- but the absence of the artifacts of ActualEngineering(tm), with just a flood of vibes in their place.

For example, I would expect to see tables or figures showing task success rates on some benchmarks for agents with and without this proposal, perhaps before and after fine-tuning, or compared against alternatives or, to the extent that there are no alternatives, against variations of this design that were considered and rejected.

Otherwise, what reason is there to think that this design is better than some alternative, or even any good at all? Perhaps it causes agents to hallucinate like crazy -- who knows, if it hasn't been tested.

Work like that is what makes efforts like this worth sharing and worth reading about -- anyone can spend a few minutes asking their favorite LLM to design such a framework and get something that looks "credible". But in a post-LLM world, "credible" alone is externally indistinguishable from anti-social, time-wasting slop.