Hacker News

DeathArrow, today at 5:18 AM

To me this sounds like an old cobbler complaining that machines aren't producing good shoes if left unsupervised and that the old process of making shoes completely by hand is far superior.

So what is he telling us? That agents are not infallible, that they cannot one-shot complex software, and that they do not produce perfect code?

We know that, and the solution is to use agents for what they are good at, work around their limitations, and keep a human in the loop.

>not some RLVR shit that comments out the failing test and tells you all the tests are now passing

That's what harnesses should be about: detect when the agent is misbehaving and force it to take the right approach.

This example in particular should be easy to solve if we generate the tests before coding and have a workflow or state machine that doesn't let the agent disable tests and doesn't let it reach the next stage unless all tests are passing.
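
To make that concrete, here is a minimal sketch of such a gate, not a real harness. Everything in it is hypothetical: the `Stage` names, the `Workflow` class, and the `run_tests` hook are made up for illustration, and test files are modeled as a plain dict of name to content. The idea is just to fingerprint the tests once they are written, then refuse to advance if the fingerprint changed or the tests fail.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict


class Stage(Enum):
    TESTS_WRITTEN = auto()
    CODING = auto()
    DONE = auto()


def fingerprint(test_files: Dict[str, str]) -> str:
    """Hash test file names and contents so any edit or deletion is detectable."""
    digest = hashlib.sha256()
    for name in sorted(test_files):
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(test_files[name].encode())
        digest.update(b"\0")
    return digest.hexdigest()


@dataclass
class Workflow:
    # Hypothetical hook: returns True only if every test passes.
    run_tests: Callable[[Dict[str, str]], bool]
    stage: Stage = Stage.TESTS_WRITTEN
    locked: str = ""

    def lock_tests(self, test_files: Dict[str, str]) -> None:
        """Freeze the tests before the agent starts coding."""
        self.locked = fingerprint(test_files)
        self.stage = Stage.CODING

    def try_advance(self, current_tests: Dict[str, str]) -> bool:
        """Advance to DONE only if tests are untouched and all pass."""
        if self.stage is not Stage.CODING:
            return False
        if fingerprint(current_tests) != self.locked:
            return False  # the agent edited, commented out, or removed a test
        if not self.run_tests(current_tests):
            return False  # tests are intact but failing
        self.stage = Stage.DONE
        return True
```

In this sketch, commenting out a failing test changes the fingerprint, so `try_advance` returns `False` and the workflow stays in `CODING` no matter what the agent reports. A real harness would also need to stop the agent from skipping tests through configuration or runner flags, which this does not cover.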


Replies

Alex_L_Wood, today at 5:37 AM

LLM proponents always use some language like "these old, stuck-up dinosaurs with their manual labor vs us cool smart kids with automated labor", but they forget one thing: with automated labor, the performance and cost difference was easily measurable in favor of automation. With LLMs the benefit is neither measurable nor visible (no better software, no faster delivery overall in the industry), and the costs are pretty bad. All we have are personal anecdotes from someone toiling away at yet another AI harness project on GitHub.

chrisco255, today at 9:11 AM

Modern shoes are made by a mix of machine and hand. There is still quite a bit of manual labor to produce shoes: https://www.youtube.com/watch?v=bK8pcAYapXQ

This is effectively what's happening to software. We are getting some forms of automation but I believe there's plenty of manual work and coordination left for humans to do.