Hacker News

the_harpia_io · today at 3:02 PM

yeah I think that's exactly the disconnect - they're optimizing for a future where agents can actually be trusted to run autonomously, but we're not there yet. like the reliability just isn't good enough to justify hiding what it's doing. and honestly I'm not sure we'll get there by making the UX worse for humans who are actively supervising, because that's how you catch the edge cases that training data misses. idk, feels like they're solving tomorrow's problem while making today's harder


Replies

jddj · today at 3:52 PM

Agreed. If the next token could be trusted, we wouldn't need all these passes and hidden work within the harness.

Converging on an answer eventually is less interesting when you pay by the token and they start by, to take a real Opus 4.6 example from yesterday, adding a column to a database by editing an existing migration file.[1]
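For anyone wondering why that's a footgun: an already-applied migration is effectively immutable history, so the change belongs in a fresh revision. Roughly what I'd have expected instead, as a minimal sketch assuming an Alembic-style setup (file name, revision IDs, and table/column names are made up for illustration):

    # 20240101_add_email_to_users.py -- a new revision, not an edit to an old one
    from alembic import op
    import sqlalchemy as sa

    # Alembic tracks which revisions have run by these IDs; editing an
    # already-applied file instead of adding a new one means the change
    # never runs in any environment that has migrated past it.
    revision = "a1b2c3d4e5f6"
    down_revision = "f6e5d4c3b2a1"

    def upgrade():
        # Add the column in a fresh migration so every environment picks it up.
        op.add_column("users", sa.Column("email", sa.String(255), nullable=True))

    def downgrade():
        op.drop_column("users", "email")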

We're beyond a million monkeys with typewriters, but they can still be made out very clearly in the rear-view mirror.

[1] what would have happened here if I hadn't aborted this? Would it have cleaned it up? Who knows.