Hacker News

veselin · yesterday at 3:43 PM · 1 reply

I am talking about SWE-bench-style problems, where Todo doesn't help except for enabling more parallelism.


Replies

lmeyerov · yesterday at 8:43 PM

Was guessing that; coding tasks are a valuable but myopic lens :)

I'm guessing a self-updating plan is sufficient there. I'm not actually convinced the current plan <> todo-list flow makes sense: in the linked PLAN.md, the two get unified, and that's how we do AI coding. I don't have evals on this, but after a year of vibe coding/engineering, that's where we landed experientially across frontier coding models and tools. Nowadays we're mixing in evals too, but that's a more complicated story.