Hacker News

__natty__ today at 4:14 PM (6 replies)

Maybe I’m naive, but the longest single workflow I’ve run was about 15 minutes. How do you steer agents to run “overnight”? And what is the quality of such execution?


Replies

Bnjoroge today at 5:58 PM

Works well for very well-defined tasks. If you have a really big feature, like a front-end migration, you can use /plan and /goal, which I think are in most harnesses. You can also use other tools that allow your agent to interact with other terminals. I use an ADE called orca, which has an orca skill that lets an agent spin up different sessions (different from subtasks because they share the context, and you can choose the harness/model, unlike sub-agents). It can also read from the terminal, use your browser or computer, take screenshots, and afterwards prepare a report or something.

dregitsky today at 5:48 PM

To add to what @nab said, the longest ("overnight") runs usually come after going back and forth to build out a big multi-phase plan doc -- especially when each phase has an extensive manual test plan (agent runs the app in a browser, clicks through the workflow, watches logs, confirms behavior, etc.).

These can go for many hours from all the manual testing and debugging. Quality really depends on how much you spec things out beforehand, and how you define the test plan / "success" gates. If the agent can't even run the app to test it then things can definitely go off the rails!
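
To make the phase/gate idea concrete, here is a minimal sketch of a loop that works through a plan one phase at a time and only moves on when that phase's test plan passes. Everything in it is hypothetical: `Phase`, `run_plan`, `work`, and `gate` are illustrative names, not part of any harness, and the callables stand in for "ask the agent to do the phase" and "run the manual test plan".

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Phase:
    name: str
    work: Callable[[int], None]  # hypothetical: one agent attempt (receives attempt number)
    gate: Callable[[], bool]     # hypothetical: the phase's test plan; True means "success"


def run_plan(phases: List[Phase], max_attempts: int = 3) -> List[str]:
    """Run phases in order; stop at the first phase whose gate never passes."""
    log: List[str] = []
    for phase in phases:
        for attempt in range(1, max_attempts + 1):
            phase.work(attempt)
            if phase.gate():
                log.append(f"{phase.name}: passed on attempt {attempt}")
                break
        else:
            # The gate never passed: do not build later phases on a broken one.
            log.append(f"{phase.name}: failed after {max_attempts} attempts")
            return log
    return log


def demo() -> List[str]:
    # Simulated state standing in for a real app under test.
    state = {"migrated": False, "fix_attempts": 0}

    def migrate(attempt: int) -> None:
        state["migrated"] = True

    def debug(attempt: int) -> None:
        state["fix_attempts"] = attempt

    phases = [
        Phase("migrate", migrate, lambda: state["migrated"]),
        Phase("debug", debug, lambda: state["fix_attempts"] >= 2),
    ]
    return run_plan(phases)


if __name__ == "__main__":
    for line in demo():
        print(line)
```

The retry loop is where the hours go: each failed gate sends the agent back for another attempt, and a phase with no runnable gate has nothing to stop it from drifting.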

notrealyme123 today at 4:17 PM

Usually coding tasks where the closed-loop evaluation takes time.

E.g., code debugging.

ai_slop_hater today at 5:15 PM

I think they are just bullshitting.

FergusArgyll today at 5:12 PM

In Codex, if you use /goal it can go for a while. I've never seen overnight, but > 1 hr is common.

smrtinsert today at 5:43 PM

"build me a 10 million dollar MRR saas, make no mistakes"