__natty__ • today at 4:14 PM • 6 replies

Maybe I’m naive, but the longest single workflow I’ve run was about 15 minutes. How do you steer agents to run “overnight”? And what is the quality of such execution?

Replies

Bnjoroge • today at 5:58 PM

Works well for very well-defined tasks. If you have a really big feature like a front-end migration, you can use /plan and /goal, which I think are in most harnesses. You can also use other tools that let your agent interact with other terminals. I use an ADE called orca that has an orca skill where an agent can spin up different sessions (these differ from subtasks because they share the context, and unlike sub-agents you can choose the harness/model). It can also read from the terminal, use your browser or computer, take screenshots, and afterwards prepare a report or something.

dregitsky • today at 5:48 PM

To add to what @nab said, the longest ("overnight") runs are usually after going back and forth to build out a big multi-phase plan doc -- especially when each phase has an extensive manual test plan (agent runs the app in a browser, clicks through the workflow, watches logs, confirms behavior, etc).

These can go for many hours from all the manual testing and debugging. Quality really depends on how much you spec things out beforehand, and how you define the test plan / "success" gates. If the agent can't even run the app to test it then things can definitely go off the rails!
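A rough sketch of that phase-and-gate loop, to make the idea concrete. This is not any particular harness's API: `Phase`, `work`, `gate`, and `run_plan` are hypothetical names, with `work` standing in for "ask the agent to implement the phase" and `gate` for "run that phase's test plan and report pass/fail".

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Phase:
    name: str
    work: Callable[[], None]  # hypothetical: agent implements this phase
    gate: Callable[[], bool]  # hypothetical: runs the test plan, True on pass


def run_plan(phases: List[Phase], max_attempts: int = 3) -> List[str]:
    """Run phases in order; return the names of phases whose gate passed."""
    completed: List[str] = []
    for phase in phases:
        for _ in range(max_attempts):
            phase.work()
            if phase.gate():
                completed.append(phase.name)
                break
        else:
            # The gate never passed within max_attempts: stop here rather
            # than building later phases on top of a broken one.
            break
    return completed
```

The point of stopping at the first failed gate is the same as in the comment above: if the agent can't verify a phase, letting it continue is how a long run goes off the rails.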

notrealyme123 • today at 4:17 PM

Usually coding tasks where the closed-loop evaluation takes time.

E.g., code debugging.

ai_slop_hater • today at 5:15 PM

I think they are just bullshitting.

FergusArgyll • today at 5:12 PM

In Codex, if you use /goal it can go for a while. I've never seen overnight, but > 1 hr is common.

smrtinsert • today at 5:43 PM

"build me a 10 million dollar MRR saas, make no mistakes"
