As a counter example (re: agents), I routinely delegate simple tasks to Claude Code and get near-perfect results. But I've also had experiences like yours where I ended up wasting more time than saved. I just kept trying with different types of tasks, and narrowed it down to the point where I have a good intuition for what works and what doesn't. The benefit is I can fire off a request on my phone, stick it in my pocket, then do a code review some time later. This process is very low mental overhead for me, so it's a big productivity win.
Thats cool, how are you integrating your phone with your Claude workflow?
The cost is in the context switching. Throw 3 tasks that came 15, 20 and 30 min later. The first is mostly ok, you finish by hand. The second have some problems, ask for a rework. Then came the other and, while ok, is have some design problems. Ask another rework. Comes back the second one, and you have to remember the original task and what things you asked for change.
What are you using to fire up requests on your phone?
Sounds like a slot machine. Insert api tokens, get something that's pretty close to right, insert more tokens and hope it works this time.