Hacker News

wilkystyle · yesterday at 8:49 PM

I have personally found that I cannot context switch between thinking deeply about two separate problems and workstreams without a significant cognitive context-switching cost. If it's context-switching between things that don't require super-deep thought, it's definitely doable, but I'm still way more mentally burnt-out after an hour or two of essentially speed-running review of small PRs from a bunch of different sources.

Curious to know more about your work:

Are your agents working on tangential problems? If so, how do you ensure you're still thinking at a sufficient level of depth and capacity about each problem each agent is working on?

Or are they working on different threads of the same problem? If so, how do you keep them from stepping on each other's toes? People mention git worktrees, but that doesn't solve the conflict problem for multiple agents touching the same areas of functionality (i.e. you just move the conflict problem to the PR merge stage).
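For reference, the git worktrees setup people mention gives each agent its own working directory and branch within the same repository. A minimal sketch (paths and branch names here are illustrative, not from the thread):

```shell
# From inside an existing git repository: create one isolated checkout
# per agent, each on its own new branch.
git worktree add -b agent/auth-refactor ../agent-auth
git worktree add -b agent/api-cleanup  ../agent-api

# Each agent edits only its own directory. Conflicts between agents
# don't disappear; they surface later when the branches are merged,
# which is exactly the point being made above.
git worktree list
```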


Replies

simplyluke · yesterday at 9:23 PM

This is a struggle I've also been having.

It's easier when I have 10 simple problems as part of one larger initiative/project. Think "we had these 10 minor bugs/tweaks we wanted to make after a demo review." I can keep that straight. A bunch of agents working in parallel makes me notably faster there, though actually reviewing all the output is still the bottleneck.

It's basically impossible when I'm working on multiple separate tasks that each require a lot of mental context: two separate projects/products my team owns, two really hard technical problems, etc. This has been true before and after AI. Big mental context switches are really expensive, and people can't multitask despite how good we are at convincing ourselves we can.

I expect a lot of folks' experience here depends heavily on how much of their work is the former vs. the latter. I also expect that there's a lot of feeling busy while not actually moving much faster.

skippyboxedhero · yesterday at 11:03 PM

Yes, this also doesn't work for me. If the changes are simple, it is fine, but if the changes are complex and there isn't a clear guideline, then no AI is good enough or even close to it. It gives you a few days of feeling productive and then weeks of trying to tidy up the mess.

Also, I have noticed, strangely, that Claude is noticeably less compliant than GPT. If you ask a question, it will answer and then immediately try to make changes (which may not be related). If you say something isn't working, it will challenge you and insist it was tested (it wasn't). For a company that seems to focus so much on ethics, they have produced an LLM that displays a clear disregard for users (perhaps that isn't a surprise). Either way, it is a very bad model for "agent swarm" style coding. I have been through this extensively: it will write bad code that doesn't work in a subtle way, it will tell you that it works and that the issues relate to the way you are using the program, and then it will do the same thing five minutes later.

The tooling in this area is very good. The problem is that the AI cannot be trusted to write complex code. Imo, the future is something like Cerebras Code, which offers a speed-up for single-threaded work. In most cases, I am just being lazy...I know what I want to write, I don't need the AI to do it, and I am seeing that I am faster if I just single-thread it.

Only counterpoint to this is that swarms are good for long-running admin, housekeeping, etc. Nowhere near what has been promised but not terrible.

jwpapi · yesterday at 10:49 PM

I tried swarms as well, but I came back from them too. It's not worth it: even for small tasks, the effort of writing descriptions, double-checking, and fine-tuning isn't worth what the worse code will cost me in the future, especially when I don't know about it.

nprateem · yesterday at 9:13 PM

It's not that difficult. You get one agent working on a deep problem, another doing more trivial bug fixes/optimizations, etc. Maybe in another you're architecting the next complex feature, another fixes tests, and so on.