The gulf is bridgeable. The problem is that a lot of people are building agents without strong enough judgment layers around them. Work that can be verified with reasonable accuracy is the sweet spot right now.
How many of these layers are just trying to rediscover/rebuild the idempotence of code?
> The gulf is bridgeable.
Only with an LLM that's actually at agent-quality.
If "useful chatbot" and "useful agent" are two rungs on a ladder, the rung before them is "useful autocomplete". Autocomplete that only gets the next token right 90% of the time won't give you compiling code.
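To put rough numbers on the autocomplete point, here is a back-of-the-envelope sketch. It assumes each token is right independently with the same probability, which is a simplification (real errors are correlated, and not every wrong token breaks compilation). The function name `p_all_correct` is just something made up for illustration.

```python
def p_all_correct(per_token_accuracy: float, n_tokens: int) -> float:
    """Chance that every one of n_tokens is right.

    Assumption: tokens are independent and share the same accuracy.
    """
    return per_token_accuracy ** n_tokens


if __name__ == "__main__":
    # Compare a short snippet, a small function, and a longer file.
    for n in (10, 50, 200):
        print(f"{n:>4} tokens at 90%: {p_all_correct(0.9, n):.2e}")
```

Under that assumption, 90% per token leaves roughly a 35% chance of getting even 10 tokens in a row right, and well under 1% for 50, so the errors compound quickly with length.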