I've already written about this several times here. I think the current trend of LLMs chasing benchmark scores are going in the wrong direction at least as programming tools. In my experience they get it wrong with enough probability, so I always need to check the work. So I end up in a back and forth with the LLM and because of the slow responses it becomes a really painful process and I could often have done the task faster if I sat down and thought about it. What I want is an agent that responds immediately (and I mean in subseconds) even if some benchmark score is 60% instead of 80%.
World of LLMs or not, development should always strive for being fast. In the LLM World, users should always have the controls on accuracy Vs speed. (Though we can try for improving both and not one way or other). For eg at rtrvr.ai we use Gemini Flash as our default and did benchmarking on flash too with 0.9 min per task in the benchmark still yielding top results. That said, I have to accept there are certain web tasks on tail end sites that needs pro to accurately navigate at this point. This is the limitation given our reliance on Gemini models straight up, once we move to our models trained on web trajectories this hopefully will not be a problem.
If using off the shelf LLMs always have a bottleneck of their speed.
GitHub copilot's inline completions still exist, and are nearly instant!
Programmers (and I'm including myself here) often go to great lengths to not think, to the point of working (with or without a coding assistant) for hours in the hope of avoiding one hour of thinking. What's the saying? "An hour of debugging/programming can save you minutes of thinking," or something like that. In the end, we usually find that we need to do the thinking after all.
I think coding assistants would end up being more helpful if, instead of trying to do what they're asked, they would come back with questions that help us (or force us) to think. I wonder if a context prompt that says, "when I ask you to do something, assume I haven't thought the problem through, and before doing anything, ask me leading questions," would help.
I think Leslie Lamport once said that the biggest resistance to using TLA+ - a language that helps you, and forces you to think - is because that's the last thing programmers want to do.