I tend to think that the reason people over-index on complex use-cases for LLMs is actually reliability, not a lack of interest in boring projects.
If an LLM can solve a complex problem 50% of the time, that is still very valuable. But if you are building a system out of many small LLM calls doing small tasks, then even 1% per-step error rates can compound into a highly unreliable system once they are stacked together.
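To put rough numbers on it (assuming failures are independent, which is a simplification): a chain of 20 calls that are each correct 99% of the time is correct end-to-end only about 0.99^20 ≈ 82% of the time, and at 50 calls that falls to roughly 60%.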
The cost of LLMs occasionally giving you wrong answers is worth paying for harder tasks in a way that it is not for smaller ones. For those smaller tasks, you can usually get much closer to 100% reliability, and more importantly much greater predictability, with hand-engineered code. That makes it much harder to find areas where small LLMs add value on small, boring tasks. Better auto-complete is the only real-world example I can think of.
Yeah. Is it even proven that LLMs don't hallucinate for smaller tasks? The author seems to imply that. I fail to see how it could be true.
>If an LLM can solve a complex problem 50% of the time, then that is still very valuable
I'd adjust that statement: if an LLM can solve a complex problem 50% of the time and I can evaluate the correctness of its output, then that is still very valuable. I've seen too many people blindly pass along LLM output - for a short while it was even a trend in the scientific literature to have LLMs evaluate the output of other LLMs. Who knows how correct that was. Luckily that has ended.