Hard limit:
- Plenty of API hallucination happening on cutting-edge Spark (4.0.0+) functionality, especially PySpark. Spark bears some blame here for broken and incomplete documentation. It takes a human in the loop to realize that the documentation is misleading, wrong, or missing.
Soft limit:
- API design. I’ve found that, unless specifically steered towards “good” API design (highly subjective), agents tend to just add another endpoint / function to satisfy the exact task at hand, with total disregard for how the rest of the API looks. (Pretty much exactly what a junior engineer would do…)

That last part about it acting like a junior matches my experience very well. I'm using LLMs for refactoring, adding repetitive blocks of code, etc.
Unless I'm very clear at all times, it will write code like the most annoying, stubborn junior you've ever worked with. Nothing is sacred: everything can be abbreviated, shortened, made more confusing, or made less readable, and naming conventions are not even considered.
It also adds superfluous nonsense comments that don't explain the "why".
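To make that concrete, here is a rough sketch of the contrast I mean. Everything in it is made up for illustration: the function names, the `ts` field, and the business rule in the second comment are assumptions, not code from a real project.

```python
# The style described above: abbreviated names, and a comment that
# only restates what the code already says.
def proc(u, t):
    # loop over users
    return [x for x in u if x["ts"] > t]


# Same behavior, with descriptive names and a comment that records the "why".
def users_active_since(users, cutoff_timestamp):
    # Strictly greater-than: a user whose last activity equals the cutoff
    # was already counted in the previous window (hypothetical rule).
    return [user for user in users if user["ts"] > cutoff_timestamp]
```

Both functions return the same list. The only difference is how much the next reader has to reverse-engineer.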
On this note, one thing I've found Codex to do is worry more than necessary about breaking changes for internal APIs. Maybe a bit more prompting would fix this, but I've found that even when iteratively implementing larger new features, it worries about breaking APIs that nothing but the new code uses yet.