now sometimes that's 4 hours, but I've had plenty of times where I'm "racing" people using LLMs and I basically get the coding done before them. Once I debugged an issue before the robot was done `ls`-ing the codebase!
The shape of the problem is super important in considering the results here
You have the upper hand with familiarity of the code base. Any "domain expert" also necessarily has a head start knowing which parts of a bespoke complex system need adjustment when making changes.
On the other hand, a highly skilled worker who just joined the team won't have any of that tribal knowledge. There is a significant lag time getting ramped up, no matter how intelligent they are due to sheer scale (and complexity doesn't help).
A general purpose model is more like the latter than the former. It would be interesting to compare how a model fine tuned on the specific shape of your code base and problem domain performs.