Yeah I’m having a similar experience. I’ve been wanting a standard test suite for JMAP email servers, so we can make sure all created jmap servers implement the (somewhat complex) spec in a consistent manner. I spent a single day prompting Claude code on Friday, and walked away with about 9000 lines of code, containing 300 unit tests for jmap servers. And a web interface showing the results. It would have taken me at least a week or two to make something similar by hand.
There’s some quality issues - I think some of the tests are slightly wrong. We went back and forth on some ambiguities Claude found in the spec, and how we should actually interpret what the jmap spec is asking. But after just a day, it’s nearly there. And it’s already very useful to see where existing implementations diverge on their output, even if the tests are sometimes not correctly identifying which implementation is wrong. Some of the test failures are 100% correct - it found real bugs in production implementations.
Using an AI to do weeks of work in a single day is the biggest change in what software development looks like that I’ve seen in my 30+ year career. I don’t know why I would hire a junior developer to write code any more. (But I would hire someone who was smart enough to wrangle the AI). I just don’t know how long “ai prompter” will remain a valuable skill. The AIs are getting much better at operating independently. It won’t be long before us humans aren’t needed to babysit them.