I didn't set out to teach you anything, change your behavior, or give you practical takeaways, so it's a rant (: Emotions can be expressed with citations.
I am fully on board with gen AI representing a paradigm shift in software development. I tried to be careful not to take a stance on other debates in the larger conversation. I just saw too many people talking about how much code they're generating as proof statements when discussing LLMs. I think that, specifically---i.e., using LOC generated as the basis of any meaningful argument about effectiveness or productivity---is a silly thing to do. There are plenty of other things we should discuss besides LOC.
I guess I over-diagnosed your stance, apologies.
I wonder if you have a take on measuring productivity in light of the potential difficulty of achieving good outcomes across the general population?
You mention in the second appendix (which I skipped on my first read) that you are a rather experienced LLM user, with experience in all the harnesses and context management touted as "best practice" nowadays. Given the effort this seems to take, do you think we're vulnerable to mis-measuring?
My mind is always thrown to arguments about Agile, or even Communism. "True Communism has never been tried" or "Agile works great when you do it right", which are still thrown about in the face of evidence that these things seem impossible, or at least very difficult, to actually implement successfully across the general population. How would we know if AI-driven development had a higher theoretical maximum "productivity" (substitute "value", "virtue", "the general good", whatever you want here) than non-AI-driven development, but still a lower actual productivity due to problems in adopting the overall paradigm?