LOC is useful here not because it's a metric for output but because it's a metric for _understandability_. Reviewing 200 lines is a very different workload than reviewing 2000.
It’s still a bad metric.
I have worked with code where 1000s of lines are very straightforward and linear.
I’ve worked on code where 100 lines is crucial and very domain specific. It can be exceptionally clean and well-commented and it still takes days to unpack.
The skills and effort required to review and understand those situations are quite different.
One is like distance driving a boring highway in the Midwest: don’t get drowsy, avoid veering into the indistinguishable corn fields, and you’ll get there. The other is like navigating a narrow mountain road in a thunderstorm: you’re 100% engaged and you might still tumble or get hit by lightning.
Its still posssible to run any LLM in a loop and optimize for LoC while preserving the wanted outcome.
That's assuming the 200 lines are logical and consistent. Many of my most frustrating LLM bugs are caused by things that look right and are even supported by lengthy comments explaining their (incorrect) reasoning.