Were they trying to measure other things? Definitely. The COO at Uber, one of the examples in the source article, has talked publicly about how they've searched for (and so far failed to find) a link between micro-level metrics driven by AI and concrete improvements in high-level project velocity.
Do these measurements have sufficient information? As much as any, I'd guess. It sounds like you already know that it's pretty hard in general to measure the productive output of software development organizations.
I have no doubt a few companies, like Uber, were measuring other things and had applicable metrics in place before adopting Clod or Copilot or whatever automation. I'm speaking more generally, about companies that adopt the latest hype without reflection.