That's true, but I don't know if this one was ever a good measure in the first place.
People use AI differently, and they can be equally productive at very different levels of token usage.
Also, different kinds of work are differently amenable to using AI.
I think we've found an extension of Goodhart's law: it makes bad measures even worse.
Measuring tokens used can absolutely be useful for things like tracking cost, gauging compute demand, negotiating a better contract based on usage, and so on.
Using it to grade people is, err, rather unwise.