This is super important, even if it's not the best measure of degradation yet. Anecdotally, Opus 4.5 has gotten so bad for me that it's almost adding time to my workflow instead of saving it. It'd be nice to have more third-party measurements like this to hold Anthropic accountable.