I'm building a benchmark for coding agent memory, following your philosophy. There are so many memory tools out there, but I haven't been able to find a reliable benchmark for coding agent memory, so I'm just building it myself.
A lot of this stuff is really new, and we will need to find ways to standardize, but it will take time and consensus.
It took 4 years after the release of the automobile to coin the term "mileage" to refer to miles driven per unit of gasoline. In due time, we will create the same kind of metrics for AI.
Curious what papers you are reading on this. Benchmarks are far more important than people realize, at every level.