For such a testing strategy to work, data structures need to be small.
Why would that be true?
It is better to have multiple small data structures than one big universal one, where methods are defined at the ctx level, which makes exhaustive tests difficult.
I don't think you are backing this up at all; you just keep saying it over and over. It's also not even about big data structures. It's about having fewer data structures and using them over and over. You can see where this is effective even in JavaScript and Lua, with their tables that combine hash maps and arrays.
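As a rough sketch of what I mean by reusing a few structures (Python here, with plain dicts and lists standing in for tables; the records and the helper names `pick` and `has_tag` are made up for illustration):

```python
# Hypothetical records: both are plain dicts, with no dedicated class per kind.
user = {"name": "ada", "tags": ["admin", "dev"]}
order = {"id": 7, "items": ["pen", "ink"], "tags": ["rush"]}


def pick(record, keys):
    """Return a new dict holding only the requested keys that exist."""
    return {k: record[k] for k in keys if k in record}


def has_tag(record, tag):
    """True if the record's optional 'tags' list contains the given tag."""
    return tag in record.get("tags", [])


# The same two helpers serve both kinds of record.
rushed = [r for r in (user, order) if has_tag(r, "rush")]
summary = pick(order, ["id", "tags"])
```

The helpers only need to be tested once against dicts and lists, and they keep working for any new kind of record that follows the same shape.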