We do have Chatbot Arena, which to a degree already does this.
I like to use:
"Kim's mother is Linda. Linda's son is Rachel. John is Kim's daughter. Who is Kim's son?"
Interestingly, I just got a model called "engine test" that nailed this one in a three-sentence response, whereas o1-preview got it wrong (though it has gotten it right in the past).