There's a benchmark which works similarly but they ask harder questions, also based on books https://fiction.live/stories/Fiction-liveBench-Feb-21-2025/o...
I guess they have to add more questions as these context windows get bigger.