You are effectively describing SimpleQA but with a single question instead of a comprehensive benchmark and you can note the dramatic increase in performance there.