This kind of research is underrated. I strongly suspect that harness improvements like these will make whole classes of problems reliably solvable, and that they matter just as much as model training.