>The hard thing is to do the rigorous testing itself.
This. Rigorous testing is hard, and it requires a high degree of intuition and intellectual humility. When I'm evaluating something as part of my research, I'm constantly asking: "Am I asking the right questions?" "Am I looking at the right metrics?" "Are the results noisy, to what extent, and how much does it matter?" and "Am I introducing confounding effects?" It's really hard to do this at scale and quickly. It necessarily requires slow, measured thought, which computers really can't help with.