A trained human can tell when they are distracted and will say so: "I am distracted and can't figure out the answer." An LLM, by contrast, will confidently give you a wrong answer, which makes the results as a whole unreliable.