If the argument is that LLMs are bad at reasoning because they are easily distracted and the results vary with modifications to the question, one should be reminded of the consistency and distractibility of humans.
Why? LLMs are supposedly better than humans (as many comments claim in this thread).
A trained human can tell when they are distracted: "I am distracted and can't figure out the answer." An LLM, by contrast, will confidently give you a wrong answer, which makes the results unreliable.