The problem here is that throwing in little gotchas like that is a tactic math and physics educators use to ensure that students actually understand the topic by reasoning through new problems, rather than mindlessly turning the crank after learning the "surface structure" of earlier problem sets. The argument is that the LLM is not reasoning; it's mindlessly turning a crank.
I don't think this exact question would be out of place on a 6th-grade math test. I distinctly remember being taught this skill in "word problems": learning to identify the information that actually pertains to the question rather than being distracted by red herrings the teacher threw in.
Indeed, and the ability to make heads or tails of slightly slippery problems of this sort is an extremely important real-world math skill. It's not extraneous at all.
And the models' poor performance on these tasks highlights deficits in exactly the kind of higher-order, off-the-page reasoning skills they are supposed to develop: not just reasoning about the apparent objects in the stream (the kiwis and the numbers, in this case), but reasoning about the token stream itself, deciding efficiently and seamlessly, as humans do, which tokens matter and which can safely be ignored.
This whole attention business, they're calling it.
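For anyone who hasn't seen the mechanism the quip refers to spelled out: in scaled dot-product attention ("Attention Is All You Need", Vaswani et al., 2017), each token scores every other token, and the softmax of those scores determines how much each one contributes to the output. A minimal NumPy sketch follows; the toy vectors are made up, and real models use learned query/key/value projections and many heads, all of which this omits:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.

    Each query scores every key; the softmax turns scores into
    weights, so "relevant" tokens contribute more to the output
    and "distractor" tokens contribute less.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_queries, n_keys)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

# Toy example: 4 "tokens" with 3-dim embeddings (hypothetical data)
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 3))
out, w = scaled_dot_product_attention(tokens, tokens, tokens)
print(w.round(2))  # each row sums to 1: how much each token attends to the others
```

Whether that weighting amounts to the kind of "ignore the irrelevant clause" judgment humans apply to word problems is, of course, exactly what's in dispute here.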