I think this is valid, though. Transformer models don't explicitly do logic; they implicitly "vibe" out the answer from the input sequence (via the attention mechanism) and learnt knowledge - they're predicting text sequences, after all. So adding irrelevant context to the input would quite likely influence the output.
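You can see this with a rough sketch (assuming GPT-2 via Hugging Face's transformers; the distractor sentence is made up for illustration): prepend an irrelevant sentence to the same question and compare the next-token distributions.

    # Minimal sketch: compare next-token probabilities for the same question,
    # with and without an irrelevant distractor sentence prepended.
    # Assumes GPT-2 via Hugging Face transformers; distractor text is made up.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def next_token_probs(prompt, top_k=5):
        # Every token in the prompt feeds into attention, relevant or not,
        # so the distribution over the next token shifts with added context.
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        top = torch.topk(probs, top_k)
        return [(tokenizer.decode(int(i)), round(p.item(), 4))
                for i, p in zip(top.indices, top.values)]

    question = "Q: What is the capital of France? A:"
    distractor = "My neighbour's cat is named Brussels. " + question

    print(next_token_probs(question))
    print(next_token_probs(distractor))

The question is identical in both prompts, but the top candidates and their probabilities come out different because the model attends over the distractor too.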
I could see attention possibly being able to overcome this, but if not, that would be a pretty big gotcha for reliability in real-world scenarios where, as others have said, it's not immediately clear what information is relevant. These models would be a lot less useful if a human had to decide which information to feed them, since the output would then depend on human judgement. I understand that's where we're at right now and that they're already quite useful, but the valuations hint at investors expecting more, imo.