In my experience the tools feel like they make sense when I use them properly, or at least I have a hard time relating the failure modes to this walk/drive thing with bizarre adversarial input. It just feels a little bit like garbage in, garbage out.
Okay, but when you're asking a model to do things like summarizing documents, analyzing data, or reading docs and producing code, etc, you don't necessarily have a lot of control over the quality of the input.