Yeah, I don't think that "I'd be happy to help you with that" or "Sure, let me take a look at that for you" carries much useful signal that can be used for the next tokens.
There is a study showing that what the model does behind the scenes in those cases is a lot more than just emitting those tokens.
For an LLM, tokens are thought. They have no ability to think, by whatever definition of that word you like, without outputting something. And the token itself represents only a tiny fraction of the internal state changes that happen when it is produced.
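Here is a minimal sketch of that point, assuming the Hugging Face transformers library and GPT-2 weights (just as a stand-in for any decoder model): a single decoding step exposes one token id, but the forward pass that chose it computed hidden states at every layer and position.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Sure, let me take a look at", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

next_id = out.logits[:, -1].argmax(-1)       # the one token that actually gets emitted
states = torch.stack(out.hidden_states)      # everything computed in order to emit it
print("emitted token:", tok.decode(next_id))
print("token ids exposed:", next_id.numel())
print("hidden values computed this step:", states.numel())  # layers x positions x 768
```

The visible output is one integer; the internal state touched to pick it is millions of floats per step, which is the sense in which the emitted token is only a small slice of what is going on.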
Clearly there is an optimum for each task (not necessarily a global one), and a concrete model on a given task can be arbitrarily far from it. But you'd need to test that for each case, not just assume that "less tokens = more better". You could be forcing your model to be dumber without realizing it if you're not testing.
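Something like the following is the kind of per-task test meant here; it's only a sketch, where ask_model() is a placeholder for whatever completion API you actually call and the dataset is illustrative.

```python
def ask_model(prompt: str, system: str) -> str:
    raise NotImplementedError("plug in your own API call here")

TERSE = "Answer with the final result only. No preamble."
CHATTY = "Answer the question."  # model is free to 'think out loud'

dataset = [("What is 17 * 24?", "408"), ("What is 9 + 10?", "19")]  # toy examples

def accuracy(system: str) -> float:
    hits = 0
    for question, expected in dataset:
        reply = ask_model(question, system)
        hits += expected in reply  # crude containment check, fine for a sketch
    return hits / len(dataset)

# Compare both settings on your actual task before concluding fewer tokens is better.
print("terse:", accuracy(TERSE), "chatty:", accuracy(CHATTY))
```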
They carry information in regular human communication, so I'm genuinely curious why you'd think they would not when an LLM outputs them as part of the process of responding to a message.
You'd be surprised; those tokens could, for example, match patterns from the model's training where it then proceeds to use a tool.