That still sounds like a dumb strategy. Or, more likely, post hoc rationalization.
You reward me for wasting tokens and punish me for not wasting them, I will maximally waste them and wont "explore hownto make them useful". The latter wastes less tokens and that is punished.