Absolutely. Anyone working on inference token level knows how wasteful it all is especially in multimodal tokens.