LLMs sure do love to burn tokens. It’s like a high schooler trying to meet the minimum word length on a take home essay.
I feel like this has gotten much worse since they were introduced. I guess they're optimizing for verbosity in training so they can charge for more tokens. It makes chat interfaces much harder to use IMO.
I tried using a custom instruction in ChatGPT to make responses shorter, but the output was often nonsensical when I did that.
well, they probably have quite a lot of text from high schoolers trying to meet the minimum word length on a take home essay in the training data
I've always wondered about that. LLM providers could drastically cut the cost of inference if they got the models to just stop emitting so much hot air. I don't understand why OpenAI wants to pay 3x the cost to generate a response when two thirds of those tokens are meaningless noise.