I would guess they are trying to maximize training data.
If I were being rewarded for using more tokens, I would feed LLM output back into the model. That's probably not very useful training data.