Hacker News

iwontberude | last Thursday at 10:24 PM | 2 replies

Your point could have made sense, but the amount of inference per request is also going up faster than costs are coming down.


Replies

supern0va | last Thursday at 10:56 PM

The parent said: "Of course, by then we'll have much more capable models. So if you want SOTA, you might see the jump to $10-12. But that's a different value proposition entirely: you're getting significantly more for your money, not just paying more for the same thing."

SOTA improvements have been coming from additional inference via reasoning tokens, not just from increasing model size. Their comment makes plenty of sense.

manmal | last Friday at 12:40 AM

Is it? Recent models tend to need fewer tokens to achieve the same outcome. The days of ultrathink are coming to an end; Opus is quite usable without it.