Hacker News

verdverm | yesterday at 7:05 PM | 0 replies

You can use a lot more tokens on hardware than you can spend on a $200/m plan.

I went through 1B tokens my first month with an OEM Spark. That's more than $1k of Opus. Not a fair comparison, because token patterns are different, but since that time I have also seen a 2-3x improvement in the speeds, from improvements in vLLM (mainly MTP). DiffusionGemma is around 4x regular Gemma.
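
As a rough sanity check on those numbers, here is a minimal sketch that uses only the figures above. The function names are made up for illustration, and the $1k is treated as a floor since it was only "more than" that, so the results are lower bounds rather than real pricing.

```python
# Figures from the comment above. The Opus cost is a stated lower bound
# ("more than $1k"), not a price quote, so everything derived from it is a floor.
TOKENS_FIRST_MONTH = 1_000_000_000
OPUS_COST_FLOOR_USD = 1_000
PLAN_USD_PER_MONTH = 200


def implied_usd_per_million(tokens: int, cost_usd: float) -> float:
    """Blended cost per million tokens implied by a total cost (hypothetical helper)."""
    return cost_usd / (tokens / 1_000_000)


def plan_months_equivalent(cost_usd: float, plan_usd_per_month: float) -> float:
    """How many months of the flat plan the same spend would buy (hypothetical helper)."""
    return cost_usd / plan_usd_per_month


floor_rate = implied_usd_per_million(TOKENS_FIRST_MONTH, OPUS_COST_FLOOR_USD)
months = plan_months_equivalent(OPUS_COST_FLOOR_USD, PLAN_USD_PER_MONTH)

print(f"implied blended rate: at least ${floor_rate:.2f} per million tokens")
print(f"equivalent to at least {months:.0f} months of the $200/m plan")
```

Same caveat as before: token patterns differ between local and hosted use, so this is only a ballpark.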