Hacker News

kingstnap today at 2:59 AM

Impressive performance work. It's interesting that you still see 40+% perf gains like this.

Makes you think the cost of a fixed level of "intelligence" will keep dropping.


Replies

whoevercares today at 3:47 AM

Absolutely. LLM inference is still greenfield territory: techniques like overlap scheduling and JIT-compiled CUDA kernels are very recent. We're just getting started optimizing for modern LLM architectures, so cost/performance will keep improving fast.
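
To make "overlap scheduling" concrete, here's a minimal sketch of the idea in Python/PyTorch (illustrative code, not any particular engine's API): the CPU prepares the next batch while the GPU is still executing the current step, so scheduler time hides behind kernel time.

    import torch

    def schedule_next_batch(step: int) -> torch.Tensor:
        # Stand-in for CPU-side scheduler work: picking requests, building
        # block tables, assembling input tensors. Pure host-side work.
        return torch.full((8, 128), float(step))

    def run_model_step(batch: torch.Tensor) -> torch.Tensor:
        # Stand-in for one decode step on the GPU.
        x = batch.cuda(non_blocking=True)
        return x @ x.T

    def decode_loop(num_steps: int = 4) -> None:
        batch = schedule_next_batch(0)
        for step in range(1, num_steps + 1):
            # CUDA kernel launches are asynchronous, so control returns to
            # the CPU almost immediately after this call...
            out = run_model_step(batch)
            # ...which lets the CPU schedule the next batch while the GPU is
            # busy. Without the overlap, scheduling time adds directly to
            # per-step latency.
            batch = schedule_next_batch(step)
            torch.cuda.synchronize()  # wait for this step before reading results
            print(f"step {step}: output norm {out.norm().item():.1f}")

    if __name__ == "__main__":
        decode_loop()

Real engines take this further (e.g., running the scheduler a full batch ahead of the GPU), but the latency-hiding principle is the same.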