Impressive performance work. It's interesting that you're still seeing 40+% perf gains like this.
Makes you think the cost of a fixed level of "intelligence" will keep dropping.
Absolutely. LLM inference is still a greenfield — things like overlap scheduling and JIT CUDA kernels are very recent. We’re just getting started optimizing for modern LLM architectures, so cost/perf will keep improving fast.
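For anyone wondering what "overlap scheduling" means in practice, here's a toy sketch (pure Python, with sleeps standing in for real CPU/GPU work, and hypothetical function names like `schedule_batch`/`run_forward`): the CPU prepares batch N+1 while the device is still busy with the forward pass for batch N, so the scheduling cost stops adding to per-step latency.

```python
import threading
import queue
import time

# Hypothetical stand-ins for the real engine steps:
# - schedule_batch(): CPU-side work (batching, KV-cache paging, sampling params)
# - run_forward(batch): GPU-side forward pass
def schedule_batch(step):
    time.sleep(0.002)          # pretend CPU scheduling cost
    return f"batch-{step}"

def run_forward(batch):
    time.sleep(0.010)          # pretend GPU forward-pass latency
    return f"tokens for {batch}"

def overlapped_loop(num_steps=5):
    """While the 'GPU' runs step N, a scheduler thread prepares step N+1."""
    next_batches = queue.Queue(maxsize=1)  # stay at most one batch ahead

    def scheduler():
        for step in range(num_steps):
            next_batches.put(schedule_batch(step))

    threading.Thread(target=scheduler, daemon=True).start()

    outputs = []
    for _ in range(num_steps):
        batch = next_batches.get()         # ready by the time the GPU is free
        outputs.append(run_forward(batch))
    return outputs

if __name__ == "__main__":
    start = time.perf_counter()
    print(overlapped_loop())
    print(f"elapsed: {time.perf_counter() - start:.3f}s")
```

With the overlap, the loop takes roughly num_steps * 10ms instead of num_steps * 12ms; real engines do the same thing with CUDA streams and async host-side bookkeeping rather than threads and sleeps.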