make3 · yesterday at 10:17 PM

There are many algorithms that make LLM inference more efficient at the cost of some output quality: using a smaller model, using quantized models, using speculative decoding with a more permissive rejection threshold, etc.
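As a rough illustration of that last point, here is a toy sketch of the acceptance step in speculative decoding. The function names and the `tau` knob are made up for illustration; the standard scheme accepts a drafted token with probability min(1, p_target / p_draft), and loosening that test (tau < 1 here) accepts more draft tokens, saving target-model passes at the cost of drifting from the target distribution.

```python
import random

def accept_token(target_prob, draft_prob, tau=1.0, rng=random.random):
    """Toy speculative-decoding acceptance test (illustrative, not a real API).

    target_prob: probability the (large) target model assigns to the drafted token
    draft_prob:  probability the (small) draft model assigned to it
    tau:         acceptance strictness; 1.0 = exact rejection sampling,
                 tau < 1 is a permissive threshold that accepts more drafts
    """
    ratio = target_prob / draft_prob
    # Dividing by tau < 1 inflates the acceptance probability.
    return rng() < min(1.0, ratio / tau)

# With tau=1.0 this is the standard test; with tau=0.1 an unlikely
# draft token (ratio 0.1) is always accepted.
```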