Hacker News

intothemild yesterday at 8:24 PM

That's already happening. Qwen3.6 and Gemma4.

Basically, small and medium models that are remarkably well trained for their sizes.

Then we have a lot of speculative decoding work, like MTP (multi-token prediction) and others, coming to speed up responses, and finally better quantisation to use less memory.
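The core idea behind speculative decoding is that a cheap draft model proposes several tokens ahead, and the expensive target model verifies them in one pass, accepting the longest matching prefix. Here's a toy sketch of that loop (with made-up stand-in "models" as simple functions, and a greedy accept rule instead of the probabilistic one real systems use):

```python
import random

random.seed(0)

VOCAB = list(range(10))

def target_model(context):
    # Toy stand-in for the big, authoritative model:
    # next token = (sum of context) mod 10.
    return sum(context) % 10

def draft_model(context):
    # Toy stand-in for the fast draft model: agrees with the
    # target ~80% of the time, otherwise guesses randomly.
    guess = target_model(context)
    return guess if random.random() < 0.8 else random.choice(VOCAB)

def speculative_decode(context, n_tokens, k=4):
    out = list(context)
    while len(out) - len(context) < n_tokens:
        # 1. Draft model cheaply proposes k tokens, one at a time.
        drafts = []
        for _ in range(k):
            drafts.append(draft_model(out + drafts))
        # 2. Target model verifies the proposals (batched into one
        #    forward pass on real hardware); accept the longest
        #    prefix it agrees with.
        accepted = []
        for tok in drafts:
            if target_model(out + accepted) == tok:
                accepted.append(tok)
            else:
                break
        # 3. Always emit one token from the target itself, so the
        #    loop makes progress and the output exactly matches
        #    what greedy decoding of the target alone would give.
        accepted.append(target_model(out + accepted))
        out.extend(accepted)
    return out[len(context):][:n_tokens]

print(speculative_decode([1, 2, 3], 8))
```

The payoff is that when the draft model is usually right, each expensive verification step yields several tokens instead of one, while the output stays identical to decoding the target model alone.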

Local LLMs are the future, and the larger labs know the open models will eat their lunch once people realise the gap is only a few months. If the closed models of a couple of months ago were good enough for us, today's open models are good enough now.


Replies

krupan yesterday at 9:30 PM

And how were those models developed and trained?
