Hacker News

Flere-Imsaho · yesterday at 7:37 PM · 3 replies

Yeah, the future is probably a number of highly specialised small models you can run on your own hardware rather than massive frontier models in the cloud.

That's what I'm betting on anyway.


Replies

girvo · yesterday at 10:11 PM

Step 3.7 Flash on my Asus GB10-based mini PC is incredibly close to that today. I’m very impressed, and that’s without MTP (multi-token prediction) to boost performance.

thewebguyd · yesterday at 7:43 PM

That seems to be what Microsoft is betting on too, based on what was shown at the Build keynote today, plus that new Surface Ultra and the Surface mini PC with the new Nvidia chip. Nadella really played up local AI as the main use case they have in mind.

search_facility · yesterday at 7:49 PM

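MoE models basically work that way already. Qwen etc. with low active params (the A-number in the name, e.g. Qwen3-30B-A3B has 30B total and 3B active) let you run inference on big models locally: only the active params are used for each token, so not all of the weights have to sit in fast GPU memory.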
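To put rough numbers on the A-number convention, here is a back-of-the-envelope sketch. It assumes the `xB-AyB` naming pattern (as in Qwen3-30B-A3B) and ignores KV cache, activations, and quantization overhead. The helpers `parse_moe_name`, `weight_gb`, and `moe_footprint` are hypothetical illustrations, not part of any library.

```python
import re

# Hypothetical pattern for names like "Qwen3-30B-A3B": <total>B-A<active>B.
_NAME_RE = re.compile(r"(\d+(?:\.\d+)?)B-A(\d+(?:\.\d+)?)B", re.IGNORECASE)


def parse_moe_name(name: str) -> tuple[float, float]:
    """Return (total_billions, active_billions) parsed from an 'xB-AyB' model name."""
    match = _NAME_RE.search(name)
    if match is None:
        raise ValueError(f"no 'xB-AyB' pattern in {name!r}")
    return float(match.group(1)), float(match.group(2))


def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough weight size in GB: billions of params * bits per weight / 8 bits per byte."""
    return params_billions * bits_per_weight / 8


def moe_footprint(name: str, bits_per_weight: float = 4.0) -> dict[str, float]:
    """Compare the total stored weights with the weights touched per token."""
    total, active = parse_moe_name(name)
    return {
        # Every expert must be stored somewhere (VRAM, system RAM, or disk).
        "stored_gb": weight_gb(total, bits_per_weight),
        # Only the routed experts plus shared layers are read for each token.
        "read_per_token_gb": weight_gb(active, bits_per_weight),
    }


if __name__ == "__main__":
    print(moe_footprint("Qwen3-30B-A3B", bits_per_weight=4.0))
    # prints {'stored_gb': 15.0, 'read_per_token_gb': 1.5}
```

Under these assumptions, a 4-bit Qwen3-30B-A3B still needs about 15 GB of storage in total, but only about 1.5 GB of weights is read per token. That gap is roughly why such models stay usable locally even when some experts are offloaded to slower system RAM.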