jauntywundrkind • today at 5:16 AM • 1 reply

What's the value of running the smaller model too? Why not just use the big model for everything? I note both are dense, as well.

Replies

Ritewut • today at 5:34 AM

Tokens per second. The difference between 8B and something like 16B is not as big as you might think in practical usage, and 8B is a lot faster and more interactive than 16B. But there are certain things where it is useful to farm the work out to the large model.
