disiplus • yesterday at 5:31 PM

Depends. A super small one fine-tuned to do function calling, instead of sending the request to a big model and waiting: you ask for last month's revenue, I do a small LLM function call -> show the results. Some bigger ones for analysis, summarization, and classification. What's great about the smaller ones (I'm looking at 2B and 4B) is that you can get huge throughput with just vLLM and a couple of consumer GPUs. What I usually do is basically distill a big model into a smaller one.
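Roughly what the "small LLM function call -> show results" path could look like, as a minimal sketch. Everything named here is made up for illustration: `get_revenue`, the `MONTHLY_REVENUE` numbers, the JSON shape of the tool call, and `small_model_stub`, which stands in for the fine-tuned model so the snippet runs without a GPU. In a real setup the stub would be replaced by a request to whatever is serving the small model.

```python
import json
from typing import Callable, Dict

# Hypothetical data; a real system would query a database or billing API.
MONTHLY_REVENUE = {"2024-05": 120_000.0, "2024-06": 135_500.0}


def get_revenue(month: str) -> float:
    """Hypothetical tool: look up revenue for a 'YYYY-MM' month."""
    return MONTHLY_REVENUE[month]


# Registry of tools the model is allowed to call.
TOOLS: Dict[str, Callable[..., object]] = {"get_revenue": get_revenue}


def small_model_stub(question: str) -> str:
    """Stand-in for the fine-tuned small model (assumption, not a real API).

    A real deployment would send `question` to the served model and get back
    a JSON tool call. Here the output is hard-coded.
    """
    return json.dumps({"name": "get_revenue", "arguments": {"month": "2024-06"}})


def answer(question: str, model: Callable[[str], str] = small_model_stub) -> object:
    """Ask the model for a tool call, validate it, and run the tool."""
    call = json.loads(model(question))
    tool = TOOLS.get(call["name"])
    if tool is None:
        raise ValueError(f"model requested unknown tool: {call['name']}")
    return tool(**call["arguments"])


if __name__ == "__main__":
    print(answer("what was revenue last month?"))
```

The point of the registry check is that a tiny model will sometimes emit a tool name that does not exist, so the dispatcher rejects it rather than trusting the output blindly.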

alt Hacker News