HarHarVeryFunny • yesterday at 2:09 PM

Just guessing here, but these small models may well be essentially distillations of larger ones, and that may be where their power comes from. For example, you could use a large model to generate synthetic reasoning traces, then train a small model on those.
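To make the idea concrete, here is a toy sketch of that two-step pipeline. Everything in it is an assumption for illustration: `teacher_trace` is a hypothetical rule-based stand-in for a call to a large model, and `NGramStudent` is a character-level n-gram counter standing in for a small language model. A real setup would sample traces from an actual large model and fine-tune a small neural network on them.

```python
import random
from collections import Counter, defaultdict


def teacher_trace(a: int, b: int) -> str:
    # Hypothetical stand-in for a large model: emits a short reasoning trace.
    return f"Q: {a}+{b}? Think: {a} plus {b} is {a + b}. A: {a + b}"


def build_synthetic_dataset(n: int, seed: int = 0) -> list[str]:
    # Step 1: have the "teacher" generate synthetic reasoning traces.
    rng = random.Random(seed)
    return [teacher_trace(rng.randint(0, 9), rng.randint(0, 9)) for _ in range(n)]


class NGramStudent:
    """Toy character-level n-gram model (assumed stand-in for a small LM)."""

    def __init__(self, order: int = 4):
        self.order = order
        self.counts = defaultdict(Counter)

    def train(self, texts):
        # Step 2: fit the student on the teacher's traces by counting
        # which character follows each context of `order` characters.
        for text in texts:
            padded = "^" * self.order + text + "$"
            for i in range(self.order, len(padded)):
                self.counts[padded[i - self.order:i]][padded[i]] += 1

    def generate(self, prompt: str, max_new: int = 80) -> str:
        # Greedy decoding: always pick the most frequent next character.
        out = "^" * self.order + prompt
        for _ in range(max_new):
            ctx = out[-self.order:]
            if ctx not in self.counts:
                break  # context never seen in training
            nxt = self.counts[ctx].most_common(1)[0][0]
            if nxt == "$":
                break  # end-of-trace marker
            out += nxt
        return out[self.order:]


traces = build_synthetic_dataset(500)
student = NGramStudent()
student.train(traces)
print(student.generate("Q: 3+4?"))
```

The student never sees the rule the teacher uses, only its outputs, which is the point of the comment above. Whether the toy student reproduces a correct trace depends on what the teacher happened to generate.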
