I'd rather see a distill of the 26B model that uses only 3.8B parameters at inference time. Seems like it would be wildly useful for locally-hosted stuff.
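
For anyone unfamiliar, "a distill" here just means training a small student model to match the big teacher's output distribution. A minimal sketch of the standard soft-label distillation loss (nothing specific to this model family; names and the temperature value are placeholders):

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then minimize the KL
    # divergence between the teacher's and the student's softened outputs.
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradients stay comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```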