Hacker News

entropicdrifter, today at 5:36 PM

I'd rather see a distill on the 26B model that uses only 3.8B parameters at inference time. Seems like it would be wildly productive for locally hosted stuff.