I'd rather see a distill of the 26B model that uses only 3.8B parameters at inference time. Seems like it would be wildly useful for locally-hosted stuff.
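
For anyone unfamiliar, "a distill" here just means training a small student model to match the big teacher's output distribution. A minimal sketch of the standard soft-label distillation loss (nothing specific to this model family; names and the temperature value are placeholders):

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then minimize the KL
    # divergence between the teacher's and the student's softened outputs.
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradients stay comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```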