How is the research on training these models directly in their quantized state going?
That'll be the real game changer.
The original BitNet was natively trained at 1.58 bits per weight (ternary weights, log2(3) ≈ 1.58). PrismML hasn't released any real details about how they trained, but since their models are based on Qwen, there was almost certainly some post-training quantization involved rather than native low-bit training.
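For anyone curious what "natively trained at 1.58 bits" means in practice: the trick in the BitNet b1.58 paper is to keep full-precision latent weights for the optimizer, quantize them to {-1, 0, 1} in the forward pass, and use a straight-through estimator so gradients flow as if the rounding weren't there. Here's a minimal PyTorch sketch of that idea (simplified and illustrative only; the paper's actual BitLinear also quantizes activations to 8 bits and applies normalization):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Absmean ternary quantization (BitNet b1.58 style): scale by the
    mean absolute weight, then round-clip to {-1, 0, 1} and rescale."""
    scale = w.abs().mean().clamp(min=eps)
    w_q = (w / scale).round().clamp(-1, 1)
    return w_q * scale  # rescale so output magnitudes stay reasonable

class TernaryLinear(nn.Module):
    """Linear layer whose forward pass uses ternary weights while the
    optimizer updates full-precision latent weights (hypothetical name,
    simplified relative to the paper's BitLinear)."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Straight-through estimator: forward sees quantized weights,
        # backward treats the quantizer as the identity function.
        w_q = w + (ternary_quantize(w) - w).detach()
        return F.linear(x, w_q)
```

You train it like any other layer; only the forward computation is ternary. That's the key difference from post-training quantization, where the model never sees the quantization error during training and just has to absorb it afterward.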