
londons_explore · today at 7:32 AM

How is the research on training these models directly in their quantized state going?

That'll be the real game changer.


Replies

sigmoid10 · today at 8:05 AM

The original BitNet b1.58 was natively trained with 1.58-bit (ternary) weights. PrismML has not released any actual details on how they trained, but since their models are based on Qwen, some downstream quantization was certainly involved.
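
To make concrete what "natively trained" means here, a minimal PyTorch sketch of a BitNet b1.58-style ternary linear layer (the real BitNet also quantizes activations and adds normalization, which this omits): the forward pass uses weights rounded to {-1, 0, +1}, while a straight-through estimator routes gradients to full-precision shadow weights that the optimizer actually updates.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BitLinear(nn.Module):
        """Simplified BitNet b1.58-style linear layer: ternary weights
        in the forward pass, straight-through estimator (STE) in the
        backward pass so the full-precision weights still get trained."""

        def __init__(self, in_features, out_features):
            super().__init__()
            # Full-precision "shadow" weights updated by the optimizer.
            self.weight = nn.Parameter(torch.empty(out_features, in_features))
            nn.init.kaiming_uniform_(self.weight)

        def forward(self, x):
            w = self.weight
            # Per-tensor scale: mean absolute weight, as in BitNet b1.58.
            scale = w.abs().mean().clamp(min=1e-5)
            # Round-and-clip to the nearest ternary value {-1, 0, +1}.
            w_ternary = (w / scale).round().clamp(-1, 1)
            # STE trick: forward sees the quantized weights, backward
            # sees identity, so gradients flow to the fp shadow weights.
            w_quant = w + (w_ternary * scale - w).detach()
            return F.linear(x, w_quant)

    # Usage: drop in for nn.Linear and train as usual.
    layer = BitLinear(64, 32)
    out = layer(torch.randn(8, 64))
    out.sum().backward()
    print(layer.weight.grad.shape)  # gradients reach the fp weights

The point of the question above is exactly this setup: the model never exists in full precision at inference time, so there is no separate post-training quantization step to lose accuracy to.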
