Really impressive for the size. Curious to see what happens when someone trains a 100B+ model natively at 1-bit.