Hacker News

llm_nerd · today at 4:02 PM · 1 reply

It's just more common as a legacy artifact from when Nvidia was basically the only option available. Many shops design their models and training setups, then train and iterate on Nvidia hardware, but once you have a trained model it's largely fungible. See how Anthropic moved their models from Nvidia hardware to AWS Inferentia and then to XLA on Google TPUs.
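To make the fungibility point concrete, here's a minimal JAX sketch (illustrative only, not Anthropic's actual stack): the same jitted forward pass is lowered by XLA to whichever backend is attached, so moving a trained model from Nvidia GPUs to TPUs doesn't require rewriting the model code.

    import jax
    import jax.numpy as jnp

    @jax.jit
    def forward(params, x):
        # stand-in for a trained model's forward pass
        return jnp.tanh(x @ params["w"] + params["b"])

    params = {"w": jnp.ones((16, 4)), "b": jnp.zeros(4)}
    x = jnp.ones((2, 16))

    # XLA compiles for whatever backend is present: CPU, CUDA GPU, or TPU.
    print(jax.devices())              # e.g. [CpuDevice(id=0)] or [TpuDevice(...)]
    print(forward(params, x).shape)   # (2, 4) on any backend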

Further, it's worth noting that Ironwood, Google's v7 TPU, supports only up to BF16 (a 16-bit floating-point format with the range of FP32 but much less precision). Many training processes rely on larger types and quantize later, so this breaks a lot of assumptions. Yet Google surprised everyone and actually trained Gemini 3 with just that type, so I think a lot of people are reconsidering their assumptions.
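To illustrate that range-vs-precision trade-off, here's a quick sketch using jax.numpy.finfo (the numbers are properties of the formats themselves, not of any particular TPU): BF16 keeps FP32's 8 exponent bits but has only 7 mantissa bits, so it covers the same dynamic range with far coarser steps, while FP16 makes the opposite trade.

    import jax.numpy as jnp

    for name, dtype in [("float32", jnp.float32),
                        ("float16", jnp.float16),
                        ("bfloat16", jnp.bfloat16)]:
        info = jnp.finfo(dtype)
        # max = dynamic range; eps = gap between 1.0 and the next representable value
        print(f"{name:9s} max={float(info.max):.3e}  eps={float(info.eps):.3e}")

    # float32   max=3.403e+38  eps=1.192e-07   (8 exponent bits, 23 mantissa bits)
    # float16   max=6.550e+04  eps=9.766e-04   (5 exponent bits, 10 mantissa bits)
    # bfloat16  max=3.390e+38  eps=7.812e-03   (8 exponent bits,  7 mantissa bits)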


Replies

qeternity · today at 8:12 PM

This is not the case for LLMs. FP16/BF16 training precision is standard, with FP8 inference very common. But labs are moving to FP8 training and even FP4.
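As a rough sketch of what FP8 inference means in practice (illustrative only; the per-tensor scheme and function names are my own assumptions, not any lab's pipeline, and it assumes a recent JAX with float8 dtypes): weights trained in BF16 are rescaled into the e4m3 range and stored in 8 bits, with the scale re-applied at matmul time. Real deployments keep the matmul itself in FP8 on hardware that supports it.

    import jax
    import jax.numpy as jnp

    F8 = jnp.float8_e4m3fn                    # 4 exponent / 3 mantissa bits, max ~448
    F8_MAX = float(jnp.finfo(F8).max)

    def quantize_fp8(w):
        """Per-tensor symmetric quantization of a BF16/FP32 weight matrix."""
        scale = jnp.max(jnp.abs(w)) / F8_MAX  # fit the tensor into [-448, 448]
        return (w / scale).astype(F8), scale

    def fp8_matmul(x, w_q, scale):
        """Reference dequantize-then-matmul; FP8 kernels fuse the rescale instead."""
        return x @ (w_q.astype(jnp.bfloat16) * scale)

    w = jax.random.normal(jax.random.PRNGKey(0), (1024, 1024), dtype=jnp.bfloat16)
    x = jax.random.normal(jax.random.PRNGKey(1), (8, 1024), dtype=jnp.bfloat16)
    w_q, s = quantize_fp8(w)
    print(fp8_matmul(x, w_q, s).shape)        # (8, 1024)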