Hacker News

joefourier, last Tuesday at 11:45 AM

Yeah, most of the performance increases have come from architectural improvements like reduced-precision tensor cores. AFAIK FP4 is basically the limit for floating-point matmuls; to go below that you need to switch to integer addition, and I don't think we've figured out 1-bit LLMs just yet.
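
To make the "FP4 is basically the limit" point concrete, here is a minimal sketch that enumerates what a 4-bit float can represent. It assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1), which as far as I know is what current FP4 formats use; the function names are made up for illustration.

```python
def decode_e2m1(bits: int) -> float:
    """Decode a 4-bit pattern as E2M1 (assumed: 1 sign, 2 exponent, 1 mantissa bit, bias 1)."""
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exp = (bits >> 1) & 0b11
    man = bits & 1
    if exp == 0:
        # Subnormal range: no implicit leading 1, so the value is 0 or 0.5.
        return sign * man * 0.5
    # Normal range: implicit leading 1 plus a single half-step mantissa bit.
    return sign * (1 + man * 0.5) * 2.0 ** (exp - 1)


def positive_values() -> list[float]:
    """All values reachable with the sign bit cleared (patterns 0b0000..0b0111)."""
    return sorted({decode_e2m1(b) for b in range(8)})


if __name__ == "__main__":
    print(positive_values())
```

That is only eight non-negative magnitudes. Take away one more bit and there is hardly anything left to split between exponent and mantissa, which is roughly why going lower means leaving floating point behind.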