Hacker News

KeplerBoy, 10/11/2024

I haven't looked at all implementations, but the hardware (tensor cores as well as CUDA cores) allows you to accumulate at fp16 precision.
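
To make "accumulate at fp16 precision" concrete, here's a rough CPU-side sketch in plain Python. It is not how the hardware does it and not taken from any particular implementation. It just rounds a running sum to half or single precision after every step, using the standard library's `struct` half-float format. The helper names (`to_fp16`, `to_fp32`, `dot`) and the all-ones test vectors are made up for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, round_acc):
    """Dot product of fp16 inputs; the accumulator is rounded after each add.

    `round_acc` selects the accumulator precision (hypothetical helper,
    a numerical simulation only, not a model of real tensor core behavior).
    """
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_fp16(x) * to_fp16(y)  # inputs are rounded to fp16 first
        acc = round_acc(acc + prod)     # accumulator kept at chosen precision
    return acc


n = 4096
a = [1.0] * n
b = [1.0] * n

fp16_result = dot(a, b, to_fp16)
fp32_result = dot(a, b, to_fp32)

print("fp16 accumulator:", fp16_result)  # 2048.0
print("fp32 accumulator:", fp32_result)  # 4096.0
```

The fp16 accumulator stalls at 2048 because half precision has an 11-bit significand: above 2048 the spacing between representable values is 2, so adding 1 rounds back to 2048. That is the kind of trade-off you accept when accumulating at fp16 instead of fp32.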