Hacker News

AugSun, today at 5:03 AM

"We can run your dumbed down models faster":

> The use of NVFP4 results in a 3.5x reduction in model memory footprint relative to FP16 and a 1.8x reduction compared to FP8, while maintaining model accuracy with less than 1% degradation on key language modeling tasks for some models.
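
For what it's worth, the quoted ratios roughly fall out of back-of-the-envelope arithmetic. The sketch below assumes the commonly described NVFP4 layout (4-bit values plus one 8-bit scale shared by each block of 16 values) and ignores the per-tensor scale and any scaling metadata the FP8 baseline carries. The constants and function names are illustrative, not taken from the quoted text.

```python
# Assumed NVFP4 layout (not stated in the quote): 4-bit values with one
# 8-bit scale factor shared by every block of 16 values.
VALUE_BITS = 4
SCALE_BITS = 8
BLOCK_SIZE = 16


def effective_bits_per_value(value_bits=VALUE_BITS,
                             scale_bits=SCALE_BITS,
                             block_size=BLOCK_SIZE):
    """Bits stored per weight once the block scale is amortized."""
    return value_bits + scale_bits / block_size


def reduction_vs(baseline_bits):
    """Memory reduction factor relative to a baseline of the given width."""
    return baseline_bits / effective_bits_per_value()


if __name__ == "__main__":
    print(f"effective bits per value: {effective_bits_per_value():.2f}")
    print(f"reduction vs FP16: {reduction_vs(16):.2f}x")
    print(f"reduction vs FP8:  {reduction_vs(8):.2f}x")
```

Under those assumptions each weight costs 4.5 bits, which gives about 3.56x against FP16 and about 1.78x against FP8, close to the 3.5x and 1.8x in the quote. The gain comes from narrower storage, and this arithmetic says nothing about the accuracy side of the claim.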