"We can run your dumbed down models faster": #The use of NVFP4 results in a 3.5x reducti...

AugSun • today at 5:03 AM • 0 replies

"We can run your dumbed down models faster":

#The use of NVFP4 results in a 3.5x reduction in model memory footprint relative to FP16 and a 1.8x reduction compared to FP8, while maintaining model accuracy with less than 1% degradation on key language modeling tasks for some models.
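
For what it's worth, the 3.5x and 1.8x figures in that quote roughly fall out of the storage layout alone. A back-of-the-envelope sketch, assuming NVFP4 stores 4-bit values plus one 8-bit scale per block of 16 elements, ignoring the per-tensor scale, and treating the FP8 and FP16 baselines as having no scale overhead (the function name `effective_bits` is made up for illustration):

```python
def effective_bits(value_bits: int, scale_bits: int, block_size: int) -> float:
    """Average bits stored per weight, amortizing one scale over each block."""
    return value_bits + scale_bits / block_size


# Assumed NVFP4 layout: 4-bit values, one 8-bit scale per 16-element block.
nvfp4_bits = effective_bits(value_bits=4, scale_bits=8, block_size=16)

# Baselines taken as plain 16-bit and 8-bit storage (an assumption).
fp16_ratio = 16 / nvfp4_bits
fp8_ratio = 8 / nvfp4_bits

print(f"{nvfp4_bits} bits/weight, {fp16_ratio:.2f}x vs FP16, {fp8_ratio:.2f}x vs FP8")
# prints: 4.5 bits/weight, 3.56x vs FP16, 1.78x vs FP8
```

So the memory numbers are just arithmetic on the format. The "less than 1% degradation ... for some models" part is the claim that needs the benchmarks behind it.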
