Hacker News

cat_plus_plus, yesterday at 11:32 PM

At least for transformers, it can be partly fixed with MoE + NVFP4: the working set stays small even though the resident size is large.
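
A rough back-of-the-envelope sketch of that point, not a measurement. The parameter counts are hypothetical, and the 4.5 bits per weight assumes NVFP4 stores 4-bit values plus one 8-bit scale per 16-element block. The function name `moe_footprint` is made up for illustration.

```python
def moe_footprint(total_params, active_params, bits_per_weight=4.5):
    """Return (resident_bytes, working_bytes) for a quantized MoE model.

    resident_bytes: all expert weights kept in memory.
    working_bytes:  weights actually read to process one token.
    bits_per_weight=4.5 is an assumption: 4-bit values plus an
    8-bit scale shared by each block of 16 weights (4 + 8/16).
    """
    bytes_per_weight = bits_per_weight / 8
    resident_bytes = total_params * bytes_per_weight
    working_bytes = active_params * bytes_per_weight
    return resident_bytes, working_bytes


# Hypothetical model: 100B total parameters, 10B active per token.
resident, working = moe_footprint(100_000_000_000, 10_000_000_000)
print(f"resident: {resident / 1e9:.2f} GB")
print(f"working set per token: {working / 1e9:.2f} GB")
```

This counts only the weights touched for a single token. With batching, different tokens can route to different experts, so the working set for a whole batch can be larger than this figure.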