Because when you pay for a subscription, there's no guarantee they won't silently quantize the model a few weeks after release, and then you can no longer get the full model at all.
Otherwise there's no need for full fp16: int8 works ~99% as well at half the memory, and the lower you go below that, the more quality you start to pay for the quantization. But int8 is super safe imo.
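To make the "half the mem, ~99% as good" claim concrete, here's a toy sketch of symmetric (absmax) int8 weight quantization in plain numpy. This is my own minimal example, not how any particular runtime does it; real stacks like llama.cpp or bitsandbytes quantize per block or per channel, but the memory arithmetic is the same: int8 is exactly half the bytes of fp16, and the round-trip error is tiny relative to the weights.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Map fp16 weights onto int8 with a single absmax scale (toy, per-tensor)."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate fp16 weights from the int8 copy."""
    return q.astype(np.float16) * scale

w = np.random.randn(4096, 4096).astype(np.float16)  # one fp16 weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# fp16 is 2 bytes/weight, int8 is 1 byte/weight -> exactly 2x smaller
print(f"fp16: {w.nbytes / 2**20:.0f} MiB, int8: {q.nbytes / 2**20:.0f} MiB")
# mean abs error ~0.01 against weights with std ~1.0, i.e. ~1%
print(f"mean abs error: {np.abs(w - w_hat).mean():.5f}")
```

Going lower (4-bit and below) shrinks the representable grid much further, which is where the quality cost starts to show up; int8 keeps 256 levels per scale and stays close to lossless in practice.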