arcanemachiner • yesterday at 4:19 PM

The easiest way would be to quantize the model and serve different quants based on current demand. Higher volume == more aggressive (lower-quality) quant == more customers served per GPU.
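
A rough sketch of what that routing could look like, just to make the idea concrete. Everything here is made up for illustration: the tier names (`fp16`, `int8`, `int4`), the load thresholds, and the `pick_quant` helper are assumptions, not how any particular provider does it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantTier:
    name: str        # hypothetical label for a quantized variant of the model
    max_load: float  # serve this tier while load <= max_load


# Hypothetical tiers, ordered from best quality to most aggressive quantization.
# The thresholds are arbitrary placeholders, not measured values.
TIERS = [
    QuantTier("fp16", 0.50),
    QuantTier("int8", 0.80),
    QuantTier("int4", float("inf")),
]


def pick_quant(active_requests: int, capacity: int) -> str:
    """Return the tier name to serve at the current load.

    Load is active_requests / capacity, where capacity is whatever the
    operator considers a full fleet at the highest-quality tier.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    load = active_requests / capacity
    # Walk from best quality down and take the first tier that tolerates the load.
    for tier in TIERS:
        if load <= tier.max_load:
            return tier.name
    return TIERS[-1].name


if __name__ == "__main__":
    for requests in (10, 70, 95):
        print(requests, pick_quant(requests, capacity=100))
```

In practice you'd probably also want some hysteresis so the served quant doesn't flap back and forth when load hovers around a threshold.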
