charcircuit, today at 6:39 PM

That could leave idle time where GPUs sit unused. It would be better to have a shared cluster that many engineers use. And to keep that cluster saturated, other companies' queries could be batched onto it too. And oh wait, we are back to doing AI inference in the cloud, because it is an efficient way to serve AI.