Hacker News

pbrumm · last Saturday at 3:28 PM · 0 replies

Have you tried switching to a job queue where the GPU instances try to keep themselves busy? That way you can autoscale the GPUs based on utilization. I find it easier to tune, and latency and backlog are easier to monitor. It does require some async mechanism on the client side, but I have found it easier to maintain.
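For illustration, here is a minimal Python sketch of that pull model. It assumes an in-process `queue.Queue` standing in for a real broker such as Redis or SQS, with threads standing in for GPU instances. `JobQueue`, `gpu_worker`, `desired_workers` and every threshold are hypothetical names and numbers, not anything from the comment. The client submits a job and gets an id back immediately (the async part), then waits on or polls for the result later. The scaling rule assumes you already collect a GPU utilization figure from your own metrics.

```python
import queue
import threading
import time
import uuid


class JobQueue:
    """Hypothetical in-process stand-in for a real broker (e.g. Redis or SQS)."""

    def __init__(self):
        self._pending = queue.Queue()
        self._results = {}        # job_id -> result
        self._events = {}         # job_id -> Event that is set when the job finishes
        self._lock = threading.Lock()
        self.latencies = []       # enqueue-to-finish seconds, for monitoring

    def submit(self, payload):
        """Client side: enqueue and return a job id immediately."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._events[job_id] = threading.Event()
        self._pending.put((job_id, payload, time.monotonic()))
        return job_id

    def result(self, job_id, timeout=None):
        """Client side: wait for the result (timeout=0 polls). None if not done."""
        with self._lock:
            event = self._events[job_id]
        if not event.wait(timeout):
            return None
        with self._lock:
            return self._results[job_id]

    def backlog(self):
        """Jobs waiting for a worker: the main signal to watch and scale on."""
        return self._pending.qsize()

    def next_job(self, timeout):
        """Worker side: raises queue.Empty if nothing arrives within timeout."""
        return self._pending.get(timeout=timeout)

    def complete(self, job_id, result, enqueued_at):
        """Worker side: store the result, record latency, wake the client."""
        with self._lock:
            self._results[job_id] = result
            self.latencies.append(time.monotonic() - enqueued_at)
            event = self._events[job_id]
        event.set()


def gpu_worker(jq, process, stop):
    """Pull loop: each (simulated) GPU instance grabs the next job as soon as it is free."""
    while not stop.is_set():
        try:
            job_id, payload, enqueued_at = jq.next_job(timeout=0.05)
        except queue.Empty:
            continue  # idle time here is what shows up as low utilization
        jq.complete(job_id, process(payload), enqueued_at)


def desired_workers(current, backlog, utilization,
                    min_workers=1, max_workers=8,
                    high=0.8, low=0.3, jobs_per_worker=4):
    """Hypothetical autoscaling rule driven by GPU utilization and queue backlog."""
    if utilization >= high and backlog > current * jobs_per_worker:
        return min(current + 1, max_workers)   # busy and falling behind: add a GPU
    if utilization <= low and backlog == 0:
        return max(current - 1, min_workers)   # mostly idle, nothing queued: remove one
    return current


if __name__ == "__main__":
    jq = JobQueue()
    stop = threading.Event()
    workers = [threading.Thread(target=gpu_worker, args=(jq, lambda x: x * x, stop))
               for _ in range(2)]
    for w in workers:
        w.start()
    ids = [jq.submit(n) for n in range(10)]
    print([jq.result(i, timeout=5) for i in ids])
    stop.set()
    for w in workers:
        w.join()
    print("backlog:", jq.backlog(), "max latency (s):", max(jq.latencies))
```

Because workers pull rather than having requests pushed to them, adding or removing a GPU instance needs no load-balancer changes. A new worker simply starts calling `next_job`. The backlog and the recorded latencies fall out of the queue for free, which is what makes the system easier to monitor and tune.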