
andix, today at 7:24 PM

Failure rates also go up. For AI inference it’s probably not too bad in most cases: just take the node offline and reschedule its jobs on other nodes.
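To make the "take it offline and reschedule" idea concrete, here is a rough sketch, not how any particular scheduler does it. The function name `drain_node`, the dict-of-lists job table, and the least-loaded placement rule are all assumptions made up for illustration.

```python
def drain_node(assignments, failed, healthy):
    """Return a new job table with `failed` removed and its jobs
    spread over the healthy nodes (hypothetical helper).

    assignments: dict mapping node name -> list of job ids
    failed:      name of the node being taken offline
    healthy:     list of node names still accepting work
    """
    # Never reschedule onto the node we are draining.
    healthy = [n for n in healthy if n != failed]
    if not healthy:
        raise ValueError("no healthy nodes left to take the jobs")

    # Copy the surviving nodes' job lists so the input is not mutated.
    result = {n: list(assignments.get(n, [])) for n in healthy}

    # Assumed policy: put each orphaned job on the least-loaded node.
    for job in assignments.get(failed, []):
        target = min(healthy, key=lambda n: len(result[n]))
        result[target].append(job)
    return result


if __name__ == "__main__":
    table = {"a": ["j1", "j2"], "b": ["j3"], "c": []}
    print(drain_node(table, "a", ["b", "c"]))
```

This only works this simply because inference jobs are assumed to be stateless and safe to restart elsewhere; anything holding long-lived state on the failed node would need more than a re-queue.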