Failure rates also go up. For AI inference it’s probably not too bad in most cases, just take the node offline and re-schedule the jobs to other nodes.