Stopped reading at "Our main storage arrays have no redundancy". This isn't a data center, it's a volatile AI memory bank.
You should have kept reading:
> Redundancy is not needed since no specific data is critical.
> we have a redundant mkv storage array to store all of our trained models and training metrics.
That's just called understanding your failure domains, and RTO/RPO needs.
This turns out to be a more and more important primitive for companies who are building their own models [1].
[1] https://si.inc/posts/the-heap/
You should have kept reading:
> Redundancy is not needed since no specific data is critical.
> we have a redundant mkv storage array to store all of our trained models and training metrics.
That's just called understanding your failure domains, and RTO/RPO needs.