How do you deal with drive failures? How often does a Railway team member need to visit a DC? What's it like inside?
Everything is dual-redundant. We run RAID, so if a drive fails it's fine: alerting pages on-call, who then dispatch remote hands on site, and we keep spares for everything in each datacenter.