Ten years ago I used Jim Gray's piece about Tandem fault tolerance in a talk about Erlang at Midwest.io (RIP; it was a great conference).
Because it's a small world, a former Tandem employee was attending the talk. Unfortunately it's been long enough that I don't remember much of our conversation, but it was impressive to hear how they moved a computer between data centers; IIRC, they simply turned it off, and when they powered it back on, the CPU resumed precisely where it had been executing before.
(I have no idea how they handled the system clock.)
Jim Gray's paper:
https://jimgray.azurewebsites.net/papers/TandemTR86.2_FaultT...
That is crazy! I assume that all the RAM was battery backed? What about the CPU cache, the OS state etc? I'm struggling to see how this was possible.
> (I have no idea how they handled the system clock.)
It is (or was) on the Internet Archive, and probably elsewhere:
Tandem Systems Review, Volume 2, Number 1 (February 1986) - "Managing System Time Under Guardian 90"