I wonder if semi-reliable RAM could be made to work for training. After all, gradient descent already works in a stochastic environment, so maybe the noise from a few flipped bits wouldn't matter too much.
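A toy way to poke at that intuition, as a rough sketch rather than evidence about real training runs: fit a 1-D linear model with plain gradient descent and occasionally flip one random mantissa bit of the weight, stored as float32. Everything here is made up for illustration (the `flip_mantissa_bit` and `train` helpers, the data, the flip rate), and restricting flips to the mantissa is a deliberate simplifying assumption.

```python
import random
import struct


def flip_mantissa_bit(value, rng):
    """Flip one random mantissa bit of value's float32 representation."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    bits ^= 1 << rng.randrange(23)  # bits 0..22 are the float32 mantissa
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits))
    return flipped


def train(flip_prob, noisy_steps=500, clean_steps=500, lr=0.1, seed=0):
    """Fit y = 3x + 1 by gradient descent, corrupting w during the noisy phase."""
    rng = random.Random(seed)
    xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
    ys = [3.0 * x + 1.0 for x in xs]
    w, b = 0.0, 0.0
    flips = 0
    for step in range(noisy_steps + clean_steps):
        # Full-batch gradient of the mean squared error.
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errs, xs)) / len(xs)
        grad_b = 2.0 * sum(errs) / len(xs)
        w -= lr * grad_w
        b -= lr * grad_b
        # Simulated memory error (assumption: only mantissa bits of w flip).
        if step < noisy_steps and rng.random() < flip_prob:
            w = flip_mantissa_bit(w, rng)
            flips += 1
    return w, b, flips


if __name__ == "__main__":
    w, b, flips = train(flip_prob=0.05)
    print(f"flips={flips} w={w:.6f} b={b:.6f}")
```

A mantissa flip moves a value by less than a factor of two, so later gradient steps pull it back. Flips in the exponent or sign bits can change a weight by many orders of magnitude, and this sketch says nothing about how well training survives those.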
It also depends on the nature of the error. If only a small memory range is affected, you could patch the kernel to avoid it.