Hacker News

jedberg yesterday at 9:16 PM

This is a good talk. Really gets into the details of how things differ from the classical SaaS or consumer product.

I've been doing reliability for most of my career, and have always been able to hide behind "We're not a bank; if we lose a few requests, it doesn't matter." They can't do that. :)

One advantage that they have is that the market closes, so they can do maintenance that takes the whole system down, but when you're running a global consumer product, it's a lot harder to do that without pushback.

So for most of us, our stress is around zero-downtime maintenance, and theirs is around never dropping a request when the system is live.


Replies

cyberpunk yesterday at 9:59 PM

Yeah, I work on systems with reliability requirements like this at a large bank.

There are multiple layers of controls and manual interventions and things, which, while absolutely painful, slow, expensive and shitstorm-conjuring, are ultimately the final authority on some failures.

For example, in payments, every single settlement or clearing anomaly is looked at by a real human and rectified/rebooked manually.

So, yeah, the stakes can be really high when you have a couple billion in memory on your server, but -- it's just a system.

And it will fail, and we plan for it to do so.

gricardo99 yesterday at 9:30 PM

There's a move now towards 24/7 trading. I guess we'll see how the rigors of the trading environment mesh with zero downtime. I'm sure the rollout will be slow and steady.
