Hacker News

parthdesai today at 12:58 PM

Serious question, have you been part of an org that had to scale orders of magnitude very quickly?

Anyone who has been part of that journey knows how painful it really is. A lot of times the systems fail at all levels, and you have to redesign it from the first principles.


Replies

dijit today at 1:26 PM

> Serious question, have you been part of an org that had to scale orders of magnitude very quickly?

I have, but it depends what you mean.

Scenario 1: e-commerce SaaS (think: Amazon but whitelabel, and before CPUs even had AES instructions); Christmas was "fun".

Scenario 2: Video Games. The first day is the worst day when it comes to scale. Everything has to be flawless from day 0 and you get no warning as to what can go wrong.

Yet, somehow, I managed to make highly reliable systems.

In scenario 1, I had an existing system that had to scale up and down with load. This was before there was cloud, and hardware had a 3-4 month lead time, so most of the effort went into optimising existing code, increasing job timeouts and "quenching" sources that were expensive. We also used to do some 'magic' when it came to serving requests that had a session token or shopping cart cookie.

In scenario 2, we have a clean-room implementation and no legacy, which is a blessing but also a curse: there's no possibility to sample real usage, but you also don't need to worry about making breaking changes that are for the better. With legacy you have to figure out how to migrate to the new behaviour gradually.

So, pros and cons... but it's not like handling huge load hasn't been done before. Computers are faster than they have ever been, and while my personal opinion is that operational knowledge is dying (due to a general disdain for people who actually used to run systems that scale, not just write hopeful "eventually consistent" YAML that they call deterministic), the systems that do exist today hold your hand much better than they did for me 20 years ago.

And I ran 1% of web traffic with an ops team of 5 back then. So, idk what's going on here.

EDIT: Likely people are flagging me because I sound arrogant (or I hurt their feelings by talking bad about YAML-ops), but all I am doing is answering the question presented based on my experience.

HWR_14 today at 1:17 PM

Is GitHub scaling by orders of magnitude though? That would be an insane increase at this stage of their lifecycle.

owebmaster today at 1:05 PM

> you have to redesign it from the first principles

And that starts by laying off your best engineers, I guess