> The approval gates came out, replaced by a sequenced market-group rollout (test -> eu-0 -> eu-1 -> eu-2) that uses the smaller regions as an alarm buffer before the critical eu-2 region.
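A minimal sketch of the sequenced rollout in the quote. It assumes each market group is deployed in turn and the pipeline halts as soon as a group's alarms fire. `deploy` and `alarms_firing` are hypothetical hooks that stand in for whatever deployment tooling and alarm checks the post actually uses. A real pipeline would also bake for a while in each group and roll back the group that alarmed.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Order taken from the quoted text: the smaller groups go first so their
# alarms can catch a bad change before it reaches the critical eu-2 group.
ROLLOUT_ORDER = ["test", "eu-0", "eu-1", "eu-2"]


def sequenced_rollout(
    groups: Sequence[str],
    deploy: Callable[[str], None],
    alarms_firing: Callable[[str], bool],
) -> list[str]:
    """Deploy to each group in order and stop at the first group that alarms.

    Returns the groups that received the change. `deploy` and
    `alarms_firing` are hypothetical hooks, not a real tool's API.
    """
    deployed: list[str] = []
    for group in groups:
        deploy(group)
        deployed.append(group)
        if alarms_firing(group):
            # Halt here: later (larger, more critical) groups never get the change.
            break
    return deployed
```

For example, if the alarms in `eu-0` fire, only `test` and `eu-0` receive the change, and `eu-1` and `eu-2` are never touched.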
This is interesting. These names don't look like AWS regions. Are they geographic divisions, or groups randomly assigned to users? Or are they based on something like total spend over the last N months, to make sure high-value customers don't quit?
If users are assigned to a stable group, some of them would experience way more issues than others. I would use a random subset of AZs or individual accounts that's different every time. Not sure which one is the case here.
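One way to get the "different subset every time" behavior suggested above is to hash each account ID together with a per-rollout ID and compare the result against a percentage. This is only a sketch. The function name `in_canary` and the `rollout_id` parameter are hypothetical. Within one rollout the assignment stays stable, but each new rollout reshuffles it, so no account is always first in line.

```python
import hashlib


def in_canary(account_id: str, rollout_id: str, percent: float) -> bool:
    """Return True if `account_id` is in this rollout's canary cohort.

    Mixing `rollout_id` into the hash keeps the assignment stable within one
    rollout and produces an effectively random cohort for each new rollout.
    `percent` is expected to be in [0, 100].
    """
    digest = hashlib.sha256(f"{rollout_id}:{account_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return bucket < percent * 100
```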
Hi all. Blog author here, happy to take any questions you might have. It's a bit of a long one, but I put a lot of effort into the story arc and readability, so it's (hopefully) an easy top-to-bottom read that takes you on our journey. It covers some advanced distributed computing topics: replicating the exact same hash ring as our existing load balancer in the client application (JVM), what we implemented to fade in new pods slowly so their caches get a chance to warm, and how we attempted to migrate to Availability Zone (AZ) aware routing to save on AWS's inter-AZ data transfer fees. Hope you enjoy it!
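For readers unfamiliar with the hash-ring part, here is a sketch of a generic consistent hash ring with virtual nodes. It is not the load balancer's actual algorithm. The MD5-based positions, the `node#i` virtual-node naming and the default of 100 virtual nodes are all assumptions. A client and a load balancer only agree on which pod owns a key if every one of those details matches, which is why the ring has to be replicated exactly.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # 64-bit ring position from an MD5 digest. This is an assumption: the
    # load balancer in the post may use a different hash function.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (generic ketama-style sketch)."""

    def __init__(self, nodes, vnodes: int = 100):
        points = []  # (position, node) pairs
        for node in nodes:
            for i in range(vnodes):
                points.append((_hash(f"{node}#{i}"), node))
        points.sort()
        self._points = points
        self._positions = [p for p, _ in points]

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping past the end.
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._points)
        return self._points[idx][1]
```

Adding a pod only moves keys onto the new pod. Every other key keeps its owner, which is what keeps most caches warm during a scale-up.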
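Here is a sketch of one common way to fade in new pods. Each pod gets a weight that ramps linearly from a small floor to full weight over a warm-up window after it starts, and requests pick a pod by weighted random choice. This is similar in spirit to the slow-start option many load balancers offer. The 300-second window, the 0.05 floor and the function names are assumptions for illustration, not what the post implemented. Applying the same idea to a hash ring takes more care.

```python
from __future__ import annotations

import random


def fade_in_weight(started_at: float, now: float, window_s: float = 300.0,
                   min_weight: float = 0.05) -> float:
    """Weight ramping linearly from `min_weight` to 1.0 over `window_s` seconds."""
    if window_s <= 0:
        return 1.0
    age = max(0.0, now - started_at)
    frac = min(1.0, age / window_s)
    return min_weight + (1.0 - min_weight) * frac


def pick_pod(pods: dict[str, float], now: float, rng: random.Random) -> str:
    """Weighted random choice; `pods` maps pod name -> start timestamp (seconds)."""
    names = list(pods)
    weights = [fade_in_weight(pods[n], now) for n in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

With the defaults, a pod that just started receives roughly 5% of the traffic share of a fully warmed pod and reaches full weight after five minutes.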
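Here is a naive sketch of AZ-aware routing. It prefers endpoints in the client's own zone and spills over to all healthy endpoints when too few local ones are healthy. `Endpoint`, `candidate_endpoints` and the 50% threshold are hypothetical. The sketch ignores capacity imbalance between zones (for example, a zone with many clients but few pods), which is a known complication of AZ-aware routing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    zone: str
    healthy: bool = True


def candidate_endpoints(endpoints: list[Endpoint], client_zone: str,
                        min_local_fraction: float = 0.5) -> list[Endpoint]:
    """Return same-zone healthy endpoints when enough are healthy, else all healthy."""
    healthy = [e for e in endpoints if e.healthy]
    local = [e for e in healthy if e.zone == client_zone]
    zone_total = sum(1 for e in endpoints if e.zone == client_zone)
    if zone_total and len(local) / zone_total >= min_local_fraction:
        # Staying in-zone avoids the inter-AZ data transfer charge.
        return local
    # Too few healthy local endpoints (or none in this zone): use every zone.
    return healthy
```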