Hacker News

cjbooms, yesterday at 4:56 AM

Oh, great questions. Thanks for reading!

1. I was not aware of rendezvous hashing; very interesting. Yes, we had an existing implementation to reverse engineer, and cache parity was a priority. RH would be an elegant approach to fade-in, all right. I wonder whether it could also provide consistent spill-over, so that cache affinity is preserved when spilling over to N pods, or whether that would break if we fed occupancy metrics into the RH algorithm.

2. Yes, this was primarily a temporary measure during rollout, and also a bit of a sales pitch to the owning team that this was a two-way door. Totally agree, and we will likely take it out now that everything has been running with complete stability for a couple of months. Right now, we are protected by our in-house Safe Deployments setup, in which our CI/CD system versions all ConfigMaps whenever a new version is deployed: a v1 deployment gets my-config-map-v1, and a v2 deployment gets my-config-map-v2. So re-enabling Skipper would require a blue-green deployment, in which traffic is gradually switched back onto Skipper over a 30-minute window for each stack. There is no big-bang fallback that could trigger a cascading failure.

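The rollback path in point 2 can be pictured as a stepped traffic ramp between two stacks, each reading its own versioned ConfigMap. A minimal sketch, assuming a linear ramp that advances in 5-minute steps across the 30-minute window; only the 30-minute window and the `-vN` ConfigMap naming come from the comment, and `traffic_share` and `configmap_name` are hypothetical helpers, not part of Skipper or the Safe Deployments tooling.

```python
from datetime import timedelta


def configmap_name(base: str, version: int) -> str:
    """Versioned ConfigMap name per deployment, e.g. my-config-map-v2 (hypothetical helper)."""
    return f"{base}-v{version}"


def traffic_share(elapsed: timedelta,
                  window: timedelta = timedelta(minutes=30),
                  step: timedelta = timedelta(minutes=5)) -> float:
    """Fraction of one stack's traffic on the new route after `elapsed`.

    Assumes a linear ramp that advances in whole `step` increments and reaches
    100% at the end of `window`; the real schedule may differ.
    """
    if elapsed <= timedelta(0):
        return 0.0
    if elapsed >= window:
        return 1.0
    total_steps = max(1, window // step)  # timedelta // timedelta -> int
    steps_done = elapsed // step
    return steps_done / total_steps


if __name__ == "__main__":
    old, new = configmap_name("my-config-map", 1), configmap_name("my-config-map", 2)
    for minute in range(0, 31, 5):
        share = traffic_share(timedelta(minutes=minute))
        # The old stack keeps its own ConfigMap, so it stays intact while traffic drains.
        print(f"t={minute:2d}m  v1 ({old}): {1 - share:.0%}  v2 ({new}): {share:.0%}")
```

Because each stack only ever reads its own versioned ConfigMap, shifting traffic gradually between stacks replaces a single config flip, which is what rules out a big-bang fallback.
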
Thanks for all the links; I've added them to my reading list.