
hnfong · yesterday at 12:56 PM

The curious thing about the article is that it's definitely premature optimization for smaller databases, but by the time the database reaches the scale where these optimizations start to matter, you actually don't want to do what they suggest.

Specifically, if your database is small, the performance impact is probably not very noticeable. And if your database is large (e.g. to the point where primary keys no longer fit in a 32-bit int), then you're actually going to have to think about sharding and making the system more distributed... and that's where UUIDs work better than auto-incrementing ints.


Replies

scottlamb · yesterday at 8:56 PM

I agree there's a scale below which this (or any) optimization doesn't matter and a scale above which you don't want your primary key to have locality (in terms of which shard/tablet/... is responsible for the record). But...

* I think there is a wide range in the middle where your database can fit on one machine if you do it well, but it's worth optimizing to use a cheaper machine and/or extend the time until you need to switch to a distributed db. You might hit this middle range soon enough (and/or it might be a painful enough transition) that it's worth thinking about it ahead of time.

* If/when you do switch to a distributed database, you don't always need to rekey everything:

** You can spread existing keys across shards via hashing on lookup or reversing bits (first sketch below). Some databases (e.g. DynamoDB) actually force this.

** Allocating new ids in the old way could be a big problem, but there are ways out. You might be able to switch allocation schemes entirely without clients noticing if your external keys are sufficiently opaque. If you went with UUIDv7 (which addresses some but not all of the article's points), you can just keep using it. If you want to keep using dense(-ish), (mostly-)sequential bigints, you can amortize the latency by reserving blocks at a time (second sketch below).
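
To make the first point concrete, here's a rough Python sketch of spreading sequential bigint ids across shards by hashing or by reversing bits; the shard count and function names are made up, not anything a particular database does:

    import hashlib

    NUM_SHARDS = 16  # assumed shard count for illustration

    def shard_by_hash(record_id: int) -> int:
        # Hash the id so consecutive ids land on different shards.
        digest = hashlib.sha256(record_id.to_bytes(8, "big")).digest()
        return int.from_bytes(digest[:8], "big") % NUM_SHARDS

    def shard_by_bit_reversal(record_id: int, bits: int = 64) -> int:
        # Reverse the bits: the fast-changing low bits of a sequential id
        # become the high bits that pick the shard/tablet range.
        reversed_id = int(format(record_id, f"0{bits}b")[::-1], 2)
        return reversed_id >> (bits - 4)  # top 4 bits -> 16 ranges

    # Consecutive ids no longer cluster on one shard:
    for rid in (1000, 1001, 1002):
        print(rid, shard_by_hash(rid), shard_by_bit_reversal(rid))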
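And for the last point, a rough sketch of reserving blocks of ids (a hi/lo-style allocator). BLOCK_SIZE and reserve_block are made-up names here; reserve_block stands in for whatever single round trip hands out the next block, e.g. bumping a counter row:

    import threading

    BLOCK_SIZE = 1000  # assumed; tune to taste

    class BlockIdAllocator:
        def __init__(self, reserve_block):
            # reserve_block(n) does the (slow) central allocation and
            # returns the first id of a fresh block of n ids.
            self._reserve_block = reserve_block
            self._lock = threading.Lock()
            self._next = 0
            self._limit = 0  # exclusive end of the current block

        def next_id(self) -> int:
            with self._lock:
                if self._next >= self._limit:
                    # One round trip buys BLOCK_SIZE cheap local allocations,
                    # so ids stay (mostly) sequential and dense-ish.
                    self._next = self._reserve_block(BLOCK_SIZE)
                    self._limit = self._next + BLOCK_SIZE
                allocated = self._next
                self._next += 1
                return allocated

The cost is that a process dying mid-block leaves gaps, which is part of why the result is only dense(-ish) and mostly sequential.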