
mkleczek • yesterday at 12:31 PM • 5 replies

This is actually a very deep and interesting topic. Stripping information from an identifier disconnects a piece of data from the real world, which means we can no longer match them. But such a connection is the sole purpose of keeping the data in the first place. So, what happens next is that the real world tries to adjust and the "data-less" identifier becomes a real-world artifact. The situation becomes the same but worse (e.g. you don't exist if you don't remember your social security ID). In extreme cases people are tattooed with their numbers.

The solution is not to come up with yet another artificial identifier but to come up with better means of identification taking into account the fact that things change.

Replies

Ukv • yesterday at 3:30 PM

> Stripping information from an identifier disconnects a piece of data from the real world which means we no longer can match them. But such connection is the sole purpose of keeping the data in the first place.

The identifier is still connected to the user's data, just through the appropriate other fields in the table as opposed to embedded into the identifier itself.

> So, what happens next is that the real world tries to adjust and the "data-less" identifier becomes a real world artifact. The situation becomes the same but worse (eg. you don't exist if you don't remember your social security id). In extreme cases people are tattooed with their numbers.

Using a random UUID as the primary key does not mean users have to memorize that UUID. In fact, in most cases I don't think there's much reason for it to be exposed to the user at all.

You can still look up their data from their current email or phone number, for instance. Indexes are not limited to the primary key.
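To make that concrete, here is a minimal sketch using Python's built-in `sqlite3`. The `users` table, its columns, and the helper `find_user_id_by_email` are made up for illustration and are not from the article; the point is only that the random primary key stays fixed while lookups go through secondary indexes on fields that may change.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id    TEXT PRIMARY KEY,      -- random UUID, carries no real-world meaning
        email TEXT NOT NULL UNIQUE,  -- UNIQUE gives this column its own index
        phone TEXT
    )
    """
)
# A plain secondary index so phone lookups do not scan the table.
conn.execute("CREATE INDEX users_phone_idx ON users (phone)")

user_id = str(uuid.uuid4())
conn.execute(
    "INSERT INTO users (id, email, phone) VALUES (?, ?, ?)",
    (user_id, "alice@example.com", "+15550100"),
)


def find_user_id_by_email(email):
    """Return the internal id for the current email, or None if unknown."""
    row = conn.execute(
        "SELECT id FROM users WHERE email = ?", (email,)
    ).fetchone()
    return row[0] if row else None


# The email changes; the primary key (and anything referencing it) does not.
conn.execute(
    "UPDATE users SET email = ? WHERE id = ?",
    ("alice@new.example", user_id),
)
```

After the update, looking up the new email returns the same `user_id`, and the old email resolves to nothing.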

> The solution is not to come up with yet another artificial identifier but to come up with better means of identification taking into account the fact that things change.

A fully random primary key takes into account that things change - since it's not embedding any real-world information. That said I also don't think there's much issue with embedding creation time in the UUID for performance reasons, as the article is suggesting.
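As a rough sketch of what "embedding creation time" looks like, the following builds a UUID in the version 7 layout from RFC 9562: a 48-bit Unix millisecond timestamp, then version and variant bits, with the rest random. The function name `uuid7_like` is my own, and this is an illustration of the bit layout rather than what the article or any particular database does.

```python
import os
import time
import uuid


def uuid7_like(unix_ms=None):
    """Build a UUID with the version 7 layout: timestamp first, random bits after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000

    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 are used
    rand_a = (rand >> 62) & 0xFFF                 # 12 random bits
    rand_b = rand & ((1 << 62) - 1)               # 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80     # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= rand_a << 64                         # bits 75..64: random
    value |= 0b10 << 62                           # bits 63..62: RFC variant
    value |= rand_b                               # bits 61..0: random
    return uuid.UUID(int=value)


earlier = uuid7_like(1_700_000_000_000)
later = uuid7_like(1_700_000_000_001)
```

Because the timestamp occupies the most significant bits, `earlier` sorts before `later`, which is what keeps index inserts roughly append-only. Two IDs generated within the same millisecond sort in random order in this sketch. The only real-world fact leaked is the creation time, which never changes.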


everforward • yesterday at 3:50 PM

> The solution is not to come up with yet another artificial identifier but to come up with better means of identification taking into account the fact that things change.

I think artificial and data-less identifiers are the better means of identification that take into account that things change. They don't have to be the identifier you present to the world, but having them is very useful.

E.g. phone numbers are semi-common identifiers now, but phone numbers change owners for reasons outside of your control. If you use them as an internal identifier, changing them between accounts gets very messy because now you don't have an identifier for the person who used to have that phone number.

It's much cleaner and easier to adapt if each person gets an internal context-less identifier and you use a lookup to convert from their external ID (e.g. their phone number) to an internal ID. The old account still has an identifier, there's just no external identifier that translates to it. Likewise, if you have to change your identifier scheme, you can have multiple external IDs that translate to the same internal ID (i.e. you can resolve both their old ID and their new ID to the same internal ID without insanity in the schema).
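A minimal in-memory sketch of that translation layer, assuming a hypothetical `IdentityDirectory` class (the names, the `"phone"` and `"legacy_id"` kinds, and the sample values are all invented for illustration):

```python
import uuid


class IdentityDirectory:
    """Maps external identifiers (phone, legacy ID, ...) to internal IDs."""

    def __init__(self):
        self._external_to_internal = {}  # (kind, value) -> internal id

    def create_account(self):
        # Internal ID: random, context-less, never reused or reassigned.
        return uuid.uuid4()

    def link(self, kind, value, internal_id):
        # Point an external ID at an account, replacing any previous owner.
        self._external_to_internal[(kind, value)] = internal_id

    def resolve(self, kind, value):
        return self._external_to_internal.get((kind, value))


directory = IdentityDirectory()

alice = directory.create_account()
directory.link("phone", "+15550100", alice)
directory.link("legacy_id", "A-42", alice)  # old scheme, same person

# The carrier later reassigns the phone number to someone else.
bob = directory.create_account()
directory.link("phone", "+15550100", bob)
```

After the reassignment the phone number resolves to `bob`, the legacy ID still resolves to `alice`, and every row keyed by Alice's internal ID is untouched. In a real schema this would presumably be a separate table with a foreign key to the accounts table.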


brettgriffin • yesterday at 6:36 PM

The surrogate key's purpose isn't to directly store the natural key's information; rather, it's to provide an index to it.

> The solution is not to come up with yet another artificial identifier but to come up with better means of identification taking into account the fact that things change.

There isn't 'another' - there's just one. The surrogate key. The other pieces of information you're describing are not the means of indexing the data. They are the pieces of data you wish to retrieve.


PunchyHamster • yesterday at 11:23 PM

An identifier is just a common token that systems can use to operate on the same entity.

You need it because it may be the one lone unchangeable thing. Taking a person as an example:

* date of birth can be changed, if there was an error and a correction in the documents
* nearly all existing physical characteristics can change over time, whether due to brain things (deciding to change gender), aging, or accidents (fingerprints no longer apply if you burn your skin enough)
* DNA might be good enough, but that's one fucking long identifier to share and a hard one to validate in the field.

So a unique ID attached to a few other attributes that identify the current iteration of an individual is the best we have, and the best we will get.

vrighter • yesterday at 1:54 PM

You can't take into account the fact that things change when you don't know what those changes might be. You might end up needing to either rebuild a new database, have some painful migration, or support two codepaths to work with both types of keys.

