> It does not matter if they are random or not.
Again, sometimes it does, the article lists a few of them. Making it harder to scrape, unifying across databases that share a keyspace, etc.
> And how does surrogate key help? I don't know the surrogate key that identifies my records in your database. Even if you use them internally it is an implementation detail.
That surrogate key is linked to literally every other record in the database I have for you. There are near infinite ways for me to convert something you know to that surrogate key. Give me a transaction ID, give me a phone number/email and the rough date you signed up, hell give me your IP address and I can probably work back to a user ID from auth logs.
The point isn't that you know the surrogate key, it's that _everything_ is linked to that surrogate key so if you can give me literally any info you know I can work back to the internal ID.
> This complexity is there whether you want it or not and you're not going to eliminate it with surrogate keys. It has to be explicitly taken care of.
Okay, then lets do an exercise here. A user gives you a transaction ID, and you have to tell them the date they signed up and the date you first billed them. I think yours is going to be way more complicated.
Mine is just something like:
SELECT user_id FROM transactions WHERE transaction_id=X; SELECT transaction_date FROM transactions WHERE user_id=Y ORDER BY transaction_date ASC LIMIT 1; SELECT signup_date FROM users WHERE user_id=Y;
Could be a single query, but you get the idea.
> DBMSes provide means to tackle this essential complexity: bi-temporal extensions, views, materialized views etc.
This kind of proves my point. If you need bi-temporal extensions and materialized views to tell a user what their email address is from a transaction ID, I cannot imagine the absolute mountain of SQL it takes to do something more complicated like calculating revenue per user.
I am not sure you are arguing against my claims or not :)
I am not arguing against surrogate keys in general. They are obviously very useful _internally_ to introduce a level of indirection. But if they are used _internally_ then it doesn't really matter if they are UUIDs or sequence numbers or whatever - it is just an implementation detail.
What I claim is that surrogate keys are problematic as _externally visible_ identifiers.
> Okay, then lets do an exercise here. A user gives you a transaction ID, and you have to tell them the date they signed up and the date you first billed them. I think yours is going to be way more complicated.
> Mine is just something like:
> SELECT user_id FROM transactions WHERE transaction_id=X; SELECT transaction_date FROM transactions WHERE user_id=Y ORDER BY transaction_date ASC LIMIT 1; SELECT signup_date FROM users WHERE user_id=Y;
I think you are missing the actual problem I am talking about: where does the user take the transaction ID from? Do you expect the users to remember all transaction IDs your system ever generated for them? How would they know which transaction ID to ask about? Are they expected to keep some metadata that would allow them to identify transaction IDs? But if there is metadata that enables identification of transaction IDs then why not use it instead of transaction ID in the first place?