logoalt Hacker News

TheTaytaytoday at 3:51 PM1 replyview on HN

I think you have a point, and SCD type 2 feels like a workaround, but there is also something to be said for the ability to query every row as it was at any given version. I’m not saying that SCD type 2 is the best solution given there might be a more domain-specific way to do it, but I see it a lot like file-based version control. It’s convenient to be able to examine all files as they existed at any point in time, without having to “model” the ways in which those files might change directly into the domain of the individual files.

If you have something like dolt (not affiliated), a version controlled database, you wouldn’t have to slap change dates on anything OR create your historical table. The changes would be implicit in the version history.


Replies

hasyimibhartoday at 4:47 PM

> If you have something like dolt (not affiliated), a version controlled database, you wouldn’t have to slap change dates on anything OR create your historical table. The changes would be implicit in the version history.

Nope, even if I have the ability to see the exact changes of each row, I would still add timestamps everywhere, because timestamp of row change does not equal event timestamp. For example, if I have an order table with status column, and I see a CDC event where status changed from in_progress to completed, I cannot simply assume that the CDC timestamp is the timestamp when order was completed. It's possible that the source database received the event late a few minutes late due to delay upstream, or it's backfilling some missed orders a few days ago. Having a completed_at timestamp (and a bunch of other timestamps for each order lifecycle) would eliminate any ambiguities, and your data analyst will thank you for it.

It's the same thing with row history. You cannot simply assume that your row changes are aligned with the logical history of your entity.