> Duplicates in source data are almost always a sign of bad data modeling Nope. Duplicates in s...

mamcx • yesterday at 3:54 PM • 0 replies • view on HN

> Duplicates in source data are almost always a sign of bad data modeling

Nope. Duplicates in source data(INPUT) is natural, correct and MUST be supported or almost all data become impossible.

What is the actual problem is the OUTPUT. Duplicates on the OUTPUT need to be controlled and explicit. In general, we need in the OUTPUT a unique rowby a N-key, but probably not need it to be unique for the rest, so, in the relational model, you need unique for a combination of columns (rarely, by ALL of them).

alt Hacker News