Hacker News

immanuwell today at 1:11 PM

The normalization analogy is genuinely clever as a teaching tool, but it quietly papers over the fact that normalization is a logical design concept while columnar storage is a physical one. Treating them as the same thing can mislead more than it clarifies, I think.


Replies

zaptheimpaler today at 8:03 PM

Yeah I feel like papering over the physical aspect actually misses the main motivation for columnar storage in the first place, which is to more efficiently store some types of data and perform OLAP queries on it.
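To make the logical/physical distinction concrete, here is a minimal sketch in Python. The table, its column names, and the helper functions are all hypothetical, made up for illustration; real engines add compression, encoding, and vectorized execution on top of this. It only shows that the same logical relation can sit in two physical layouts, and that an OLAP-style aggregate over one attribute needs just one array in the columnar layout.

```python
# Hypothetical toy data: one logical table, two physical layouts.
rows = [
    {"id": 1, "city": "Oslo", "amount": 10.0},
    {"id": 2, "city": "Lima", "amount": 25.5},
    {"id": 3, "city": "Oslo", "amount": 4.5},
]

# Row-oriented layout: all fields of a record are stored together.
row_store = [(r["id"], r["city"], r["amount"]) for r in rows]

# Column-oriented layout: all values of an attribute are stored together.
column_store = {
    "id": [r["id"] for r in rows],
    "city": [r["city"] for r in rows],
    "amount": [r["amount"] for r in rows],
}


def total_amount_rows(store):
    # Walks every full record even though only one field is needed.
    return sum(record[2] for record in store)


def total_amount_columns(store):
    # Reads only the "amount" column and never touches "id" or "city".
    return sum(store["amount"])


def to_rows(store):
    # Rebuilds the logical records from the columnar layout.
    return list(zip(store["id"], store["city"], store["amount"]))
```

Both layouts answer the query identically and `to_rows(column_store)` gives back the same records as `row_store`, so the logical schema (and any redundancy in it) is unchanged; only the access pattern differs.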

_doctor_love today at 5:09 PM

Oh man, thank you for saying this. The difference between logical and physical goes over so many people's heads. It's a little unnerving at times how much people resist it.

Definitely agree with what you said - if we treat them as the same thing that's going to mislead some folks.

jerf today at 1:40 PM

I've always preferred to think of normalization as more about "removing redundancy" than in the frame it is normally presented. Or, to put it another way, rather than "normalizing" which has as a benefit "removing redundancy", raise the removing of redundancy up to the primary goal which has as a side benefit "normalization".

A nice thing about that point of view is that it fits with your point; redundancy is redundancy whether you look at it with a column-based view or a row-based view.

goerch today at 5:08 PM

Theoretically I would agree, but practically I still wonder why we need different database engines for row and columnar storage if supporting different types of indices is trivial(TM) for Postgres?

hilariously today at 1:25 PM

Fair, but one of the big benefits of normalization was the savings in storage and memory, which back in the day were tiny by comparison.

There's always a reason for a dev to ship something shitty but when you show you can use 80% less storage for the same operation you can make the accountants your lever.
