Hacker News

llm_nerd today at 5:39 PM

>but columns aren't the end-all-be-all normalization format. I think pandas uses "frames".

Pandas is column-oriented, as are basically all high-performance data libraries. Each column is a separate array of data. To get a "row", you take the nth item from each of the arrays.
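A toy sketch of that layout in plain Python, assuming a table stored as a dict of typed `array` columns. This is not how pandas stores data internally; it only shows the "one array per column, rebuild a row by indexing every column" idea. In pandas, `df.iloc[n]` is the built-in way to pull out row n.

```python
from array import array

# Toy column-oriented table: one typed array per column
# (illustrative only, not pandas' internal representation).
table = {
    "id": array("q", [1, 2, 3]),             # 64-bit signed ints
    "price": array("d", [9.5, 12.0, 3.25]),  # 64-bit floats
    "qty": array("q", [4, 1, 10]),           # 64-bit signed ints
}

def get_row(columns, n):
    """Reassemble logical row n by taking the nth item from every column."""
    return {name: col[n] for name, col in columns.items()}

print(get_row(table, 1))  # {'id': 2, 'price': 12.0, 'qty': 1}
```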

And FWIW, column orientation isn't considered normalization. It's a physical storage optimization that can yield enormous performance advantages for some classes of problems but can be a performance nightmare for others.

Data analytics loves column-oriented layouts; CRUD-type workloads do not. And in the programming realm there are several ways to use a Structure of Arrays (SoA) instead of the classic Array of Structures (AoS).
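A rough sketch of the AoS vs. SoA trade-off, using a hypothetical `Order` record. In pure Python, lists hold pointers, so the cache-locality payoff of SoA mostly shows up with contiguous arrays (C structs, NumPy, Arrow). What this shows is the access pattern: an aggregate over two fields only touches two columns in SoA, while inserting one record is a single append in AoS but has to update every column in SoA.

```python
from dataclasses import dataclass

@dataclass
class Order:  # hypothetical record type for illustration
    id: int
    price: float
    qty: int

# Array of Structures (AoS): one object per row.
aos = [Order(1, 9.5, 4), Order(2, 12.0, 1), Order(3, 3.25, 10)]

# Structure of Arrays (SoA): one list per field, aligned by index.
soa = {
    "id": [o.id for o in aos],
    "price": [o.price for o in aos],
    "qty": [o.qty for o in aos],
}

# Analytics-style query: aggregate over all rows.
# SoA reads only the "price" and "qty" columns; AoS visits every whole record.
def revenue_aos(rows):
    return sum(o.price * o.qty for o in rows)

def revenue_soa(cols):
    return sum(p * q for p, q in zip(cols["price"], cols["qty"]))

# CRUD-style operation: insert one record.
# AoS is a single append; SoA must append to every column and keep them aligned.
def insert_aos(rows, order):
    rows.append(order)

def insert_soa(cols, order):
    cols["id"].append(order.id)
    cols["price"].append(order.price)
    cols["qty"].append(order.qty)

print(revenue_aos(aos), revenue_soa(soa))  # 82.5 82.5
```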


Replies

notepad0x90 today at 9:59 PM

Makes sense. I guess I just meant that it isn't proper normalization without typing. You can have types in something like a SQL DB (or frames, as you pointed out), but in a plain CSV, not so much: you'd have to come up with a custom type scheme using headers or something. As long as the arrays are strongly typed, I suppose a single cell in a column is enough.
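One possible shape for that kind of header-based type scheme, as a sketch only: the `name:type` header convention below is hypothetical, not part of any CSV standard, and `read_typed_csv` is an illustrative helper. Columns without a type suffix are kept as strings, and short rows are not validated.

```python
import csv
import io

# Hypothetical convention: each header cell is "column_name:type".
PARSERS = {"int": int, "float": float, "str": str}

def read_typed_csv(text):
    """Parse a CSV whose header encodes column types; return typed columns."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    names, parsers = [], []
    for cell in header:
        name, _, type_name = cell.partition(":")
        names.append(name)
        parsers.append(PARSERS[type_name or "str"])  # untyped columns stay strings
    columns = {name: [] for name in names}
    for row in reader:
        for name, parse, raw in zip(names, parsers, row):
            columns[name].append(parse(raw))
    return columns

data = "id:int,price:float,label\n1,9.5,apple\n2,12.0,pear\n"
cols = read_typed_csv(data)
print(cols)  # {'id': [1, 2], 'price': [9.5, 12.0], 'label': ['apple', 'pear']}
```

pandas' `read_csv` accepts a `dtype` mapping for the same purpose, but there the types live in the reading code rather than in the file itself.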