Hacker News

juancn, today at 2:06 PM

It is possible to treat columnar data as purely relational (e.g. one table per column, keyed by row ID), but if you follow that through, data access can be suboptimal.

The main cost is the join you need when accessing several columns: it's flexible, but expensive.
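A minimal sketch of the "purely relational" layout, assuming each column is stored as its own table mapping a row ID to a value (the names `join_columns` and `people` are hypothetical). Reading several columns means probing one extra table per column for every row, which is the join cost described above.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: every column is its own table mapping row_id -> value.
ColumnTable = Dict[int, object]


def join_columns(columns: Dict[str, ColumnTable], wanted: List[str]) -> List[Tuple]:
    """Reconstruct rows by hash-joining each wanted column on row_id."""
    first, *rest = wanted
    # Start from the row ids present in the first requested column.
    rows = [(rid, columns[first][rid]) for rid in sorted(columns[first])]
    for name in rest:
        table = columns[name]
        # One hash probe per row per extra column: this is the join cost.
        rows = [row + (table[row[0]],) for row in rows if row[0] in table]
    return rows


people = {
    "name": {0: "ada", 1: "bob", 2: "cy"},
    "age": {0: 36, 1: 41, 2: 29},
    "city": {0: "london", 1: "paris", 2: "rome"},
}

print(join_columns(people, ["name", "age"]))
```

The layout is flexible (any subset of columns can be combined), but the per-row lookups grow with the number of columns requested.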

To take full advantage of a columnar layout, you usually need that join to happen implicitly, through data alignment, so that no explicit join is performed.

For example, split the tables into segments of up to N records, and within each segment store every column contiguously so each one can be accessed independently:

    r0, r1 ... rm; f0, f0 ... f0; f1, f1 ... f1; fn, fn ... fn

This balances pointer chasing against joining: you avoid unnecessary IO by loading only the columns you need from each segment, and you skip the join entirely because the data is trivially aligned.
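A minimal in-memory sketch of the segmented, aligned layout, assuming a hypothetical chunk size (`CHUNK_SIZE`, standing in for N) and hypothetical helpers `build_segments` and `scan`. Reading only the requested column lists stands in for loading only those columns from disk; rows are reassembled by position, so no join is performed.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Sequence, Tuple

CHUNK_SIZE = 4  # hypothetical N: maximum records per segment


@dataclass
class Segment:
    """One chunk of up to CHUNK_SIZE records, stored column by column.

    Position i in every column list belongs to the same record, so the
    columns are aligned and can be read independently.
    """
    row_ids: List[int] = field(default_factory=list)
    columns: Dict[str, List[object]] = field(default_factory=dict)


def build_segments(schema: Sequence[str], records: Sequence[Sequence[object]]) -> List[Segment]:
    segments: List[Segment] = []
    for start in range(0, len(records), CHUNK_SIZE):
        chunk = records[start:start + CHUNK_SIZE]
        seg = Segment(row_ids=list(range(start, start + len(chunk))))
        for j, name in enumerate(schema):
            # Each field's values for this chunk are stored contiguously.
            seg.columns[name] = [rec[j] for rec in chunk]
        segments.append(seg)
    return segments


def scan(segments: List[Segment], wanted: Sequence[str]) -> Iterator[Tuple]:
    for seg in segments:
        # Only the requested columns are touched; rows are rebuilt by
        # position (zip), which is the implicit join via alignment.
        cols = [seg.columns[name] for name in wanted]
        yield from zip(seg.row_ids, *cols)


schema = ["name", "age", "city"]
records = [
    ("ada", 36, "london"),
    ("bob", 41, "paris"),
    ("cy", 29, "rome"),
    ("dee", 52, "oslo"),
    ("eve", 33, "lima"),
]
segments = build_segments(schema, records)
print(list(scan(segments, ["name", "age"])))
```

Five records with a chunk size of 4 produce two segments; the `city` lists are never read by the scan above.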

Replies

brightball, today at 4:27 PM

UPDATEs are also a challenge. Columnar storage is very efficient for bulk inserts and append-heavy workloads, but updating existing columnar data is comparatively expensive.
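One way to see the asymmetry is a compressed column. The sketch below assumes run-length encoding, only one of several encodings column stores may use, and hypothetical helpers `rle_append` and `rle_update`. Appending touches only the last run, while changing a single value in the middle can split a run and, in this naive version, forces the whole column to be decoded and re-encoded.

```python
from typing import List

# Hypothetical compressed column: a list of [value, count] runs.


def rle_append(runs: List[list], value: object) -> None:
    """Appending is cheap: extend the last run or start one new run."""
    if runs and runs[-1][0] == value:
        runs[-1][1] += 1
    else:
        runs.append([value, 1])


def rle_update(runs: List[list], index: int, value: object) -> List[list]:
    """Naive in-place update: decode everything, patch, re-encode (O(n)).

    A common alternative is to mark the old row deleted and append a new
    version, deferring the rewrite; that approach is not shown here.
    """
    flat = [v for v, count in runs for _ in range(count)]
    flat[index] = value
    rebuilt: List[list] = []
    for v in flat:
        rle_append(rebuilt, v)
    return rebuilt


status: List[list] = []
for v in ["ok"] * 6:
    rle_append(status, v)
print(status)
print(rle_update(status, 2, "bad"))
```

Six appends leave a single run, while one update in the middle turns it into three runs.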