Hacker News

yencabulator, yesterday at 10:37 PM

You basically can't do row-by-row appends to any columnar format stored in a single file. You could kludge around it by allocating arenas inside the file, but that's still huge write amplification: instead of writing a row in a single block, you'd have to write a block per column.
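
A rough sketch of that write amplification, counting how many blocks one appended row dirties under each layout. The 4 KiB block size, 8-byte values, 1 MiB per-column arenas, and all function names are assumptions chosen for illustration, not taken from any real format:

```python
BLOCK_SIZE = 4096  # assumed device/filesystem block size in bytes


def blocks_for_write(offset: int, length: int, block_size: int = BLOCK_SIZE) -> set:
    """Return the set of block indices covered by a write of `length` bytes at `offset`."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return set(range(first, last + 1))


def row_layout_append(existing_rows: int, n_cols: int, value_size: int = 8) -> int:
    """Blocks dirtied when one row is appended to a row-oriented file."""
    row_size = n_cols * value_size
    return len(blocks_for_write(existing_rows * row_size, row_size))


def column_arena_append(existing_rows: int, n_cols: int,
                        arena_size: int = 1 << 20, value_size: int = 8) -> int:
    """Blocks dirtied when one row is appended to per-column arenas in a single file.

    Hypothetical layout: column `c` owns bytes [c * arena_size, (c + 1) * arena_size).
    """
    touched = set()
    for col in range(n_cols):
        offset = col * arena_size + existing_rows * value_size
        touched |= blocks_for_write(offset, value_size)
    return len(touched)


row_blocks = row_layout_append(existing_rows=100, n_cols=16)
col_blocks = column_arena_append(existing_rows=100, n_cols=16)
print(f"row layout: {row_blocks} block(s), column arenas: {col_blocks} block(s)")
```

With these assumed numbers, a 128-byte row lands in one block in the row layout, while the arena layout dirties one block for each of the 16 columns.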


Replies

amluto, today at 12:23 AM

You can do row-by-row appends to a Feather file (Arrow IPC; the naming is confusing). It works fine. The main problem is that the per-append overhead is kind of silly: it costs over 300 bytes (IIRC) per append.

I wish there was an industry standard format, schema-compatible with Parquet, that was actually optimized for this use case.

gregw2, today at 12:24 AM

Agreed.

There is room still for an open source HTAP storage format to be designed and built. :-)