
Apache Arrow is 10 years old

167 points by tosh today at 1:13 PM | 40 comments

Comments

data_ders today at 3:37 PM

If I could go back to 2015, when I had just found the feather library and was using it to power my unhinged topic-modeling-for-PowerPoint-slides work, and explain what feather would become (Arrow) and the impact it would have on the data ecosystem, I would have looked at 2026 me like he was a crazy person.

Yet today I feel it was 2016 dataders who was the crazy one lol

aynyc today at 6:23 PM

What's the difference between feather and parquet in terms of usage? I get the design philosophy, but how would you use them differently?

HoldOnAMinute today at 9:37 PM

I read that entire page and I could not tell you what Apache Arrow is, or what it does.

pm90 today at 5:16 PM

It's nice to see useful, impactful interchange formats getting the attention and resources they need, and ecosystems converging around them. Optimizing serialization/deserialization might seem like a "trivial" task at first, but when you're moving petabytes of data it quickly becomes the bottleneck. With common interchange formats, the benefits of these optimizations are shared across stacks. Love to see it.

aerzen today at 6:24 PM

I like Arrow for its type system. It's efficient, complete, and does not have "infinite precision decimals". Compared with Postgres's decimal encoding, using i256 as the backing type is a much saner approach.
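
To make the "i256 as the backing type" point concrete, here is a minimal sketch in plain Python (standard library only, not Arrow's API). It assumes a fixed scale and stores each decimal as an unscaled integer in a 32-byte little-endian two's-complement slot; `SCALE`, `encode`, and `decode` are hypothetical names chosen for illustration.

```python
from decimal import Decimal

SCALE = 4         # assumption: every value carries 4 digits after the point
WIDTH_BYTES = 32  # 256 bits, i.e. an "i256" slot

def encode(value: Decimal) -> bytes:
    """Store a finite decimal as a fixed-width unscaled integer."""
    sign, digits, exponent = value.as_tuple()
    shift = exponent + SCALE
    if shift < 0:
        raise ValueError("value has more fractional digits than SCALE allows")
    unscaled = int("".join(map(str, digits))) * 10 ** shift
    if sign:
        unscaled = -unscaled
    # to_bytes raises OverflowError if the value does not fit in 256 bits.
    return unscaled.to_bytes(WIDTH_BYTES, "little", signed=True)

def decode(raw: bytes) -> Decimal:
    """Rebuild the decimal from the integer and the known scale."""
    unscaled = int.from_bytes(raw, "little", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    return Decimal((sign, digits, -SCALE))

raw = encode(Decimal("12.3456"))
restored = decode(raw)
```

Every value occupies the same 32 bytes regardless of magnitude, so a column of such values can be laid out as one flat buffer and indexed by offset, which a variable-length decimal encoding does not allow.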

mempko today at 5:25 PM

We use Apache Arrow at my company and it's fantastic. The performance is so good. We have terabytes of time-series financial data and use arrow to store it and process it.

actionfromafar today at 3:26 PM

I had to look up what Arrow actually does, and I might have to run some performance comparisons vs sqlite.

It's very neat for some types of data to have columns contiguous in memory.
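
A rough illustration of what "columns contiguous in memory" means, using only Python's standard `array` module rather than Arrow itself. The sample records and variable names are made up for the sketch.

```python
from array import array

# Hypothetical sample data: (id, price) records.
rows = [(1, 9.5), (2, 3.25), (3, 7.0), (4, 1.25)]

# Row-oriented: each record is its own tuple object, so reading one field
# across all records means visiting every record.
row_total = sum(price for _id, price in rows)

# Column-oriented: each field lives in its own contiguous, typed buffer.
ids = array("q", (r[0] for r in rows))     # signed 64-bit integers
prices = array("d", (r[1] for r in rows))  # 64-bit floats

# Aggregating one column scans a single flat buffer and never touches ids.
col_total = sum(prices)

# The whole price column is one block of fixed-size values.
price_bytes = prices.itemsize * len(prices)
```

The layout is what matters here: a scan or aggregate over one column reads a single dense buffer, which is friendlier to CPU caches and vectorized code than walking row objects.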
