Hacker News

ladberg today at 2:00 PM | 4 replies

I'm curious - what were you doing that polars was leaving a 40-80x speedup on the table? I've been happy with its speed when held correctly, but it's certainly easy to hold it incorrectly and kill your perf if you're not careful.


Replies

__mharrison__ today at 3:21 PM

20-year-old BI app. Columnar DBs weren't really a thing back then. (MonetDB was brand new but not super stable. I committed the SQLAlchemy interface to it.)

devnotes77 today at 2:08 PM

Polars is fastest when you avoid eager evaluation mid-pipeline. If you see a 40x gap, it's often from calling .collect() inside a loop or applying Python UDFs row-wise instead of using vectorized expressions.

dartharva today at 3:20 PM

Might be tangential, but in my recent experience Polars kept crashing the Python server with OOM errors whenever I tried to stream data from and into large Parquet files with some basic grouping and aggregation.

Claude suggested just using DuckDB instead, and indeed, it made short work of it.