Why is DuckDB so popular when one can use Python + Pandas?
Better perf + SQL, is that mostly it?
Pandas has lots and lots of problems.
Performance is definitely one of them, but it also has inconsistent and duplicated methods, inconsistent defaults (e.g. some methods operate in place by default while most return a new object), and copy-by-reference issues. I could go on.
It was an early winner in an extremely popular language. That's really the main thing going for it, but alternatives have been a long time coming.
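To make the in-place and copy-by-reference complaints concrete, here's a minimal sketch. The frame, column names, and values are made up for illustration; the only pandas calls used are `sort_values`, `update`, `loc`, and `copy`.

```python
import pandas as pd

# Hypothetical toy data, just for illustration.
df = pd.DataFrame({"id": [3, 1, 2], "score": [30.0, 10.0, 20.0]})

# sort_values returns a new frame by default; the original keeps its order.
sorted_df = df.sort_values("id")
original_order = df["id"].tolist()

# update, by contrast, mutates the frame in place and returns None.
patch = pd.DataFrame({"score": [99.0]}, index=[0])
result = df.update(patch)
updated_first_score = float(df.loc[0, "score"])

# Plain assignment copies the reference, not the data, so a write through
# the alias shows up in the original frame.
alias = df
alias.loc[1, "score"] = -1.0
aliased_write_visible = bool(df.loc[1, "score"] == -1.0)

# An explicit copy is independent of the original.
independent = df.copy()
independent.loc[2, "score"] = 0.0
original_untouched = bool(df.loc[2, "score"] == 20.0)
```

So two methods that look similar at the call site behave differently: one hands back a new object, the other silently changes the one you passed in, and you have to know which is which.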
Why would you prefer Python and Pandas over good old SQL? Pandas is so verbose and hard to debug, and most of the time it struggles to be performant on small datasets.
SQL has been around since the dawn of databases. I am happy to see a trend away from pandas.
I wrote a blog post a while back addressing this question: https://www.robinlinacre.com/recommend_duckdb/
The better question is: why is DuckDB so popular when one can use Polars, which has a sane, lintable, typesafe API compared to the mess that is SQL:
vs