rich_sasha • today at 10:59 AM • 2 replies

The article starts well, on trying to condense pandas' gazillion inconsistent and continuously-deprecated functions, with their tens of keyword arguments, into a small set of composable operations - but then it lost me.

The more interesting nugget for me is the project they mention, Modin (https://modin.readthedocs.io/en/latest/index.html), which apparently went to the effort of analysing common pandas uses and compressed the API into a mere handful of operations. Which sounds great!

Sadly for me, the purpose seems rather to have been to then recreate the full pandas API, backed by things like Ray and Dask. So it's the same API, just much faster.

To me it's a shame. Pandas is clearly quite ergonomic for various exploratory interactive analyses, but the API is, imo, awful. The speed is usually not a concern for me - slow operations often seem to be avoidable, and my data tends to fit in (a lot of) RAM.

As far as I can see, their more condensed API isn't public-facing and usable.

Replies

sweezyjeezy • today at 2:43 PM

The pandas API is awful, but it's kind of interesting why. It was started as a financial time series manipulation library ('panels') in a hedge fund, and a lot of the quirks come from that. For example, the unique obsession with the 'index': functions seemingly randomly returning dataframes with column data as the index, having to write index=False every single time you write to disk, or it appending the index to the Series numpy data, leading to incredibly confusing bugs. That comes from the assumption that there is almost always a meaningful index (timestamps).
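
To make two of those index quirks concrete, here is a minimal sketch, assuming a reasonably recent pandas. The column names and data are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical toy data; "ticker" and "price" are illustrative names.
df = pd.DataFrame({"ticker": ["A", "A", "B"], "price": [1.0, 2.0, 3.0]})

# Quirk 1: by default, groupby moves the grouping column into the index.
means = df.groupby("ticker")["price"].mean()
print(means.index.name)  # the former column now names the index

# Passing as_index=False keeps "ticker" as an ordinary column instead.
flat = df.groupby("ticker", as_index=False)["price"].mean()
print(list(flat.columns))

# Quirk 2: to_csv writes the index as an unnamed leading column
# unless index=False is passed.
buf_default = io.StringIO()
df.to_csv(buf_default)
header_default = buf_default.getvalue().splitlines()[0]

buf_clean = io.StringIO()
df.to_csv(buf_clean, index=False)
header_clean = buf_clean.getvalue().splitlines()[0]

print(repr(header_default))  # header starts with an empty field for the index
print(repr(header_clean))    # header contains only the real columns
```

With a meaningful timestamp index both defaults are arguably what you want. With the default integer index they are mostly noise.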

bbkane • today at 1:49 PM

Check out polars: I find it much more intuitive than pandas, as it looks closer to SQL (and I learned SQL first). Maybe you'll feel the same way!
