
ttyprintk yesterday at 3:36 AM

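Since DuckDB can read and write Pandas DataFrames directly in memory, a team with varying Pandas fluency can benefit from learning DuckDB: people who are more comfortable with SQL can query the same DataFrames without an export step.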


Replies

adolph yesterday at 6:14 PM

Since Pandas 2, Apache Arrow can be used as the backend for Pandas in place of NumPy, although NumPy remains the default. Arrow is also used by Polars, DuckDB, Ibis, and the list goes on.

https://arrow.apache.org/overview/

Apache Arrow solves most discussed problems, such as improving speed, interoperability, and data types, especially for strings. For example, the new string[pyarrow] column type is around 3.5 times more efficient. [...] The significant achievement here is zero-copy data access, mapping complex tables to memory to make accessing one terabyte of data on disk as fast and easy as one megabyte.

https://airbyte.com/blog/pandas-2-0-ecosystem-arrow-polars-d...