Lots of Pandas hate in this thread. However, for folks with lots of lines of Pandas in production, F...

__mharrison__ • 11/20/2024 • 1 reply • view on HN

Lots of Pandas hate in this thread. However, for folks with lots of lines of Pandas in production, Fireducks can be a lifesaver.

I've had the chance to play with it on some of my code it queries than ran in 8+ minutes come down to 20 seconds.

Re-writing in Polars involves more code changes.

However, with Pandas 2.2+ and arrow, you can use .pipe to move data to Polars, run the slow computation there, and then zero copy back to Pandas. Like so...

    (df
     # slow part
     .groupby(...)
     .agg(...)
    )

to:

    def polars_agg(df):
      return (pl.from_pandas(df)
        .group_by(...)
        .agg(...)
        .to_pandas()
      )

    (df
      .pipe(polars_agg)
    )

Replies

dr_kiszonka • 11/21/2024

Very effective!

➕ show 1 reply

alt Hacker News

Replies