I’m not really sure why you think .loc[lambda d: d["y"] > 0.5] ...

wodenokoto • 11/20/2024 • 3 replies

I’m not really sure why you think

    .loc[lambda d: d["y"] > 0.5]

is stylistically superior to

    [df.y > 0.5]

I agree it comes in handy quite often, but that still doesn’t make it great to write compared to what SQL or dplyr offer for choosing columns to filter on (`where y > 0.5` for SQL and `filter(y > 0.5)` for dplyr).

Replies

oreilles • 11/20/2024

It is superior because you don't need to assign your dataframe to a variable ('df'), then update that variable or create a new one every time you need to do that operation. That makes it both safer (you're guaranteed to filter on the current version of the dataframe) and more concise.
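A minimal sketch of the safety point, assuming pandas is installed. The frame `raw` and its values are made up for illustration. The lambda is handed the frame as it exists at that step of the chain, while a mask built from the named variable still reflects the original data.

```python
import pandas as pd

# Hypothetical data, for illustration only.
raw = pd.DataFrame({"x": [1, 2, 3, 4], "y": [0.2, 0.9, 0.4, 0.7]})

# Chained: the lambda receives the frame *after* assign() has doubled y.
chained = (
    raw
    .assign(y=lambda d: d["y"] * 2)
    .loc[lambda d: d["y"] > 1.0]
)

# Named variable: the mask is computed from the original `raw`,
# so it silently filters on the stale, un-doubled values of y.
stale = raw.assign(y=raw["y"] * 2)[raw["y"] > 1.0]

print(chained)
print(stale)
```

Here `chained` keeps the two rows whose doubled `y` exceeds 1.0, while `stale` comes back empty because no original `y` exceeds 1.0.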

For the rest of your comment: it's the best you can do in Python. Sure, you could write SQL, but then you're mixing text queries with Python data manipulation, and I would dread that. And SQL-only scripting is really out of the question.


__mharrison__ • 11/20/2024

It's superior because it is safer, not because the API (or the requirement to use a lambda) looks better. The lambda lets the operation work on the current state of the dataframe in the chained operation rather than on the original dataframe. Alternatively, you could use .query("y > 0.5"), which also works on the current state of the dataframe.

(I'm the first to complain about the many warts in Pandas. Have written multiple books about it. This is annoying, but it is much better than [df.y > 0.5].)
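A rough illustration of the `.query` alternative, assuming pandas is installed. The frame `raw` and its `score` column are hypothetical. The column `y` only exists after the rename step, so an expression written against `raw` itself could not refer to it, while both `.query` and the lambda form see it.

```python
import pandas as pd

# Hypothetical data: the original frame has no column named "y".
raw = pd.DataFrame({"x": [1, 2, 3, 4], "score": [0.2, 0.9, 0.4, 0.7]})

# .query evaluates the string against the frame at this point in the chain,
# where "score" has already been renamed to "y".
via_query = raw.rename(columns={"score": "y"}).query("y > 0.5")

# The lambda form does the same thing without a string expression.
via_loc = raw.rename(columns={"score": "y"}).loc[lambda d: d["y"] > 0.5]

print(via_query)
print(via_query.equals(via_loc))
```

Both forms select the same rows; the choice between them is mostly about whether you prefer a string expression or a callable.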
