
wand3r • yesterday at 8:32 PM • 4 replies

Did I miss the part of the article where they break down how they determined race? Is the algorithm blind to race? It looks like they specifically looked at 83k people applying to ~100 companies, which notably were Fortune 500 companies. Could there simply be candidate discrepancies here? It's hard for me to follow the full methodology, but it doesn't necessarily seem either malicious or that well structured. Don't you need a control group of applicants who are similar on paper? To allege DISCRIMINATION is quite bold.

Definitely open to opposing or critical views

Replies

zerocrates • yesterday at 9:09 PM

The 83,000 applications to Fortune 500 companies, that was a different previous study they compared their results to. This paper's takeaway is that unlike that Fortune 500 data, the applications here that went through an ML vendor's screening process showed evidence of "systemic rejection," where some applicants got rejected across the board at higher rates than you'd expect if they were facing independent would-be employers.

rayiner • yesterday at 10:02 PM

That’s not the data set used for this paper: https://algorithmichiring.github.io/

If you click through, the paper says race is self-reported.

“Our data tracks 4,197,168 applications. It includes applicant gameplay features and for each application, the application date, the position name and employer, metadata about the position and employer, and the numerical score and final recommendation each applicant received for each completed application. 40.2% of applicants self-report race with a breakdown of 16.8% Asian, 14.2% White, 3.6% Black, 3.0% Hispanic, and all other racial categories below 2% (i.e. fewer than 100,000 applicants).”

gacgacgac • yesterday at 8:47 PM

Yes. You missed it. They are using a test dataset of 83k resumes generated in 2022 for this paper and comparing it as a baseline against their observational data: https://www.nber.org/papers/w29053

The dataset is constructed, deliberately, to hold candidate performance constant and vary the names of candidates to appear to be associated with a specific race.
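That design (a correspondence or audit study) is essentially the control group asked about above: since the resumes differ only in the name, the simplest comparison is the callback rate between name groups. A minimal sketch of a two-proportion z-test with made-up counts; the function names and numbers are hypothetical, the NBER study's actual methods may differ, and this test treats every application as independent, ignoring that one posting receives several resumes.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def callback_gap(callbacks_a, sent_a, callbacks_b, sent_b):
    """Two-proportion z-test on callback rates for two name groups.

    Because the resumes differ only in the name, a gap in callback rates
    is attributed to the name signal rather than to qualifications.
    """
    p_a = callbacks_a / sent_a
    p_b = callbacks_b / sent_b
    pooled = (callbacks_a + callbacks_b) / (sent_a + sent_b)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / sent_a + 1.0 / sent_b))
    z = (p_a - p_b) / se
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))  # two-sided
    return p_a - p_b, z, p_value


# Hypothetical counts, not taken from either paper.
gap, z, p = callback_gap(100, 1000, 80, 1000)
print(f"gap={gap:.3f} z={z:.2f} p={p:.3f}")
```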


8note • yesterday at 10:13 PM

Wouldn't you expect any algorithm to learn race from other properties in the data?

It's going to be in the rest of the data, because race has a meaningful correlation with, and plenty of causal links to, being disadvantaged in real ways that can also affect the ability to do certain jobs.

Like, building interstates and freeways through Black communities, on purpose, to harm those communities results in a lot of noise and particulate pollution, which is bad for developing brains.

You won't be able to do meritocratic, non-racist hiring without fixing the environmental racism. Otherwise you're just mirroring racism other people built for you.
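The point that a model never shown race can still reconstruct it from correlated features is often called proxy discrimination. A minimal sketch on simulated data, where a single hypothetical `proxy` feature stands in for anything correlated with race (a zip code, say): the scoring rule never sees the group label, yet scores still differ by group and the label can be recovered from the proxy alone. All names, the shift size, and the distributions are assumptions for illustration.

```python
import random
from statistics import mean


def simulate(n, gap, seed=0):
    """Simulated applicants: group B's proxy feature is shifted down by `gap`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = rng.choice("AB")
        proxy = rng.gauss(0.0, 1.0) - (gap if group == "B" else 0.0)
        rows.append((group, proxy))
    return rows


def blind_score(proxy):
    """A 'race-blind' scoring rule: it only ever sees the proxy feature."""
    return proxy


def score_gap(rows):
    """Mean score of group A minus mean score of group B."""
    a = [blind_score(p) for g, p in rows if g == "A"]
    b = [blind_score(p) for g, p in rows if g == "B"]
    return mean(a) - mean(b)


def group_recovery_accuracy(rows, gap):
    """Accuracy of guessing the group from the proxy alone with one threshold."""
    threshold = -gap / 2.0  # midpoint between the two group means (0 and -gap)
    correct = sum((p > threshold) == (g == "A") for g, p in rows)
    return correct / len(rows)


rows = simulate(n=10_000, gap=2.0)
print(f"score gap: {score_gap(rows):.2f}")
print(f"group recovered from proxy: {group_recovery_accuracy(rows, 2.0):.1%}")
```

Here the threshold is set from the known shift for simplicity; a trained model would learn an equivalent boundary from the data. The sketch only shows that hiding a label doesn't remove the information; it says nothing about where the correlation comes from.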

