gacgacgac • yesterday at 8:47 PM • 4 replies

Yes. You missed it. They are using a test dataset of 83k resumes generated in 2022 for this paper and comparing it as a baseline against their observational data: https://www.nber.org/papers/w29053

The dataset is constructed, deliberately, to hold candidate performance constant and vary the names of candidates to appear to be associated with a specific race.

Replies

AStrangeMorrow • yesterday at 9:31 PM

From looking at how that was done, it seems they (the paper you linked) relied on an older paper that identified names that are both frequent enough and strongly associated with a certain demographic (90% of that name's occurrences fall within that demographic).

But they picked 9 family names per group, which sounds quite low, and combined them with first names to reach 500 first+last name combinations per group.

I wonder how much of the bias we see has to do with the names actually picked versus it being racially motivated (absolutely not denying that this probably is a factor, but it might not be the only one).

For example, in France there is the national BAC exam at the end of high school. If you look at the name × grade distribution, and at the higher “very good” bracket in particular, some names are heavily under-represented (less than 5% of, say, “Jordan” get that grade) while some are over-represented (35% of “Josephine” get such a grade). The exam is for the most part anonymous, but some names are definitely heavily correlated with lower- or higher-income groups. So nothing surprising: Josephines tend to come from richer families, and thus on average get better education and support, and thus better grades. The same is true of family names, to a smaller extent.

So I wonder how much of the bias we see, be it from real people or the AI, has more to do with class than with race. Again, those are not neatly separate things, but still.

rayiner • yesterday at 10:18 PM

That’s an earlier paper. This one involves 3 million real applicants, with no control for applicant quality.

xp84 • yesterday at 10:08 PM

Wow. So, all the 'people' and 'resumes' involved are fake, but they submitted them to real jobs?

Cool.

In any event, I'd happily support a ban on all parts of the ATS that could be involved in automated approval, rejection, or scoring being able to see candidate names. But I sense the author of this has a bigger agenda.
