You are misreading this sentence. It is saying: "Using a constructed dataset of resumes, whose only difference was a name change, we would anticipate a system evaluating on qualifications to produce an equal distribution of candidates across names. Our observed result was highly unequal, and that warrants further investigation."
To me, it appears that the study using the constructed dataset was a completely different one from the study concerned with AI.
The AI study used real data from "3.4 million people who submit 4 million job applications to 1,700 job postings across 150 employers and 11 industry sectors".