> I do think that SOTA LLMs like GPT-4o perform about as good as high school graduates in America with average intelligence.
This might be true in a strict sense, but I think it's really, really important to consider how LLMs are used compared to a high-school graduate. LLMs are confidently wrong and confidently correct in exactly the same tone, and in many ways they are presented to users as unimpeachable.
If I ask an average person to solve a moderately complex logic problem, my human brain discounts their answer because I've been socialized to believe that humans are bad at logic. I take any answer I'm given with a usually appropriate degree of skepticism.
LLMs, on the other hand, live on the computer: an interface I've been socialized to believe is always correct on matters of math and logic. That's what a computer is, a logic machine. Second-guessing the computer on matters of logic and arithmetic almost always results in me realizing my puny human mind has done something wrong.
To me, this directly contradicts your conclusion: in practice, LLMs are mostly capable only of misleading large portions of the population.
This is not inherent in the LLM, though. Society will adjust to it after learning some very predictable (and predicted) lessons, just as it always does.
It would be good to put equivalent grades on LLMs, then. Instead of GPT-4o, it's GPT-11th-grade.