Hacker News

tptacek today at 5:12 PM

Have you read through the sources on that GitHub link? It's a set of sociology cites establishing that bias exists (something no serious person ever disputed), followed by a couple of papers giving mechanistic descriptions of how bias could propagate through an LLM. The paper you call out specifically takes last-generation open-weights models and attempts to trick them into revealing biases through their level of confidence in statements (like, "the antecedent of the feminine pronoun in this sentence, is it the 'nurse' or the 'doctor'?").

There's plenty of research into biases in LLMs, and there should be; it's a fundamentally new branch of computer science that could have profound impacts on how we automate and regiment social decisions in the future (like extending credit). The bias concern is well taken in those settings. But it has very little to do with the overwhelming majority of day-to-day LLM use; Claude and ChatGPT are not indoctrinating users who ask about discounted cash flow formulae into the manosphere.

(Maybe Grok is though.)


Replies

dlcarrier today at 6:43 PM

By design, LLMs follow the heuristic mean. Doing so is, by definition, the opposite of bias, although the meaning of the word has changed to include not following trends, which it doesn't do. Compared to periodicals, an LLM will be slow to change, although pretty much every other form of printed word is even slower to change, with editions of books usually having a cadence of a decade or more.

taeric today at 5:14 PM

I confess I laughed harder at the Grok comment than I wish I had. Sad to remember that some strawmen are given life and promoted by people. Actively.

skupig today at 5:34 PM

I'm not really sure what your point is. That was just the most recent paper linked on that repo, which is a convenient list of some relevant papers. There are probably a lot more recent studies, but it does convincingly show that models are still absorbing bias in a way that can affect prediction.
