Hacker News

jryio today at 3:41 AM (3 replies)

If anyone was wondering ... it's racist

Unsurprisingly, the texts written up until that time were dominated by such individuals, which is tragic for LLM training if you think about it.

The voiceless groups and fringe opinions that we take as normative today do not appear.

Does this encourage us to write in the present such that we influence the models in perpetuity?


Replies

ipaddr today at 4:52 AM

Voiceless groups do not appear in the training data? How could they? They are voiceless. You think voiceless people are represented in today's training data? They cannot be; they are voiceless.

Nothing tragic about using data from a time period.

Common words used in the 1900s are labeled racist now. I doubt anyone was wondering whether they filtered those words for modern safe words.

pcf today at 4:09 AM

[flagged]