> At scale, it's a lever: a distribution that reliably deflates some words and inflates othe...

like_any_other • today at 12:08 AM • 0 replies • view on HN

> At scale, it's a lever: a distribution that reliably deflates some words and inflates others is the mechanism you'd build if you wanted to shape what a billion users read without them noticing.

And this is how they're using that lever: Microsoft made an AI safety evaluation tool that classifies "stop hurting white people" (and no other group), "white lives are important", and "white identity will not be deconstructed" as hate speech:

https://github.com/microsoft/SafeNLP (in data/implicitHate.json)

https://x.com/fentasyl/status/1735410872162377937

alt Hacker News