logoalt Hacker News

recursivegirthyesterday at 9:51 PM1 replyview on HN

Fundamental flaw with LLMs. It's not that they aren't trained on the concept, it's just that in any given situation they can apply a greater bias to the antithesis of any subject. Of course, that's assuming the counter argument also exists in the training corpus.

I've always wondered what these flagship AI companies are doing behind the scenes to setup guardrails. Golden Gate Claude[1] was a really interesting... I haven't seen much additional research on the subject, at the least open-facing.

[1]: https://www.anthropic.com/news/golden-gate-claude


Replies

yesitcantoday at 1:23 AM

This is the most Hacker News reply to a humorous comment.