Hacker News

JoshTriplett today at 3:43 AM

A plausible theory I've seen going around: https://x.com/QiaochuYuan/status/2049307867359162460


Replies

danpalmer today at 4:18 AM

If you tell an LLM it's a mushroom, you'll get thoughts considering how its mycelium could be causing the goblins.

This "theory" is simply role playing and has no grounding in reality.

krackers today at 4:16 AM

I wish the blog said more about why exactly training for a nerdy personality rewarded mentions of goblins. Since it's probably not a deterministic verifiable reward, at their scale the reward model itself is another LLM. But this just pushes the issue down one layer: why did _that_ model start rewarding mentions of goblins?

yard2010 today at 6:45 AM

I love the people thinking "I should ask ChatGPT and copy pasta the response to the (tweet|gh comment)"

dakolli today at 3:46 AM

It is a stateless text / pixel auto-complete; it has no sense of self. Stop spreading this BS.
