A plausible theory I've seen going around: https://x.com/QiaochuYuan/status/2049307867359162460
I wish the blog said more about why exactly training for a nerdy personality ended up rewarding mentions of goblins. Since it's probably not a deterministic, verifiable reward, at their scale the reward model itself is another LLM. But that just pushes the issue down one layer: why did _that_ model start rewarding mentions of goblins?
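For anyone unfamiliar with the setup, here's a rough sketch (not from the linked post) of the LLM-as-judge pattern being described: a grader model is prompted to score each sampled reply, and that score becomes the reward the policy is trained against. The `call_grader_model` function and the rubric text are hypothetical placeholders; the point is just that the reward signal is itself a model output, quirks included.

```python
import re

# Hypothetical placeholder: in practice this would call whatever grader/reward
# LLM the lab uses; here it just stands in for "another LLM produces the score".
def call_grader_model(prompt: str) -> str:
    return "SCORE: 7"  # stub output so the sketch runs end to end

RUBRIC = (
    "Rate the assistant reply for how well it matches a nerdy, playful persona.\n"
    "Reply with 'SCORE: <1-10>' and nothing else.\n\n"
    "Assistant reply:\n{reply}"
)

def personality_reward(reply: str) -> float:
    """LLM-as-judge reward: the 'reward model' is itself a language model,
    so whatever it happens to like (goblin references, say) leaks into
    the policy being optimized against it."""
    graded = call_grader_model(RUBRIC.format(reply=reply))
    match = re.search(r"SCORE:\s*(\d+)", graded)
    return float(match.group(1)) / 10.0 if match else 0.0

print(personality_reward("Ah yes, the goblins in the weights strike again."))
```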
I love the people thinking "I should ask ChatGPT and copy-paste the response into the (tweet|gh comment)"
It is a stateless text/pixel autocomplete; it has no notion of self. Stop spreading this BS.
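To make the "stateless" part concrete: with a typical chat-completions API, nothing persists server-side between calls; the client resends the whole transcript every time, so any apparent "memory" or "self" is just text in the prompt. A minimal sketch, assuming the OpenAI Python client; the model name and messages are illustrative.

```python
from openai import OpenAI

client = OpenAI()
history = [{"role": "user", "content": "Why do you keep mentioning goblins?"}]

# First call: the model only "knows" what is in `history`.
first = client.chat.completions.create(model="gpt-4o-mini", messages=history)
history.append({"role": "assistant", "content": first.choices[0].message.content})

# Second call: unless the client resends the transcript, the model starts from
# scratch -- there is no hidden self that remembers the previous exchange.
history.append({"role": "user", "content": "You said that last time too."})
second = client.chat.completions.create(model="gpt-4o-mini", messages=history)
print(second.choices[0].message.content)
```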
If you tell an LLM it's a mushroom, you'll get thoughts about how its mycelium could be causing the goblins.
This "theory" is simply role playing and has no grounding in reality.