Hacker News

the_af yesterday at 9:40 PM (4 replies)

How do you know you're not reading things that aren't there? LLMs are very good at roleplaying, and they will pick up on hints you may inadvertently be giving them (about them being "tired" and needing "rest", etc).

I have never witnessed this from Claude Opus, by the way. These models do get context rot, but that's a relatively better understood phenomenon unrelated to personality.


Replies

nomel today at 3:07 AM

> LLMs are very good at roleplaying

Yes, and I think this is where it's coming from. They're role-playing as a human programmer, because nearly 100% of the training text in the base model is written by humans working as programmers. During fine-tuning, I'm sure they spend significant resources removing the human aspects of the statistics. I see these things reduced with each model, so something is changing. They're probably getting better at that. I suspect Claude is also necessarily getting worse at role-play, which the unaligned models should necessarily be best at (a quick Google search of some role-play subreddits seems to point in this direction).