Hacker News

int_19h | last Sunday at 11:13 PM | 0 replies

I'm pretty sure that any world model inherently incapable of "bad outputs" would be so castrated in general that it'd be actively detrimental to overall model quality. Even as it is, we already know that RLHF "alignment" has a noticeable downward effect on raw scores.