Hacker News

joe_the_user · today at 3:42 AM

I think both the interpretability literature and explorations of internal representations actually reinforce the author's conclusion. Internal representation research tends to show that nets dealing with a single "model" don't necessarily share the same representation, and don't necessarily have a single representation at all.

And doing well on task XYZ isn't, by itself, evidence of a world model. The point that these systems aren't always using a world model is reinforced by how easily they are confused by extraneous information, even systems as sophisticated as those that can solve Math Olympiad questions. The literature has described them as "ad-hoc predictors" for a long time, and I don't think much has changed - except that they do better on benchmarks.

And humans, too, can act without a consistent world model.