LLMs seem to me closer to Kahneman's System 1 than to System 2. When understood in this way, it is obvious why LLMs are bad at counting r's in "strawberries". But it also makes ZEH feel like it couldn't possibly be a useful metric, because it's a System 2 evaluation applied to a System 1 system.
A big part of skill acquisition in humans is moving tasks from System 2 to System 1. This frees up very scarce thinking resources for ever more complex tasks, which can then in turn be internalized and handled by System 1.
> When understood in this way, it is obvious why LLMs are bad at counting r's in "strawberries".
No, it doesn't. It makes sense that they can't count the r's because they don't have access to the actual word, only to tokens that might represent parts of the word or the whole word.
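To make the tokenization point concrete, here is a toy sketch. The vocabulary and the greedy longest-match tokenizer are hypothetical stand-ins chosen for illustration. Real tokenizers (e.g. BPE-based ones) learn their own splits, so "strawberries" may be cut differently in practice. The point is only that once the word becomes token IDs, the individual letters are no longer directly visible.

```python
import string

# Hypothetical toy vocabulary (an assumption for illustration): a few
# multi-character pieces plus single-letter fallbacks. Real tokenizers
# learn their own pieces, so actual splits will differ.
PIECES = ["straw", "berries", "berry"] + list(string.ascii_lowercase)
VOCAB = {piece: token_id for token_id, piece in enumerate(PIECES)}


def tokenize(word: str) -> list[int]:
    """Greedy longest-match subword tokenization into token IDs."""
    ids = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking until a vocab hit.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i]!r}")
    return ids


word = "strawberries"
ids = tokenize(word)
pieces = [PIECES[t] for t in ids]

print(ids)                    # [0, 1]
print(pieces)                 # ['straw', 'berries']
# Counting over characters works because the letters are right there...
print(word.count("r"))        # 3
# ...but the model only receives IDs. Looking for the single-letter "r" token
# finds nothing, because every "r" is buried inside a multi-letter piece.
print(ids.count(VOCAB["r"]))  # 0
```

Getting the answer 3 from `[0, 1]` requires already knowing how each token is spelled, which is exactly the information the IDs don't carry.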