And yet LLMs still fail on simple questions of logic like ‘should I take the car to the car wash, or walk?’ (the car has to be there to be washed, so walking is not an option).
Generative AI is not making judgements or reasoning here; it is reproducing the most likely conclusions from its training data. I guess that might be useful for something, but it is not judgement or reasoning.
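To make that concrete: a language model picks, token by token, whatever continuation was most probable in its training text. Here is a deliberately tiny sketch of that idea in Python, using a toy bigram model rather than a real LLM; the corpus, function name, and greedy decoding here are all illustrative assumptions, not how any production model actually works:

```python
from collections import Counter, defaultdict

# Toy "training data": the model can only ever reproduce patterns seen here.
corpus = (
    "you must drive the car to its wash . "
    "you must drive the car to its wash . "
    "you could walk there instead . "
).split()

# Count bigram frequencies: an estimate of P(next word | current word).
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def generate(prompt: str, max_tokens: int = 20) -> str:
    """Greedy decoding: always emit the single most likely next token."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        choices = bigrams.get(tokens[-1])
        if not choices:
            break  # no continuation was ever observed in training
        nxt = choices.most_common(1)[0][0]  # the "most likely conclusion"
        tokens.append(nxt)
        if nxt == ".":
            break  # end of sentence
    return " ".join(tokens)

print(generate("you"))
# -> "you must drive the car to its wash ."
# i.e. the majority sentence in the corpus, reproduced verbatim.
```

The output is not a judgement about cars or car washes; it is simply the most frequent continuation in the training text, which is the distinction being made above.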
What consideration was given to the possibility that the original experiment, and others like it, were already in the training data?