In defense of the models, the experiment was run with temperature set to 0.01, which is very low; a setting that low can lead to weird responses. A find-on-page search also turned up no mention of “thinking” or “reasoning” in the paper. I'm not trying to discount the whole thing, but I'm very curious how changing the parameters might affect the results.