Interesting that he came to this conclusion (CoT should be done in latent space) well before the release of OpenAI's o1, the model that first made explicit CoT reliable. At the time the blog post was written, CoT was elicited only through a "reason step by step" instruction, which was highly error-prone compared to modern o1-style reasoning. (And before InstructGPT/ChatGPT, it was elicited by seeding the completion with "let me reason step by step" so the base model would continue the reasoning itself.)
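For what it's worth, the distinction between the two eras is easy to show concretely. A minimal sketch, where `generate` is a hypothetical stand-in for any text-completion API (not a real library call):

```python
def generate(prompt: str) -> str:
    """Hypothetical placeholder for a call to an LLM completion endpoint."""
    raise NotImplementedError

question = "If a train leaves at 3pm going 60 mph, how far has it gone by 5pm?"

# Post-InstructGPT/ChatGPT: give an instruction-tuned model a direct instruction.
instructed = generate(f"{question}\nReason step by step before answering.")

# Pre-instruction-tuning: seed the completion so the base model
# continues the chain of thought on its own.
seeded = generate(f"Q: {question}\nA: Let me reason step by step.")
```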
Fair to say he demonstrates good CoT reasoning capabilities?