I've had this suspicion for a while: I think we just got way better at harnessing the model, not that its actual reasoning improved.
So we got better at giving it the right context and tools to do the stuff we need it to do, but the actual thinking hasn't improved.