They are also getting better at analyzing both images and speech. They still lag behind on simple reasoning (e.g., compared with o1-preview), but they are catching up quickly.
Obviously, these models still have trouble interfacing with the real world.