Wouldn't you deal with spatial reasoning by giving the model access to a tool that structures the space in a way it can understand, or to a sub-model that can do the spatial reasoning itself? These "general" models would serve as the frontal cortex while other models do the specialized work. What is missing?
They should train more on sports commentary; perhaps that would give spatial reasoning a boost.
That's a bit like saying just give blind people cameras so they can see.