Vision-language-action models and two-level LLM architectures that combine slow planning with fast control seem to be a big breakthrough.