Aren't the LLM-based features of this announcement catch-up features? Describing the contents of the screen is something Gemini has been doing on Pixel phones for a while. It's a fairly obvious use case for a multimodal AI.
My one hope is that this eventually becomes widespread enough to stop the alt-text scolds.