I think this is the obvious path to more robust models -- grounding language in video.

ilaksh • 01/22/2025