Hacker News

ilaksh, 01/22/2025, 0 replies

I think this is the obvious path to more robust models: grounding language in video.