Are models any good at discerning motion from multiple frames?
For instance, if I gave models several animations of a bouncing ball as individual frames, would they be able to tell which bounce had the more realistic motion?
(Is this a potential new benchmark? Maybe also variations of Stair Dismount.)
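If this were turned into a benchmark, one hypothetical way to get ground-truth labels is to generate the bounce animations from a simple physics simulation and then perturb them. Here is a toy version of that idea. It assumes a 1-D ball height per frame stands in for rendered images, and `simulate_bounce`, `constant_speed_bounce` and `looks_physical` are made-up names. The plausibility check is a crude heuristic (the ball should accelerate downward and should not gain energy between bounces), not an established metric.

```python
# Hypothetical toy sketch: per-frame ball heights stand in for rendered frames.
import statistics


def simulate_bounce(h0=1.0, g=9.81, restitution=0.8, fps=60, n_frames=180):
    """Height of a ball (metres) sampled once per frame, dropped from rest at h0."""
    dt = 1.0 / fps
    y, v = h0, 0.0
    heights = []
    for _ in range(n_frames):
        heights.append(y)
        v -= g * dt  # gravity updates velocity first...
        y += v * dt  # ...then velocity updates position (semi-implicit Euler)
        if y < 0.0:  # ground contact: snap to the floor and reverse velocity
            y = 0.0
            v = -v * restitution  # restitution > 1 would mean the ball gains energy
    return heights


def constant_speed_bounce(h0=1.0, speed=2.0, fps=60, n_frames=180):
    """An 'unrealistic' animation: the ball moves at constant speed, with no gravity."""
    dt = 1.0 / fps
    y, v = h0, -speed
    heights = []
    for _ in range(n_frames):
        heights.append(y)
        y += v * dt
        if y < 0.0:  # bounce off the floor
            y, v = 0.0, speed
        elif y > h0:  # turn around at the starting height
            y, v = h0, -speed
    return heights


def looks_physical(heights, tol=1e-9):
    """Toy check: downward acceleration and no energy gain between bounces."""
    # Second differences approximate acceleration (per frame^2). In free fall they
    # are negative and constant, so their median should be clearly negative.
    accel = [heights[i + 1] - 2 * heights[i] + heights[i - 1]
             for i in range(1, len(heights) - 1)]
    accelerates_down = statistics.median(accel) < -tol

    # Apex heights (the starting height plus interior local maxima) should not grow.
    peaks = [heights[0]] + [heights[i] for i in range(1, len(heights) - 1)
                            if heights[i] > heights[i - 1] and heights[i] >= heights[i + 1]]
    no_energy_gain = all(b <= a + tol for a, b in zip(peaks, peaks[1:]))

    return accelerates_down and no_energy_gain
```

With these defaults, `looks_physical(simulate_bounce(restitution=0.8))` is `True`, while `simulate_bounce(restitution=1.2)` (each bounce gets higher) and `constant_speed_bounce()` (no acceleration) both return `False`. Rendering each height list as frames would give one realistic and two implausible animations to show a model side by side.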
I’d imagine they could. I’d try Gemini 3.5 Flash with a high frame rate (fps).