I'm reading this post and wondering what kind of crazy accessibility tools one could build. It might be a little off the rails, but imagine a tool that describes a web video for a blind user as it happens: not just the speech, but the actual on-screen action.
This isn't local, but Gemini models can process very long videos and provide descriptions with timestamps if asked.
https://ai.google.dev/gemini-api/docs/video-understanding#tr...
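A rough sketch of what that could look like with the google-genai Python SDK. The model name, the `files.upload` flow, and the helper function are my assumptions for illustration, not anything from the linked docs; check them before relying on this.

```python
# Hypothetical sketch: ask Gemini for a timestamped description of a
# video's visual action (not just dialogue) for a blind viewer.

def build_prompt(interval_s: int = 10) -> str:
    """Build a prompt requesting timestamped visual descriptions."""
    return (
        "You are describing a video for a blind viewer. "
        f"Roughly every {interval_s} seconds, output an MM:SS timestamp "
        "followed by a one-sentence description of the on-screen action, "
        "not just the dialogue."
    )

def describe_video(path: str) -> str:
    """Upload a video and request a timestamped description.

    SDK calls below follow the google-genai library as I understand it;
    the model name and upload call are assumptions.
    """
    from google import genai  # pip install google-genai (assumed SDK)

    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    video = client.files.upload(file=path)  # assumed upload entry point
    resp = client.models.generate_content(
        model="gemini-2.5-flash",  # assumed model name
        contents=[video, build_prompt()],
    )
    return resp.text
```

For live use on web video you'd presumably chunk the stream and feed segments as they arrive, since the API works on uploaded files rather than a live feed.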