Football coaches and players, especially in European games, used to speak freely during matches, but in recent years they have been forced to cover their mouths; otherwise, specialists will decipher their conversations and reveal their intentions in the game.
I just realized the government has probably trained a lip-reading AI model. Training one would be super easy: download YouTube videos with uploader-provided captions, cut them down to scenes where only a single face is detected, and then use the lip points, facial landmarks and subtitle text (which has word-level timings) as training data. Then you can point a camera at anyone from a distance and know what they are saying. The longer they talk, the more accurate the output will be, as additional context is provided.
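To make the "cut to single-face scenes" step concrete, here is a rough sketch of the filtering part only. It assumes you already have a per-frame face count from some face detector and word-level caption timings; the `Word` class and both function names are made up for illustration, and nothing here does the actual landmark extraction or training.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # caption start time in seconds
    end: float    # caption end time in seconds


def single_face_spans(face_counts, fps):
    """Return (start, end) spans in seconds where every frame shows exactly one face."""
    spans = []
    run_start = None
    for i, n in enumerate(face_counts):
        if n == 1 and run_start is None:
            run_start = i  # a single-face run begins at this frame
        elif n != 1 and run_start is not None:
            spans.append((run_start / fps, i / fps))
            run_start = None
    if run_start is not None:
        # the video ended while still inside a single-face run
        spans.append((run_start / fps, len(face_counts) / fps))
    return spans


def words_in_spans(words, spans):
    """Keep only words whose timing lies entirely inside one single-face span."""
    return [w for w in words if any(s <= w.start and w.end <= e for s, e in spans)]


# Hypothetical input: 8 frames at 2 fps, with zero or two faces in frames 4 and 5.
face_counts = [1, 1, 1, 1, 0, 2, 1, 1]
words = [Word("hello", 0.2, 0.8), Word("there", 1.8, 2.3), Word("coach", 3.1, 3.9)]

spans = single_face_spans(face_counts, fps=2)
kept = words_in_spans(words, spans)
print(spans)
print([w.text for w in kept])
```

A word that straddles a cut (like "there" in the made-up data) is dropped rather than trimmed, which seems the safer choice when you cannot be sure whose lips the frames are showing.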
What a delightful find, pairs well with the post about designing for a blind client[0].
Thank you for posting <3
I guess they’ve never heard of the Bad Lip Reading YT channel
I am Deaf. I lipread German but rarely English. I have been successful lipreading people who speak English as a non-native language, namely Greek, Estonian and Marathi speakers, but never people from the Anglophone world. As I get older, my lipreading competence declines, and I prefer talking in my local Signed Language.
What this article is missing, I think, is that lipreading does not simply replace auditory input with visual input, because there are too many, let's just call them homophones. I find the word "visemes" a bit too cute. You need a lot of context. So I always struggle if someone asks me a very short question, because I don't know what it's about. Someone comes to me, says hello, and asks the question, and I have no idea what they want to know.
Re matches, the context is special, because you can assume that the football coach is not shouting "HAKUNA MATATA" across the field. This simplifies lipreading during matches. Essentially it devolves into something like a set of radio buttons.
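A toy illustration of both points (many words look the same on the lips, and a narrow context collapses the choice). The letter-to-mouth-shape grouping below is a deliberately crude assumption working on spelling rather than real phonemes, and the two word lists are invented for the example.

```python
# Crude, hypothetical grouping: consonants that look alike on the lips share one label.
VISEME_OF = {}
for group, label in [("pbm", "P"), ("fv", "F"), ("tdnszl", "T"), ("kg", "K")]:
    for ch in group:
        VISEME_OF[ch] = label


def viseme_key(word):
    """Collapse a word to what it roughly looks like on the lips."""
    return "".join(VISEME_OF.get(ch, ch) for ch in word.lower())


def candidates(observed_key, vocabulary):
    """All words in the vocabulary that look like the observed mouth shapes."""
    return sorted(w for w in vocabulary if viseme_key(w) == observed_key)


# Invented word lists: an everyday lexicon versus what a coach might plausibly shout.
general = {"pat", "bat", "mat", "bad", "mad", "man", "pan", "ban", "pass"}
pitch = {"pass", "press", "man", "shoot", "back"}

seen = viseme_key("man")
print(candidates(seen, general))  # many look-alikes without context
print(candidates(seen, pitch))    # the restricted context leaves one option
```

With the open vocabulary the same mouth shapes match eight words; with the pitch-side list only one survives, which is the "radio buttons" effect.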