One of the comments on IG explains this perfectly:
"Meta has been doing this; when they auto-translate the audio of a video they are also adding an Al filter to make the mouth of who is speaking match the audio more closely. But doing this can also add a weird filter over all the face."
I don't know why you have to get into conspiracy theories about them applying different filters based on the video content; that would be such a weird micro-optimization. Why would they bother with that?
Some random team or engineer does it to get a promo.
I doubt that's what's happening either, but it's not beyond the pale. They could be feeding both the input video and the audio/transcript into their transformer, and it may have learned that "when the audio is talking about lips, the person is usually puckering their lips for the camera," so it regurgitates that.
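For anyone curious what that hypothesis looks like mechanically, here is a minimal sketch of a model that conditions its lip-sync output on both modalities at once. Everything in it is hypothetical (the module names, feature dimensions, and mouth-decoding head are made up for illustration, and nothing here is Meta's actual architecture); it just shows how attention over a joint video+audio token sequence would let transcript content bleed into the generated mouth motion.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the theory above: one transformer that attends over
# BOTH the video frames and the dubbed audio/transcript. If mouth motion is
# decoded from this joint sequence, spurious correlations in the training
# data ("audio mentions lips -> speaker puckers") can leak into the output.
class LipSyncSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        # Toy projections: map per-frame face features and per-token
        # audio/transcript features into a shared embedding space.
        self.video_proj = nn.Linear(512, d_model)  # assumed face-crop features
        self.audio_proj = nn.Linear(128, d_model)  # assumed audio/text features
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        # Decode mouth-region features (pixels or landmarks) per video frame.
        self.mouth_head = nn.Linear(d_model, 64)

    def forward(self, video_feats, audio_feats):
        # Concatenate both modalities into one token sequence; self-attention
        # lets every audio token influence every video frame's mouth output.
        v = self.video_proj(video_feats)   # (B, T_video, d_model)
        a = self.audio_proj(audio_feats)   # (B, T_audio, d_model)
        tokens = torch.cat([v, a], dim=1)  # (B, T_video + T_audio, d_model)
        hidden = self.backbone(tokens)
        # Only the video positions are decoded into mouth frames.
        return self.mouth_head(hidden[:, : video_feats.size(1)])

model = LipSyncSketch()
video = torch.randn(1, 30, 512)   # 30 frames of face features
audio = torch.randn(1, 50, 128)   # 50 tokens of dubbed audio/transcript
print(model(video, audio).shape)  # torch.Size([1, 30, 64])
```

The point of the sketch is just that nothing in this setup forces the audio tokens to drive only phoneme-level lip shapes; attention gives the model free rein to pick up whatever audio-to-face correlations exist in the training data.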