(N.B. I probably should have said "sparse feature tracking" rather than "optical flow." People tend to get the wrong idea about what optical flow fundamentally requires. Spatial regularity and density are not inherent to it, but people may assume they are.)
First of all, did you watch the video? (The whole thing is kind of annoying and long, but the part in question here is only about 3 seconds, so it's worth a look.) Two points about the video: 1) The overlay's position is noticeably unstable relative to the apparent camera motion, so it doesn't even show what the OP claims it does. 2) Because of that, you have no way of knowing what the latency is.
Anyway, yes even with, and even in the form factor if you optimize for the right things. The kind of simple feature tracking that can accomplish what's shown in the video ran in real time around 2005, and there have been significant hardware and algorithmic advances in the 20 years since.
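To give a sense of how little work "simple feature tracking" can be, here is a rough sketch of tracking a single point between two frames with an iterated, translation-only Lucas-Kanade style update. To be clear, this is my own illustration and not whatever the video uses: the frames are synthetic, the names `blob`, `bilinear` and `track_point` are made up for the example, and a real tracker would add things like corner selection and image pyramids.

```python
import math

def blob(width, height, cx, cy, sigma=3.0):
    """Synthetic grayscale frame (assumption): one Gaussian blob at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]

def bilinear(img, x, y):
    """Sample img at a sub-pixel location; caller keeps it inside the frame."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x0 + 1] * fx
    bot = img[y0 + 1][x0] * (1 - fx) + img[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bot * fy

def track_point(prev, curr, px, py, half=5, iters=10):
    """Estimate the (dx, dy) shift of the patch around (px, py) from prev to curr.

    Only this one patch is touched: no dense flow field, no regularization.
    """
    dx = dy = 0.0
    for _ in range(iters):
        gxx = gxy = gyy = bx = by = 0.0
        for v in range(-half, half + 1):
            for u in range(-half, half + 1):
                x, y = px + u, py + v
                # Patch gradients in the previous frame (central differences).
                ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0
                iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0
                # Brightness mismatch under the current shift estimate.
                it = bilinear(curr, x + dx, y + dy) - prev[y][x]
                gxx += ix * ix
                gxy += ix * iy
                gyy += iy * iy
                bx += ix * it
                by += iy * it
        det = gxx * gyy - gxy * gxy
        if det < 1e-9:
            break  # patch has too little texture to track
        # Solve the 2x2 normal equations G * step = -b for the update.
        step_x = -(gyy * bx - gxy * by) / det
        step_y = -(gxx * by - gxy * bx) / det
        dx += step_x
        dy += step_y
        if abs(step_x) + abs(step_y) < 1e-4:
            break
    return dx, dy

# Two synthetic frames where the blob moves by (+0.7, -0.4) pixels.
prev_frame = blob(32, 32, 16.0, 16.0)
curr_frame = blob(32, 32, 16.7, 15.6)
shift = track_point(prev_frame, curr_frame, 16, 16)
```

The per-point cost here is a few hundred multiply-adds per iteration over an 11x11 patch, which is the kind of workload I mean when I say it depends on what you optimize for. Tracking a few dozen such points is enough to anchor an overlay, and none of it needs a dense or spatially regular flow field.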