LLMs are, by definition, real time at any speed. 50,000 tokens per second? Real time. Only 0.0002 tokens per minute? Still real time.
Eight tokens per second is "real time" in that sense, but that's also the kind of speed we used to mock old video games for, when they would show "computers" with text slowly printed to the screen letter by letter or word by word.
Real time is defined as ‘no slower than some critical speed’. In the case of conversation with humans, that critical speed should be around 10 tok/s, including speech synthesis.
In this context, by "real time" people usually mean "as fast as I can read the reply", so 0.0002 tokens per minute would not be considered "real time".
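For what it's worth, the threshold definition above fits in a few lines of Python. This is only a sketch: the 10 tok/s default is the figure quoted above for spoken conversation, not a measured reading speed, and the function names are made up for illustration.

```python
def per_minute_to_per_second(tokens_per_minute: float) -> float:
    """Convert a rate in tokens per minute to tokens per second."""
    return tokens_per_minute / 60.0


def is_real_time(tokens_per_second: float,
                 critical_tokens_per_second: float = 10.0) -> bool:
    """Return True if generation is no slower than the critical speed.

    The 10 tok/s default is an assumption taken from the thread;
    pick whatever threshold matches your use case.
    """
    return tokens_per_second >= critical_tokens_per_second


# The rates mentioned in the thread, all expressed in tokens per second.
rates = {
    "50,000 tok/s": 50_000.0,
    "8 tok/s": 8.0,
    "0.0002 tok/min": per_minute_to_per_second(0.0002),
}

for label, rate in rates.items():
    print(label, "->", "real time" if is_real_time(rate) else "not real time")
```

With that threshold, only the 50,000 tok/s case counts as real time; 8 tok/s falls just short, and 0.0002 tokens per minute is nowhere near.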