Real time is defined as ‘no slower than some critical speed’; for conversation with humans, this should be around 10 tok/s end to end, including speech synthesis.
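As a rough illustration of that definition, the check below compares an end-to-end token rate against the critical speed. The function names are made up for this sketch, and it assumes generation and speech synthesis run one after the other, so their times simply add; a pipelined system would overlap them and do better.

```python
# Critical speed for human conversation, taken from the text (approximate).
CRITICAL_TOK_PER_S = 10.0


def effective_rate(n_tokens: int, generation_s: float, synthesis_s: float) -> float:
    """End-to-end tokens per second, assuming the two stages run sequentially."""
    total_s = generation_s + synthesis_s
    if total_s <= 0:
        raise ValueError("total elapsed time must be positive")
    return n_tokens / total_s


def is_real_time(n_tokens: int, generation_s: float, synthesis_s: float,
                 critical: float = CRITICAL_TOK_PER_S) -> bool:
    """True if the end-to-end rate is no slower than the critical speed."""
    return effective_rate(n_tokens, generation_s, synthesis_s) >= critical


if __name__ == "__main__":
    # 120 tokens, 8 s to generate, 2 s to synthesize: 12 tok/s overall.
    print(is_real_time(120, 8.0, 2.0))
    # Same tokens, but 10 s + 5 s: 8 tok/s overall.
    print(is_real_time(120, 10.0, 5.0))
```

The point of measuring the combined time is that a model generating at well above 10 tok/s can still fall below real time once synthesis latency is counted.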