TheRealPomax • last Wednesday at 4:07 AM • 2 replies

LLMs are, by definition, real time at any speed. 50,000 tokens per second? Real time. Only 0.0002 tokens per minute? Still real time.

Eight tokens per second is "real time" in that sense, but that's also the kind of speed we used to mock old video games for, when they would show "computers" with the text slowly getting printed to the screen letter by letter or word by word.

Replies

kouteiheika • last Wednesday at 4:48 AM

In this context by "real time" people usually mean "as fast as I can read the reply", so, 0.0002 tokens per minute would not be considered "real time".

baq • last Wednesday at 8:39 AM

Real time is defined as ‘no slower than some critical speed’; in the case of conversation with humans, this should be around 10 tok/s, including speech synthesis.
