Hacker News

azinman2 today at 4:09 PM (2 replies)

Seems very reasonable to me


Replies

tux3 today at 4:24 PM

A bit strange to use time to first token instead of throughput.

Latency to the first token is not like a web page, where first paint already has something useful to show. The first token is "The ", and you'll be very happy it's there in 50ms instead of 200ms... but what you really want to know is how quickly you'll get the rest of the sentence (throughput).
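To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a simple model where the time to a full response is TTFT plus the remaining tokens divided by a constant decode rate. The function name, the 500-token response, and the 50 tokens/s baseline are all made up for illustration, not measurements of any real system.

```python
def total_latency_s(ttft_s: float, tokens: int, tokens_per_s: float) -> float:
    """Seconds until the last token arrives under a simplified model:
    wait ttft_s for the first token, then stream the remaining
    tokens at a constant rate of tokens_per_s."""
    return ttft_s + (tokens - 1) / tokens_per_s


# Hypothetical workload: a 500-token answer at a 50 tokens/s baseline.
baseline = total_latency_s(ttft_s=0.200, tokens=500, tokens_per_s=50)

# 4x better TTFT only (200ms -> 50ms), same throughput.
faster_ttft = total_latency_s(ttft_s=0.050, tokens=500, tokens_per_s=50)

# Same TTFT, 4x better throughput (50 -> 200 tokens/s).
faster_decode = total_latency_s(ttft_s=0.200, tokens=500, tokens_per_s=200)

print(f"baseline:      {baseline:.2f}s")
print(f"4x TTFT:       {faster_ttft:.2f}s")
print(f"4x throughput: {faster_decode:.2f}s")
```

Under these assumed numbers the TTFT improvement shaves 150ms off a roughly 10 second response, while the throughput improvement cuts it to under 3 seconds. For very short responses the balance shifts back toward TTFT.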

nabakin today at 4:49 PM

I would consider it reasonable if this were 4x for both TTFT and throughput, but it seems like it's only for TTFT.