Not to mention, if it's an ML workload, you'll also have to factor in downloading the weights and loading them into memory, which can double that time or more.
According to the press release, "we achieved an impressive Time-to-First-Token of approximately 19 seconds for a gemma3:4b model."
Think about that: a very small, fairly weak model, and you still wait almost 20 seconds before your request gets its first token.
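For context, here is a minimal sketch of how a time-to-first-token number like that could be measured, assuming an Ollama-compatible streaming endpoint at http://localhost:11434 serving gemma3:4b; the endpoint URL and prompt are illustrative placeholders, not details from the press release.

```python
import json
import time

import requests

# Assumed Ollama-style streaming endpoint; adjust host/port for your setup.
ENDPOINT = "http://localhost:11434/api/generate"

payload = {"model": "gemma3:4b", "prompt": "Say hello.", "stream": True}

start = time.monotonic()
with requests.post(ENDPOINT, json=payload, stream=True, timeout=120) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # The first streamed chunk that carries generated text marks TTFT.
        if chunk.get("response"):
            ttft = time.monotonic() - start
            print(f"Time-to-first-token: {ttft:.2f}s")
            break
```

On a warm instance this typically lands well under a second for a model this size, which is exactly why a ~19-second figure points to cold-start overhead (provisioning, weight download, loading into memory) rather than inference itself.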