rvnx • last Wednesday at 10:15 AM • 2 replies

According to the press release, "we achieved an impressive Time-to-First-Token of approximately 19 seconds for a gemma3:4b model"

Imagine, you have a very small weak model, and you have to wait 20 seconds for your request.

happyopossum • last Wednesday at 7:32 PM

> Imagine, you have a very small weak model, and you have to wait 20 seconds for your request.

That delay applies only to your first request, after the deployment has scaled to 0 while it wasn’t in use. For a lot of use cases, that sounds great.


infecto • last Wednesday at 12:18 PM

Imagine running a production client-facing API and not overprovisioning it.
