logoalt Hacker News

otterleytoday at 2:55 PM2 repliesview on HN

I checked the fine print on the product website: by “up to 4x faster LLM prompt processing,” they’re specifically referring to time to first token. So it’s not about token generation rate (tokens per second).


Replies

aurareturntoday at 3:59 PM

Yes. This is known. They added neural accelerators, aka Tensor core equivalent, in the GPU. This will make prompt processing competitive vs similar class GPUs.

jasonjmcgheetoday at 3:05 PM

It would probably be worth finding a more friendly way to market this, but it's a reasonable / accurate way to say it.

The prompt processing sped up.

Not the output generation.

M4 was notoriously slow at this compared to DGX etc.