I checked the fine print on the product website: by “up to 4x faster LLM prompt processing,” they’re specifically referring to time to first token. So it’s not about token generation rate (tokens per second).
It would probably be worth finding a friendlier way to market this, but it's a reasonable and accurate way to put it.
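If the distinction isn't obvious, the two numbers are easy to measure separately from any streaming API. A minimal sketch below; the `stream_tokens` generator is a made-up stub standing in for whatever client you actually use:

```python
import time

def stream_tokens(prompt):
    # Hypothetical stub standing in for a real streaming LLM client.
    # Real prefill cost grows with prompt length; decode cost is per token.
    time.sleep(0.8)                      # pretend prompt processing (prefill)
    for word in ["Hello", ",", " world", "!"]:
        time.sleep(0.05)                 # pretend per-token decode
        yield word

start = time.perf_counter()
ttft = None
n_tokens = 0
for token in stream_tokens("Summarize this long document..."):
    if ttft is None:
        ttft = time.perf_counter() - start   # time to first token
    n_tokens += 1

decode_time = time.perf_counter() - start - ttft
print(f"TTFT: {ttft:.2f}s")              # what the "up to 4x faster" claim covers
if n_tokens > 1 and decode_time > 0:
    # tokens after the first, divided by time spent after the first
    print(f"Generation: {(n_tokens - 1) / decode_time:.1f} tok/s")
```

A "4x faster prompt processing" claim would shrink the first number while leaving the second roughly unchanged.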
The prompt processing sped up, not the output generation. The M4 was notoriously slow at this compared to a DGX etc.
Yes, this is known. They added neural accelerators, i.e. their equivalent of Nvidia's Tensor Cores, to the GPU. This should make prompt processing competitive with similar-class GPUs.
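To spell out why matmul accelerators help one number and not the other: prefill pushes the whole prompt through the weights as one big batched matmul, so it's compute-bound, while decode does a matrix-vector pass per generated token and is limited by how fast the weights stream from memory. A back-of-the-envelope sketch with toy numbers (made up, not real M-series or DGX specs):

```python
# Why matmul accelerators speed up prefill but barely touch decode.
# All numbers are illustrative, not actual hardware specs.
params = 8e9             # 8B-parameter model
bytes_per_param = 2      # fp16/bf16 weights
prompt_tokens = 4096

flops_per_token = 2 * params              # ~2 FLOPs per weight per token

# Prefill: every prompt token goes through the weights in one batched
# matmul -> compute-bound -> tensor-core-style units help directly.
prefill_flops = flops_per_token * prompt_tokens

# Decode: one token at a time, but all weights must still be read from
# memory each step -> bandwidth-bound -> extra matmul throughput is idle.
decode_bytes_per_token = params * bytes_per_param

accel_flops_per_s = 50e12     # hypothetical 50 TFLOPS of matmul compute
mem_bandwidth = 250e9         # hypothetical 250 GB/s memory bandwidth

print(f"Prefill: ~{prefill_flops / accel_flops_per_s:.1f}s of pure compute")
print(f"Decode ceiling: ~{mem_bandwidth / decode_bytes_per_token:.0f} tok/s "
      f"from bandwidth alone")
```

With these toy numbers, quadrupling matmul throughput cuts prefill time by ~4x but moves the decode ceiling not at all, which matches the pattern in the marketing claim.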