I checked the fine print on the product website: by “up to 4x faster LLM prompt processing,” they’re specifically referring to time to first token. So it’s not about token generation rate (tokens per second).
It would probably be worth finding a friendlier way to market this, but it's a reasonable and accurate way to put it.
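If the distinction isn't obvious, the two numbers are easy to measure separately from any streaming API. A minimal sketch below; the `stream_tokens` generator is a made-up stub standing in for whatever client you actually use:

```python
import time

def stream_tokens(prompt):
    # Hypothetical stub standing in for a real streaming LLM client.
    # Real prefill cost grows with prompt length; decode cost is per token.
    time.sleep(0.8)                      # pretend prompt processing (prefill)
    for word in ["Hello", ",", " world", "!"]:
        time.sleep(0.05)                 # pretend per-token decode
        yield word

start = time.perf_counter()
ttft = None
n_tokens = 0
for token in stream_tokens("Summarize this long document..."):
    if ttft is None:
        ttft = time.perf_counter() - start   # time to first token
    n_tokens += 1

decode_time = time.perf_counter() - start - ttft
print(f"TTFT: {ttft:.2f}s")              # what the "up to 4x faster" claim covers
if n_tokens > 1 and decode_time > 0:
    # tokens after the first, divided by time spent after the first
    print(f"Generation: {(n_tokens - 1) / decode_time:.1f} tok/s")
```

A "4x faster prompt processing" claim would shrink the first number while leaving the second roughly unchanged.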
The prompt processing sped up, not the output generation. The M4 was notoriously slow at this compared to a DGX etc.
Yes, this is known. They added neural accelerators, i.e. their equivalent of Nvidia's Tensor Cores, to the GPU. This should make prompt processing competitive with similar-class GPUs.
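To spell out why matmul accelerators help one number and not the other: prefill pushes the whole prompt through the weights as one big batched matmul, so it's compute-bound, while decode does a matrix-vector pass per generated token and is limited by how fast the weights stream from memory. A back-of-the-envelope sketch with toy numbers (made up, not real M-series or DGX specs):

```python
# Why matmul accelerators speed up prefill but barely touch decode.
# All numbers are illustrative, not actual hardware specs.
params = 8e9             # 8B-parameter model
bytes_per_param = 2      # fp16/bf16 weights
prompt_tokens = 4096

flops_per_token = 2 * params              # ~2 FLOPs per weight per token

# Prefill: every prompt token goes through the weights in one batched
# matmul -> compute-bound -> tensor-core-style units help directly.
prefill_flops = flops_per_token * prompt_tokens

# Decode: one token at a time, but all weights must still be read from
# memory each step -> bandwidth-bound -> extra matmul throughput is idle.
decode_bytes_per_token = params * bytes_per_param

accel_flops_per_s = 50e12     # hypothetical 50 TFLOPS of matmul compute
mem_bandwidth = 250e9         # hypothetical 250 GB/s memory bandwidth

print(f"Prefill: ~{prefill_flops / accel_flops_per_s:.1f}s of pure compute")
print(f"Decode ceiling: ~{mem_bandwidth / decode_bytes_per_token:.0f} tok/s "
      f"from bandwidth alone")
```

With these toy numbers, quadrupling matmul throughput cuts prefill time by ~4x but moves the decode ceiling not at all, which matches the pattern in the marketing claim.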