But isn't prefill speed the bottleneck in some systems*?
Sure, it's an order of magnitude faster (10x on Apple Metal?), but there are also an order of magnitude more tokens to process, especially for tasks that involve some kind of summarization.
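A rough back-of-the-envelope sketch of that point. The throughput numbers are made up for illustration (500 tok/s prefill, 50 tok/s decode, i.e. the assumed 10x gap), not real Metal benchmarks, and `request_time` is just a hypothetical helper. Total time is roughly prompt_tokens / prefill_rate + output_tokens / decode_rate, so a long enough prompt makes the first term dominate even though prefill is faster per token.

```python
# Hypothetical throughput figures, for illustration only: NOT measurements.
PREFILL_TOK_PER_S = 500.0  # assumed prompt-processing speed
DECODE_TOK_PER_S = 50.0    # assumed generation speed (10x slower than prefill)


def request_time(prompt_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (prefill_seconds, decode_seconds) for a single request."""
    prefill_s = prompt_tokens / PREFILL_TOK_PER_S
    decode_s = output_tokens / DECODE_TOK_PER_S
    return prefill_s, decode_s


# Chat-style request: short prompt, longer answer -> decode dominates.
chat = request_time(prompt_tokens=200, output_tokens=500)
# Summarization-style request: long input, short output -> prefill dominates.
summary = request_time(prompt_tokens=20_000, output_tokens=300)

print(f"chat:    prefill {chat[0]:.1f}s, decode {chat[1]:.1f}s")
print(f"summary: prefill {summary[0]:.1f}s, decode {summary[1]:.1f}s")
```

With these assumed rates the chat case spends about 0.4s in prefill and 10s in decode, while the summarization case spends 40s in prefill and only 6s in decode, so which phase is "the bottleneck" depends heavily on the input/output token ratio.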
But point taken that the parent's numbers are probably decode speeds.
* Specifically Mac Metal, which is what the parent's numbers are about.