Taalas is interesting. 16,000 TPS for Llama on a chip.
https://taalas.com/
On a very old model, so it's more like 16,000 garbage words/s
It's exciting to see, but look at the die size for only an 8B model.
I wonder how many tokens per second they could get if they put Mercury 2 on a chip.