nmfisher, today at 3:51 PM

It's not immediately clear, but this seems to be 250 tok/s on an M4 Max.

For comparison, the current agent swarm challenge on HF is at 508 tok/s on an A10G GPU:

https://huggingface.co/spaces/gemma-challenge/gemma-dashboar...