It's not immediately clear, but this seems to be 250 tok/s on an M4 Max.
For comparison, the current agent swarm challenge on HF is at 508 tok/s on an A10G GPU:
https://huggingface.co/spaces/gemma-challenge/gemma-dashboar...
More of a meta comment, but I really wish Anthropic would say something about their plans for Fable. We're all just kind of left here floating and aimless, with no idea of what to expect.
That's very impressive. What's the best way to run these kernels natively on a Mac? I saw that there's a way to plug Claude into Apple's Foundation Models framework, and there's a CLI tool that can access models via that framework. It might be useful to have something this fast and good available via a small CLI tool, especially when connected to the small suite of tools I have for things like file editing, file viewing, and simple agentic tasks.