Nice, just tried that with "tell me a long tall tale" as the prompt and got:
Speed: 26.41 tok/s
How fast is it with llama.cpp? A 1B model should be a lot faster on an M2.