Seems promising; this is the way. Can someone benchmark this?
I'm getting 6.55 tokens/s with the Qwen3.5-397B-A17B-4bit model using the command: ./infer --prompt "Explain quantum computing" --tokens 100
Hardware: MacBook Pro M5 Pro (64GB RAM)