What tokens/s are you getting with a 122B MoE model in this setup? I didn't see any benchmarks in the benchmarks section of the README.md.
I'll add more details. We just wired up the pipeline on both Mac and iOS.
Yeah, I'd like to see this added to the README.
https://www.sharpai.org/benchmark/ The MLX part is what we've done with SwiftLM. The local result is still being verified, and more details are on the way.