In practice, the 4-bit MLX version runs at 20 tokens/s for general chat. Do you consider that too slow for practical use?
What example tasks would you try?