I've had good experiences with the Mistral Voxtral models (I've used the API, but some of the model variants are open weight).