I find the benchmarks to be suggestive but not necessarily representative of reality. It's really best if you have your own use case and can benchmark the models yourself. I've found the results to be surprising and not what these public benchmarks would have you believe.