Depends on the format, compute type, quantization, and KV cache size.
Specs for whatever they used to achieve the benchmarks would be a good start.
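
Assuming the question is about how much memory a model needs, here's a rough back-of-the-envelope sketch of how quantization and KV cache size feed into it. The function names and the example model shape (7B params, 32 layers, 8 KV heads, head dim 128) are made up for illustration, not taken from any particular model card, and real runtimes add overhead for activations, buffers, and the format's own metadata.

```python
GIB = 1024 ** 3

def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate size of the weights alone at a given quantization level."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2,
                   batch_size: int = 1) -> int:
    """Approximate KV cache size: one K and one V tensor per layer.

    bytes_per_elem=2 assumes an fp16/bf16 cache; a quantized cache would be smaller.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * batch_size * bytes_per_elem)

if __name__ == "__main__":
    # Hypothetical 7B-parameter model, 4-bit weights, 8192-token context.
    weights = weight_bytes(7_000_000_000, bits_per_weight=4)
    cache = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                           context_len=8192)
    print(f"weights:  {weights / GIB:.2f} GiB")
    print(f"kv cache: {cache / GIB:.2f} GiB")
    print(f"total (before runtime overhead): {(weights + cache) / GIB:.2f} GiB")
```

Swap in the real layer count, KV head count, and context length for whatever model is being asked about; the cache term grows linearly with context, so long contexts can end up costing more than the weights.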