Yep. But this is like 10x faster; 3B active parameters.
Cerebras already does 200-800 tps. Do you need it even faster?