Yes, you are way off: Groq doesn't make open-source models. Groq designs its own AI accelerator chips (its "LPU" inference hardware), which are significantly faster than Nvidia's GPUs.
For inference, but yes. Many hundreds of tokens per second of output is the norm, in my experience. I don't recall the exact prompt-processing figures, but I think they were somewhere in the low hundreds of tokens per second (so slightly slower than inference).
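To make the throughput numbers above concrete, here's a minimal sketch of how tokens-per-second is typically computed from a streamed response: output token count divided by wall-clock generation time. The specific numbers below are hypothetical, chosen only to land in the "many hundreds per second" range described above; they are not real Groq measurements.

```python
# Throughput is just tokens produced divided by wall-clock generation time.
def tokens_per_second(num_tokens: int, elapsed_s: float) -> float:
    return num_tokens / elapsed_s

# Hypothetical example: 1,500 output tokens streamed in 3 seconds -> 500 tok/s.
print(tokens_per_second(1500, 3.0))
```

In practice you'd take `num_tokens` from the API's usage stats (or count streamed chunks) and time from first token to last, which also lets you separate prompt-processing speed from generation speed.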
> Groq makes innovative AI accelerator chips that are significantly faster than Nvidia's.
Yeah, I'm disappointed by this; it's clearly a move to push them out of the market. Still, that leaves a vacuum for someone else to fill. I was extremely impressed by Groq the last time I messed about with it; the inference speed was bonkers.