Hacker News

monster_truck · yesterday at 10:56 PM

Groq was targeting the part of the stack where CUDA was weakest: guaranteed inference time at a lower cost per token at scale. This was a response to more than just Google's TPUs. Groq was also one of the few realistic alternative paths OpenAI had with those wafers.