monster_truck • yesterday at 10:56 PM • 0 replies

Groq was targeting the part of the stack where CUDA was weakest: guaranteed inference time at a lower cost per token at scale. This was a response to more than just Google's TPUs; Groq was also one of the few realistic alternative paths OpenAI had with those wafers.
