orbital-decay • today at 5:52 AM • 1 reply

Their models are organized around inference efficiency from the start; it's what they're focusing on. Also, they come from HFT and are good at low-level optimization. For v3, they literally reverse engineered Nvidia GPUs for undocumented behavior that helped against memory bottlenecks, wrote file systems for efficient model serving, and did a ton of low-level grunt work at a time when everyone else just relied on torch. Being compute-constrained helped as well - necessity is the mother of invention.

Replies

pingou • today at 6:47 AM

But what is preventing their competitors, who have many more employees who are also very talented, from doing the same?

Every little improvement would save them billions, so it's hard to imagine they aren't pouring a lot of resources into that already.
