Did they? Deepseek spent about 17 months achieving SOTA results with a significantly smaller budget. While xAI's model isn't a substantial leap beyond Deepseek R1, it utilizes 100 times more compute.
If you gave xAI $3 billion, it would choose to invest $2.5 billion in GPUs and $0.5 billion in talent. Deepseek would invest $1 billion in GPUs and $2 billion in talent.
I would argue that the latter approach (Deepseek's) is more scalable. It's extremely difficult to increase compute by 100 times, but with sufficient investment in talent, achieving the equivalent of a 10x compute gain through better algorithms is more feasible.
>It's extremely difficult to increase compute by 100 times, but with sufficient investment in talent, achieving the equivalent of a 10x compute gain through better algorithms is more feasible.
The article explains how in reality the opposite is true, especially when you look at it long term: compute grows exponentially, humans do not.
Large teams are very hard to scale.
There is a reason why startups innovate and large companies follow.
Deepseek's innovations are applicable to xAI's setup - the results would simply be multiplied by their compute scale.
Deepseek didn't have option A or B available; extreme optimisation was the only option they had to work with.
It’s weird that people present those two approaches as mutually exclusive ones.
It's not an either/or. Your hiring of talent is only limited by your GPU spend if you can't hire because you ran out of money.
In reality pushing the frontier on datacenters will tend to attract the best talent, not turn them away.
And in talent, it is the quality rather than the quantity that counts.
A 10x algorithmic breakthrough will compound with a 10x scale-out in compute, not hinder it.
I am a big fan of Deepseek, Meta and other open model groups. I also admire what the Grok team is doing, especially their astounding execution velocity.
And it seems like Grok 2 is scheduled to be open-sourced as promised.
R1 came out when Grok 3's training was still ongoing. They shared their techniques freely, so you would expect the next round of models to incorporate as many of those techniques as possible. The bump you would get from the extra compute occurs in the next cycle.
If Musk really can get 1 million GPUs and they incorporate some algorithmic improvements, it'll be exciting to see what comes out.
Deepseek didn’t seem to invest in talent as much as it did in smuggling restricted GPUs into China via 3rd countries.
Also, not for nothing, scaling compute 100x or even 1000x is much easier than scaling talent 10x or even 2x, since you don't need workers, you need discovery.
> While xAI's model isn't a substantial leap beyond Deepseek R1, it utilizes 100 times more compute.
I'm not sure if it's close to 100x more. xAI had 100K Nvidia H100s, while this is what SemiAnalysis writes about DeepSeek:
> We believe they have access to around 50,000 Hopper GPUs, which is not the same as 50,000 H100, as some have claimed. There are different variations of the H100 that Nvidia made in compliance to different regulations (H800, H20), with only the H20 being currently available to Chinese model providers today. Note that H800s have the same computational power as H100s, but lower network bandwidth.
> We believe DeepSeek has access to around 10,000 of these H800s and about 10,000 H100s. Furthermore they have orders for many more H20’s, with Nvidia having produced over 1 million of the China specific GPU in the last 9 months. These GPUs are shared between High-Flyer and DeepSeek and geographically distributed to an extent. They are used for trading, inference, training, and research. For more specific detailed analysis, please refer to our Accelerator Model.
> Our analysis shows that the total server CapEx for DeepSeek is ~$1.6B, with a considerable cost of $944M associated with operating such clusters. Similarly, all AI Labs and Hyperscalers have many more GPUs for various tasks including research and training than they commit to an individual training run due to centralization of resources being a challenge. X.AI is unique as an AI lab with all their GPUs in 1 location.
https://semianalysis.com/2025/01/31/deepseek-debates/
I don't know how much slower these GPUs are, but if they have 50K of them, that doesn't sound like 100x less compute to me. Also, a company that has N GPUs and trains AI on them for 2 months can achieve the same results as a company that has 2N GPUs and trains for 1 month. So DeepSeek could spend a longer time training to offset the fact that it has fewer GPUs than its competitors.
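To put that in rough numbers (a back-of-envelope sketch only; the 1-month training window and the 0.7x average throughput factor are assumptions, not reported figures), training compute scales with GPUs × time, so the gap looks more like low single digits than 100x:

    # Back-of-envelope GPU-hours comparison; every number here is an
    # illustrative assumption, not a reported figure.
    HOURS_PER_MONTH = 730  # ~24 * 365 / 12

    def gpu_hours(num_gpus, months, relative_throughput=1.0):
        """Effective H100-equivalent GPU-hours: count * time * per-GPU throughput."""
        return num_gpus * months * HOURS_PER_MONTH * relative_throughput

    # xAI: ~100K H100s, assumed 1-month training window.
    xai = gpu_hours(100_000, months=1)

    # DeepSeek: ~50K mixed Hopper GPUs (H800/H100/H20), assumed ~0.7x average
    # H100 throughput, training for 2 months instead of 1.
    deepseek = gpu_hours(50_000, months=2, relative_throughput=0.7)

    print(f"xAI:      {xai:,.0f} H100-equivalent GPU-hours")
    print(f"DeepSeek: {deepseek:,.0f} H100-equivalent GPU-hours")
    print(f"Ratio:    {xai / deepseek:.1f}x")  # roughly 1.4x, nowhere near 100x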
Deepseek was a crypto mining operation before they pivoted to AI. They have an insane amount of GPUs laying around. So we have no idea how much compute they have compared to xAI.
Deepseek spent at least 1.5 billion on hardware.
We don't actually know how much money DeepSeek spent or how much compute they used. The numbers being thrown around are suspect, the paper they published didn't reveal the costs of all models nor the R&D cost it took to develop them.
In any AI R&D operation the bulk of the compute goes on doing experiments, not on the final training run for whatever models they choose to make available.