mft_ today at 11:46 AM

> So how much internal memory does the latest Cerebras chip have? 44GB. This puts OpenAI in kind of an awkward position. 44GB is enough to fit a small model (~20B params at fp16, ~40B params at int8 quantization), but clearly not enough to fit GPT-5.3-Codex. That’s why they’re offering a brand new model, and why the Spark model has a bit of “small model smell” to it: it’s a smaller distil of the much larger GPT-5.3-Codex model.

This doesn't make sense.

1. Nvidia already sells e.g. the H100 with 80GB memory, so having 44GB isn't an advance, let alone a differentiator.

2. As I suspect anyone who's played with open-weights models will attest, there's no way 5.3-Codex-Spark gets as close to top-level performance as it does, and gets sold in this way, while being <44GB. Yes, it's weaker, and it's probably a smaller distil, but not smaller by ~two orders of magnitude as suggested.
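Back-of-envelope, weights only (a rough sketch: KV cache, activations, and runtime overhead are ignored, and the precisions are just the usual fp16/int8/int4 byte counts, nothing OpenAI has confirmed):

```python
# Rough capacity math: parameters that fit in a memory budget,
# counting weights only (KV cache, activations, overhead ignored).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def params_that_fit_billions(mem_gb: float, precision: str) -> float:
    return mem_gb * 1e9 / BYTES_PER_PARAM[precision] / 1e9

for precision in BYTES_PER_PARAM:
    print(f"44 GB at {precision}: ~{params_that_fit_billions(44, precision):.0f}B params")
# 44 GB at fp16: ~22B params
# 44 GB at int8: ~44B params
# 44 GB at int4: ~88B params
```

Even at int4 that tops out under ~90B params, so the implied gap to a frontier-scale model is the part being questioned.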


Replies

EdNutting today at 12:03 PM

You’re mixing up HBM and SRAM - which is an understandable confusion.

NVIDIA chips use HBM (High Bandwidth Memory) which is a form of DRAM - each bit is stored using a capacitor that has to be read and refreshed.

Most chips have caches on them built out of SRAM - a feedback loop of transistors that stores each bit.

The big differences are access time, power, and density: SRAM is ~100 times faster than DRAM, but DRAM uses much less power per gigabyte and takes up far less chip area per gigabyte of stored data.

Most processors have a few MB of SRAM as caches. Cerebras is kind of insane in that they’ve built one massive wafer-scale chip with a comparative ocean of SRAM (44GB).

In theory that gives them a big performance advantage over HBM-based chips.
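A minimal sketch of why, assuming batch-size-1 decode is memory-bandwidth-bound (every weight streamed once per generated token). The bandwidth numbers are vendor-quoted peaks and the model size is hypothetical, so these are ceilings, not real throughput:

```python
# Idealized decode ceiling: tokens/s <= bandwidth / bytes read per token.
# Assumes every weight is read once per token (batch size 1) and uses
# vendor-quoted peak bandwidths, so this is an upper bound only.
H100_HBM_GB_S = 3_350        # H100 SXM HBM3 peak, per NVIDIA
WSE3_SRAM_GB_S = 21_000_000  # WSE-3 on-chip SRAM peak, per Cerebras

MODEL_GB = 40  # hypothetical ~20B-param fp16 model

for name, bw_gb_s in [("H100 HBM ", H100_HBM_GB_S), ("WSE-3 SRAM", WSE3_SRAM_GB_S)]:
    print(f"{name}: ~{bw_gb_s / MODEL_GB:,.0f} tokens/s ceiling")
# H100 HBM : ~84 tokens/s ceiling
# WSE-3 SRAM: ~525,000 tokens/s ceiling
```

Batching, speculative decoding, and multi-GPU sharding change the picture a lot in practice, which is part of the "it isn't that simple".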

As with any chip design though, it really isn’t that simple.

aurareturn today at 11:58 AM

It does make sense. Nvidia chips do not promise 1,000+ tokens/s. The 80GB is external HBM, unlike Cerebras’ 44GB internal SRAM.

The whole reason Cerebras can run inference at thousands of tokens per second is that it hosts the entire model in SRAM.

There are two possible scenarios for Codex Spark:

1. OpenAI designed a model to fit exactly 44GB.

2. OpenAI designed a model that requires Cerebras to chain multiple wafer-scale chips together, i.e. an 88GB, 132GB, or 176GB model, or more.

Both options require the entire model to fit inside SRAM.
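A toy sketch of what those two scenarios imply, assuming 44GB of SRAM per wafer and weights that must live entirely on-chip (the chained sizes are just multiples of 44GB, nothing confirmed):

```python
# Toy capacity table for chained wafers: total SRAM, plus the rough
# fp16/int8 parameter counts that capacity implies (weights only,
# all sizes hypothetical).
SRAM_PER_WAFER_GB = 44

for n_wafers in range(1, 5):
    cap_gb = n_wafers * SRAM_PER_WAFER_GB
    print(f"{n_wafers} wafer(s): {cap_gb:>3} GB SRAM "
          f"-> ~{cap_gb // 2}B fp16 / ~{cap_gb}B int8 params")
# 1 wafer(s):  44 GB SRAM -> ~22B fp16 / ~44B int8 params
# 2 wafer(s):  88 GB SRAM -> ~44B fp16 / ~88B int8 params
# 3 wafer(s): 132 GB SRAM -> ~66B fp16 / ~132B int8 params
# 4 wafer(s): 176 GB SRAM -> ~88B fp16 / ~176B int8 params
```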
