chrsw · yesterday at 6:08 PM

> run locally for agentic coding. Nowadays I mostly use GPT-OSS-120b for this

What kind of hardware do you have to be able to run a performant GPT-OSS-120b locally?


Replies

embedding-shape · yesterday at 7:11 PM

An RTX Pro 6000, as one example: running the native MXFP4 quant with llama-server/llama.cpp at max context ends up taking ~66GB. You could probably also do it with two 5090s and slightly less context, or with different software aimed at memory efficiency.
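
If you're wondering what using it looks like once loaded: llama-server exposes an OpenAI-compatible HTTP API, so any client can talk to it. A minimal Python sketch (port 8080 is llama-server's default; the model name is informational only, since the server serves whatever model you loaded):

    # Minimal sketch of querying a local llama-server instance.
    # Assumes llama-server is already running with the MXFP4 GGUF
    # loaded; the port and model name below are assumptions.
    import requests

    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",  # llama-server default port
        json={
            "model": "gpt-oss-120b",  # informational; the server has one model loaded
            "messages": [
                {"role": "user", "content": "Write a binary search in Python."}
            ],
            "max_tokens": 512,
        },
        timeout=300,
    )
    print(resp.json()["choices"][0]["message"]["content"])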

fgonzag · yesterday at 7:04 PM

The model itself is 64GB (native int4); add 20GB or so for context.

There are many platforms out there that can run it decently.

AMD Strix Halo, Mac platforms, two of the new AMD Radeon AI Pro R9700 (32GB of VRAM, $1,200), or three if you don't want to spill into system RAM, multi-consumer-GPU setups, etc.
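
For a rough sanity check on those numbers, here's a back-of-envelope sketch. The KV-cache formula is the standard one; the layer/head counts are placeholder assumptions rather than GPT-OSS-120b's actual configuration, and real usage adds scratch/compute buffers on top of the raw cache:

    # Back-of-envelope memory math, using the figures above (64GB of
    # int4 weights plus context overhead). The per-layer numbers are
    # illustrative assumptions, not GPT-OSS-120b's real architecture.

    def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                    context_len: int, bytes_per_elem: int = 2) -> float:
        """Raw K and V caches: 2 tensors per layer, each sized
        context_len x n_kv_heads x head_dim."""
        return (2 * n_layers * context_len * n_kv_heads * head_dim
                * bytes_per_elem / 1e9)

    weights_gb = 64  # native int4 weights, per the comment above
    cache_gb = kv_cache_gb(n_layers=36, n_kv_heads=8, head_dim=64,
                           context_len=131_072)  # assumed values, fp16 cache
    print(f"weights ~{weights_gb} GB + KV cache ~{cache_gb:.1f} GB "
          f"= ~{weights_gb + cache_gb:.0f} GB before scratch buffers")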

FuckButtons · yesterday at 9:01 PM

A MacBook Pro with 128GB.