chrsw · yesterday at 6:08 PM

> run locally for agentic coding. Nowadays I mostly use GPT-OSS-120b for this

What kind of hardware do you have to be able to run a performant GPT-OSS-120b locally?


Replies

embedding-shape · yesterday at 7:11 PM

An RTX Pro 6000, as one example: running the native MXFP4 quant with llama-server/llama.cpp at max context ends up taking ~66GB. You could probably also do it with two 5090s and slightly less context, or with different software aimed at memory efficiency.
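
If you're wondering what using it looks like once loaded: llama-server exposes an OpenAI-compatible HTTP API, so any client can talk to it. A minimal Python sketch (port 8080 is llama-server's default; the model name is informational only, since the server serves whatever model you loaded):

    # Minimal sketch of querying a local llama-server instance.
    # Assumes llama-server is already running with the MXFP4 GGUF
    # loaded; the port and model name below are assumptions.
    import requests

    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",  # llama-server default port
        json={
            "model": "gpt-oss-120b",  # informational; the server has one model loaded
            "messages": [
                {"role": "user", "content": "Write a binary search in Python."}
            ],
            "max_tokens": 512,
        },
        timeout=300,
    )
    print(resp.json()["choices"][0]["message"]["content"])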

fgonzag · yesterday at 7:04 PM

The model itself is 64GB (native int4); add 20GB or so for context.

There are many platforms out there that can run it decently.

AMD Strix Halo, Mac platforms, two of the new AMD Radeon AI Pro R9700 (32GB of VRAM, $1,200), or three if you don't want to spill into system RAM, multi-consumer-GPU setups, etc.
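
For a rough sanity check on those numbers, here's a back-of-envelope sketch. The KV-cache formula is the standard one; the layer/head counts are placeholder assumptions rather than GPT-OSS-120b's actual configuration, and real usage adds scratch/compute buffers on top of the raw cache:

    # Back-of-envelope memory math, using the figures above (64GB of
    # int4 weights plus context overhead). The per-layer numbers are
    # illustrative assumptions, not GPT-OSS-120b's real architecture.

    def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                    context_len: int, bytes_per_elem: int = 2) -> float:
        """Raw K and V caches: 2 tensors per layer, each sized
        context_len x n_kv_heads x head_dim."""
        return (2 * n_layers * context_len * n_kv_heads * head_dim
                * bytes_per_elem / 1e9)

    weights_gb = 64  # native int4 weights, per the comment above
    cache_gb = kv_cache_gb(n_layers=36, n_kv_heads=8, head_dim=64,
                           context_len=131_072)  # assumed values, fp16 cache
    print(f"weights ~{weights_gb} GB + KV cache ~{cache_gb:.1f} GB "
          f"= ~{weights_gb + cache_gb:.0f} GB before scratch buffers")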

FuckButtons · yesterday at 9:01 PM

A MacBook Pro with 128GB.