> As the hardware continues to iterate at a rapid pace, anything you pick up second-hand will still deprecate at that pace, making any real investment in hardware unjustifiable.
Can you explain your rationale? It seems the worst-case scenario is that your setup won't be the most performant available, but it will still work and run models just as it always has.
This sounds like a classic, very basic opex vs. capex tradeoff analysis, and those are well known for showing that, in financial terms, cloud providers are preferable only in one specific corner case: a short-term investment to jump-start infrastructure when you don't yet know your scaling needs. That is not the case for LLMs.
OP seems to have invested around $600, which is roughly three months' worth of an equivalent EC2 instance. Knowing this, can you support your rationale with numbers?
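To make the comparison concrete, here's a rough break-even sketch. The ~$0.27/hr rate is just the figure implied by the "three months" comparison above, not an actual AWS quote; plug in the on-demand price of whichever instance you'd consider equivalent.

```python
def breakeven_months(hardware_cost_usd: float, cloud_hourly_usd: float,
                     hours_per_month: float = 730) -> float:
    """Months of 24/7 cloud usage that equal the up-front hardware spend."""
    return hardware_cost_usd / (cloud_hourly_usd * hours_per_month)

# Example with the figures above: ~$600 of used hardware vs. an assumed
# ~$0.27/hr instance (substitute the real rate for your chosen instance type).
print(f"{breakeven_months(600, 0.27):.1f} months")  # -> ~3.0
```

Past that break-even point, every additional month of continuous use is money the cloud option keeps charging and the owned hardware doesn't.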
When considering used hardware you also have to take quantization into account: gpt-oss-120b, for example, ships in the very new MXFP4 format, and once its weights are expanded into the floating-point types available on older hardware or Apple silicon it needs far more than 80 GB.
Open models are trained on modern hardware and will keep taking advantage of cutting-edge numeric types, so older hardware will keep suffering worse performance and larger memory requirements.
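A back-of-the-envelope sketch of what that means for weight memory, assuming ~117B parameters (roughly what's reported for gpt-oss-120b) and ~4.25 effective bits per weight for MXFP4 (4-bit elements plus shared per-block scales); treat both numbers as assumptions:

```python
# Approximate weight memory at different numeric formats for a ~117B model.
PARAMS = 117e9
GIB = 1024**3

formats = {
    "MXFP4 (4-bit elements + shared scales)": 4.25,  # ~bits per parameter
    "FP8": 8,
    "FP16/BF16": 16,
}

for name, bits in formats.items():
    print(f"{name}: ~{PARAMS * bits / 8 / GIB:.0f} GiB")
# MXFP4 lands around ~58 GiB, while FP16/BF16 (the typical fallback on older
# GPUs or Apple silicon) is well over 200 GiB.
```

In other words, hardware without native support for the new format doesn't just run slower; falling back to FP16/BF16 roughly quadruples the memory footprint.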