Are off-the-shelf GPUs (like a single 3090) suitable for modern academic research on current AI advancements, or is it better to rent cloud compute?
Those cards can be great for lots of use cases; plenty of small models are very capable at the param counts that fit in 32GB of VRAM. GPT-OSS-20B, for example, is a serviceable model for agentic coding use cases, and it runs natively in MXFP4, so it fits comfortably on a 5090 at full 128k context, with enough headroom left over to do PEFT-style SFT or RL.
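To make the "fits comfortably" claim concrete, here's a back-of-envelope weight-memory estimate. The 4.25 bits/param figure is an assumption on my part (MXFP4 is ~4 bits per weight plus per-block scale overhead), not an official number for GPT-OSS-20B:

```python
# Rough VRAM estimate for model weights alone: params * bits-per-param.
# 4.25 bits/param is an assumed effective rate for MXFP4 (4-bit values
# plus shared block scales); treat it as illustrative, not exact.
def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (decimal)."""
    return n_params * bits_per_param / 8 / 1e9

approx = weight_gb(20e9, 4.25)  # ~10.6 GB for a 20B-param model
```

That leaves roughly two thirds of a 32GB card free for KV cache at long context, activations, and PEFT optimizer state, which is why long-context inference plus light fine-tuning is plausible on one consumer GPU.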
But given the high entry cost, and depending on electricity prices in your area, it would take a number of years to amortize both the initial purchase of the card and the energy cost of the compute (compared to the compute-equivalent hourly cloud rental rates).
For context, a single 5090 rented via Runpod is currently $0.69/hr USD on-demand. A new card on Amazon right now runs between $3200 and $3700 USD. Using the raw capex alone, that's ~5k hours of GPU compute at on-demand rates. That's 2-3 years' worth of compute if you assume the card is saturated during normal working hours. And this is before you account for the cost of power, which in my city could run upwards of $140/mo depending on the season.
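The arithmetic behind those figures, as a minimal sketch (capex only, ignoring power, price changes, and cloud spot discounts):

```python
# Break-even point: how many hours of on-demand cloud rental
# equal the card's purchase price (capex only, no electricity).
def break_even_hours(card_price_usd: float, cloud_rate_usd_per_hr: float) -> float:
    return card_price_usd / cloud_rate_usd_per_hr

# Figures from the comment above: $0.69/hr on-demand, $3200-3700 card.
low = break_even_hours(3200, 0.69)   # ~4640 hours
high = break_even_hours(3700, 0.69)  # ~5360 hours

# At ~40 saturated GPU-hours/week, call it ~2000 hours/year:
years_low = low / 2000    # ~2.3 years
years_high = high / 2000  # ~2.7 years
```

Hence the "2-3 years" estimate; if your utilization is lower than full working hours, the break-even point stretches out proportionally.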
With that said, I have a bunch of ML servers that I built for myself. The largest one uses 2x RTX Pro 6000s, and I've been very happy with it. If I were only doing inference, I think this would be a somewhat questionable expense, setting aside the valid motivations some folks have around data privacy and security. But I do a lot of finetuning and maintain private/local eval harnesses, and for me personally that has made it worth the investment.
Research runs at a variety of scales, but "check that this new idea/method/architecture isn't completely dumb at small scale before trying to scale up" is a common enough pattern. And most of those ideas fail at small scale.
It depends on what you want to do in this gigantic field.
It's good for quick testing of stuff, but for sustained workloads it's absolutely better to rent cloud compute. HN skews a bit fantastical/fanatical on this issue.
It's good to have a local GPU. That's like your dev environment. Prod is much more expensive in AI programming than in web programming. So you want to make sure everything is working before you push!
If you're seriously doing deep learning research, it's very very nice to own your own GPU.
For four years of AI PhD research I worked with a 1050Ti on a personal laptop and a 2060 on a personal desktop. You can do a lot of validation and development on consumer GPUs.
That said, the OP is not going to train an LLM from scratch on a 3090. That would not be feasible.
Absolutely. Your model selection has limits, of course: best practice for some types of replicable research is to use unquantized models, but that still leaves room for the smaller Gemma and Llama models.
I’m on a 4080 for a lot of work and it gets well over 50 tokens per second on inference for pretty much anything that fits in VRAM. It’s comparable to a 3090 in compute: the 3090 has 50% more VRAM, while the 4080 has better chip-level support for certain primitives. That advantage matters slightly less with unquantized models, which makes the 3090 a great choice; the 4080 is better if you want more inference throughput at certain common quantization levels.
Training LoRA adapters and fine-tunes is highly doable. Yesterday’s project for me, as an example, was training trigger functionality into a single unused token in the vocabulary: under 100 training examples in the dataset, 10 to 50 epochs, and extremely usable “magic token” results in a few minutes at most. This is just one example.
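A rough sketch of why LoRA-style runs like this fit on consumer cards: a rank-r adapter on a linear layer only adds two small matrices, so the trainable parameter count is tiny relative to the base model. The layer widths and counts below are illustrative, not from any specific model:

```python
# A rank-r LoRA adapter on a (d_in x d_out) linear layer adds two
# matrices, A (d_in x r) and B (r x d_out), so it contributes
# r * (d_in + d_out) trainable parameters.
def lora_params(d_in: int, d_out: int, r: int) -> int:
    return r * (d_in + d_out)

# Illustrative: rank-16 adapters on 4096-wide projections,
# applied to q/k/v/o across 32 layers (hypothetical shapes).
per_layer = lora_params(4096, 4096, 16)  # 131,072 params
total = per_layer * 4 * 32               # ~16.8M trainable params
```

On the order of tens of millions of trainable parameters (versus billions frozen) is why gradient and optimizer state stay small enough for a single-GPU fine-tune.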
If you look at the wealth of daily entries on arxiv in cs.AI, many use established smaller models with well-understood characteristics. That makes it easier to interpret the results of anything you do in your own research, and easier for others to put your results in context.