Hacker News

baalimago · today at 5:28 AM

Train your LM from scratch*

I doubt you have a machine big enough to make it "Large".


Replies

mips_avatar · today at 6:06 AM

You can fully train a 1.6b model on a single 3090. That’s a reasonably big model.
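For a rough sense of whether a 1.6B-parameter model fits in a 3090's 24 GB, here's a back-of-envelope sketch of the static training footprint (weights + gradients + optimizer state only; activations, which depend on batch size and checkpointing, are ignored):

```python
GB = 1024 ** 3

def adam_training_footprint_gb(n_params, param_bytes=4, grad_bytes=4, opt_bytes=8):
    """Static training memory: weights + gradients + Adam moment estimates.

    Ignores activations, which scale with batch size and sequence length
    and can be traded for compute via gradient checkpointing.
    """
    return n_params * (param_bytes + grad_bytes + opt_bytes) / GB

# Plain fp32 Adam on a 1.6B-parameter model: 16 bytes/param, ~23.8 GB
full = adam_training_footprint_gb(1.6e9)

# fp16/bf16 weights and grads with 8-bit optimizer states
# (bitsandbytes-style): ~9 GB, leaving headroom for activations
lean = adam_training_footprint_gb(1.6e9, param_bytes=2, grad_bytes=2, opt_bytes=2)

print(f"fp32 Adam: {full:.1f} GB, lean setup: {lean:.1f} GB")
```

So plain fp32 Adam is already right at the 24 GB limit before any activations; in practice you'd lean on mixed precision, 8-bit optimizer states, or gradient checkpointing to make room.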

nucleardog · today at 5:52 AM

Hey now! I've got a half terabyte of RAM at my disposal! I mean, it's DDR4 but... it's RAM!

And it's paired with 48 processor cores! I mean, they don't even support AVX512 but they can do math!

I could totally train an LLM! Or at least my family could... might need my kid to pick up and carry on the project.

But in all seriousness... you either missed the point, are being needlessly pedantic, or are... wrong?

This is about learning concepts, and the rest of this is mostly moot.

On the pedantic-or-wrong note: what is the documented cut-off for a "large" language model? GPT-2 was, and still is, described as a "large" language model, and it had 1.5B parameters. These days you can just about get a consumer GPU capable of training that for about $400.
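For what it's worth, GPT-2's 1.5B figure falls straight out of its published largest config (48 layers, d_model 1600, vocab 50257, context 1024). A rough count, assuming the output head shares weights with the token embedding:

```python
def gpt2_param_count(n_layer=48, d_model=1600, vocab=50257, n_ctx=1024):
    # Token + position embeddings (output head is tied to the token embedding)
    embeddings = vocab * d_model + n_ctx * d_model
    # Per transformer block: QKV (3d^2) + attention out-proj (d^2)
    # + MLP up/down (8d^2), plus biases and two LayerNorms (~13d)
    per_block = 12 * d_model ** 2 + 13 * d_model
    final_ln = 2 * d_model
    return embeddings + n_layer * per_block + final_ln

print(gpt2_param_count())  # ~1.56e9
```

Almost all of that is the 12·d² per block, which is why parameter counts are dominated by depth and width rather than vocabulary size.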
