Hacker News

Scene_Cast2 · last Sunday at 6:05 PM

I find it interesting that the architectures of modern open-weight LLMs are so similar, and that most innovation seems to be happening on the training front (data, RL).

This is contrary to what I've seen in a large ML shop, where architectural tuning was king.


Replies

bobbylarrybobby · last Sunday at 7:05 PM

My guess is that at LLM scale you really can't do hyperparameter tuning; it's just too expensive. You probably have to do some basic testing of different architectures, settle on one, and then figure out how to make the best use of it (data and RL).
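To put rough numbers on the cost argument, here is a minimal Python sketch using the common approximation that pretraining a dense transformer takes about 6·N·D FLOPs (N parameters, D training tokens). The 70B-parameter / 15T-token run and the 27-point grid are illustrative assumptions, not figures from this thread, and the helper names are hypothetical.

```python
# Rough training-compute estimate using the common C ≈ 6·N·D approximation.
# Model size, token count, and sweep size are illustrative assumptions.


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate pretraining FLOPs for one full run of a dense transformer."""
    return 6.0 * n_params * n_tokens


def sweep_flops(n_params: float, n_tokens: float, n_configs: int) -> float:
    """Total FLOPs if every hyperparameter configuration is trained to completion."""
    return n_configs * training_flops(n_params, n_tokens)


if __name__ == "__main__":
    one_run = training_flops(70e9, 15e12)
    # e.g. 3 candidate values for each of 3 hyperparameters -> 27 full runs
    grid = sweep_flops(70e9, 15e12, n_configs=27)
    print(f"one run: {one_run:.2e} FLOPs")
    print(f"27-config grid: {grid:.2e} FLOPs ({grid / one_run:.0f}x a single run)")
```

Every configuration trained to completion adds another full run's worth of compute, so even a coarse 3×3×3 grid costs 27 times a single run.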

ModelForge · last Sunday at 7:04 PM

Good point. LLMs lower the barrier to entry for anyone with enough resources, because these architectures are robust to tweaks as long as you throw enough compute and data at them. You can even violate scaling laws and still get a good model (as Llama 3 showed back then).
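One reading of "violating scaling laws" is training far past the roughly 20 tokens per parameter that the Chinchilla compute-optimal analysis suggests. A minimal Python sketch, assuming that 20:1 rule of thumb and Meta's reported figure of over 15T pretraining tokens for Llama 3 8B (15T is used here as a lower bound); the function names are hypothetical.

```python
# Compare a training run's tokens-per-parameter ratio with the ~20:1 rule of
# thumb from the Chinchilla compute-optimal analysis (an approximation, not a law).
CHINCHILLA_TOKENS_PER_PARAM = 20.0


def tokens_per_param(n_params: float, n_tokens: float) -> float:
    """Training tokens seen per model parameter."""
    return n_tokens / n_params


def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal token budget for a model of this size."""
    return CHINCHILLA_TOKENS_PER_PARAM * n_params


if __name__ == "__main__":
    # Llama 3 8B: Meta reported "over 15T" pretraining tokens; 15e12 is a lower bound.
    n_params, n_tokens = 8e9, 15e12
    ratio = tokens_per_param(n_params, n_tokens)
    overshoot = n_tokens / chinchilla_optimal_tokens(n_params)
    print(f"{ratio:.0f} tokens/param, about {overshoot:.0f}x the Chinchilla-style budget")
```

Under these assumptions the 8B model saw on the order of 90 times more data than a compute-optimal budget would suggest; the usual rationale for overtraining like this is that a smaller model is cheaper to run at inference time.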