Scene_Cast2 • last Sunday at 6:05 PM • 2 replies

I find it interesting that the architectures of modern open-weight LLMs are so similar, and that most innovation seems to be happening on the training (data, RL) front.

This is contrary to what I've seen in a large ML shop, where architectural tuning was king.

Replies

bobbylarrybobby • last Sunday at 7:05 PM

My guess is that at LLM scale, you really can't afford to tune hyperparameters; it's just too expensive. You probably have to do some basic testing of different architectures, settle on one, and then figure out how to make the best use of it (data and RL).

ModelForge • last Sunday at 7:04 PM

Good point. LLMs lower the barrier to entry for anyone with enough resources, because these architectures are fairly robust to tweaks as long as you throw enough compute and data at them. You can even deviate from compute-optimal scaling laws and still get a good model (as Llama 3 showed).
