bobbylarrybobby • last Sunday at 7:05 PM • 1 reply

My guess is that at LLM scale, you can't really tune hyperparameters; it's just too expensive. You probably have to do some basic testing of different architectures, settle on one, and then figure out how to make the best use of it (data and RL).

Replies