Hacker News

bobbylarrybobby, last Sunday at 7:05 PM (1 reply)

My guess is that at LLM scale, you really can't afford to tune hyperparameters; it's just too expensive. You probably have to do some basic testing of different architectures, settle on one, and then figure out how to make the best use of it (data and RL).

