Hacker News

mxwsn, today at 4:25 PM

No, LLMs are trained on more tokens than they have parameters. That puts them in the classical regime, on the first descent of the double descent curve, not the overparameterized second descent.
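To make the comparison concrete, here is a rough sketch of the tokens-versus-parameters arithmetic. The model figures are approximate, publicly reported numbers, not exact counts. The helper names `tokens_per_parameter` and `regime` are made up for illustration. Treating each token as one training sample and comparing that count directly to the parameter count is a simplification of where the interpolation threshold actually sits.

```python
# Approximate, publicly reported (parameters, training tokens) pairs.
# These are rounded figures used only for illustration.
MODELS = {
    "GPT-3 175B": (175e9, 300e9),
    "Chinchilla 70B": (70e9, 1.4e12),
    "Llama 2 7B": (7e9, 2e12),
}


def tokens_per_parameter(params: float, tokens: float) -> float:
    """Ratio of training tokens to model parameters."""
    return tokens / params


def regime(params: float, tokens: float) -> str:
    """Crude label (an assumption of this sketch): 'classical' when the
    number of training tokens exceeds the number of parameters."""
    return "classical" if tokens > params else "overparameterized"


if __name__ == "__main__":
    for name, (p, t) in MODELS.items():
        ratio = tokens_per_parameter(p, t)
        print(f"{name}: {ratio:.1f} tokens/param -> {regime(p, t)}")
```

Under this crude criterion all three examples land on the classical side, since each has more than one training token per parameter.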