Hacker News

nayroclade, today at 2:54 AM

Is the approach fundamentally limited to smaller models? Or could you theoretically train a model as powerful as the largest models, but much faster?