
int_19h · 01/20/2025

Several Chinese models already go up to 128K context, so it's not that they don't know how to scale it up. But models that handle long context well also take more time and compute to train, so it makes sense that they're iterating on output quality rather than increasing context length right now. I wouldn't read much into it with regard to moats or the lack thereof.