logoalt Hacker News

Tarq0ntoday at 7:22 AM0 repliesview on HN

If it works for predicting the next token in a very long stream of tokens, why not. The question is what architecture and training regimen it needs to generalize.