Tarq0n • today at 7:22 AM

If it works for predicting the next token in a very long stream of tokens, why not? The question is what architecture and training regimen it needs to generalize.