Isn’t this in a sense an RNN built out of a slice of an LLM? If so, it might have the same drawbacks, namely slowness to train, but also benefits such as an endless context window (in theory).
It's sort of an RNN, but it's also basically a transformer with shared layer weights. Each step is equivalent to one transformer layer, so the computation for n steps is the same as the computation for a transformer with n layers.
The notion of a context window applies to the sequence, and the recurrence doesn't really change that: each iteration sees and attends over the whole sequence.
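To make the "transformer with shared layer weights" point concrete, here is a minimal sketch in PyTorch. It is not the actual architecture being discussed; the class name, dimensions, and step count are all my own assumptions. It just shows one weight-tied layer looped n times, with full self-attention over the sequence at every step:

    # Hypothetical sketch: one transformer layer reused for n_steps
    # iterations (shared weights). Compute for n_steps iterations
    # matches an n_steps-layer transformer.
    import torch
    import torch.nn as nn

    class LoopedTransformer(nn.Module):
        def __init__(self, d_model=512, nhead=8, n_steps=12):
            super().__init__()
            # A single layer, applied repeatedly: this is the weight sharing.
            self.layer = nn.TransformerEncoderLayer(
                d_model=d_model, nhead=nhead, batch_first=True)
            self.n_steps = n_steps

        def forward(self, x):  # x: (batch, seq_len, d_model)
            for _ in range(self.n_steps):
                # Each iteration attends over the whole sequence,
                # so the context window is set by seq_len, not by n_steps.
                x = self.layer(x)
            return x

    tokens = torch.randn(2, 128, 512)   # (batch, seq_len, d_model)
    out = LoopedTransformer()(tokens)   # same shape as the input

Unlike a classic RNN, the state carried between iterations is the full sequence of hidden vectors rather than a fixed-size vector, which is why iterating more steps doesn't extend the context window.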