logoalt Hacker News

SlinkyOnStairsyesterday at 7:35 PM1 replyview on HN

Realistically, we are.

This is not some arbitrary design choice, it's the core compromise to make LLMs viable to train at all.


Replies

airstrikeyesterday at 8:25 PM

Define "realistically". You're basically saying attention is all we need indefinitely into the future and all other gains come from more compute or scaffolding around current architectures.

Attention is all we need because it is currently the best parallelizable way to model long-range dependencies on current hardware constraints, not because flat tokens yield some natural law of intelligence inherently.

Who's to say we won't find a way to encode provenance or privilege natively into models such that the tradeoff changes?

It's hard to say what the solution will be. If I knew it, I'd build it. But it's even harder to sustain that the current architecture is a crystalized global optimum.

show 2 replies