Hacker News

alberto467, today at 5:04 PM

Not at all. To get really pedantic, tokenization is also a form of encoding, so whatever modality you're using, you'll end up doing some kind of encoding one way or another.
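To make the "tokenization is encoding" point concrete, here is a minimal sketch assuming a toy byte-level tokenizer. The function names `encode` and `decode` are hypothetical, not any real library's API. It only shows that text goes through a reversible mapping to integer IDs; real tokenizers typically use a learned vocabulary rather than raw bytes.

```python
def encode(text: str) -> list[int]:
    # Toy byte-level "tokenizer" (an assumption for illustration):
    # each UTF-8 byte of the input becomes one integer ID in 0..255.
    return list(text.encode("utf-8"))


def decode(ids: list[int]) -> str:
    # Inverse mapping: integer IDs back to bytes, then back to text.
    return bytes(ids).decode("utf-8")


ids = encode("héllo")
# Round trip: the token IDs carry the same information as the text.
assert decode(ids) == "héllo"
```

Note that "é" takes two bytes in UTF-8, so the ID sequence is longer than the character count. The unit boundaries come from the encoding scheme, not from the text itself.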


Replies

altruios, today at 5:54 PM

Tokens are such a strange base unit. Couldn't we do something that conforms more naturally to reality than such choppy units, which cause all sorts of artifacts? Making everything 'language based' prevents true multi-modality. Thinking isn't done in language. Thinking outputs language, but it's far more like multiple waves of data coalescing into an 'idea', internal... subjectively (n=1) at least. I think wave/signal-based transformers are the next jump.

After that, an s1/s2 system (fast generation, with slow wave correction/observation operating over the fast generation) seems like the next leap forward.

Tokens create and hide too many problems to be the 'optimal' solution.
