
nielsole yesterday at 7:58 AM

Isn't that the opposite of the bitter lesson - adding more cleverness to the architecture?


Replies

fennecbutt yesterday at 5:21 PM

I guess it's just working with the brain model (so to speak) rather than against it.

In the same way that we use punctuation. Or even that we usually order words a certain way: oranges and apples, Ted and Bill, roundabouts and swings.

cschmidt yesterday at 11:37 AM

I suppose it is. There is a lot to tokenization - pre-tokenization, how to handle digits, how the tokenizer is trained - that is about adding cleverness. In the long run, the bitter lesson would be to just get rid of it all and learn from more data. Many people would love to do that.

But I think that for BLT, digits will still be an issue. There is no way an autoregressive entropy model will be able to split numbers sensibly, since it has no idea how many digits are coming, so it seems likely to struggle more with arithmetic. Perhaps you could reverse all the digits in a number; then it has a chance. So 12334 becomes 43321, and the model gets to start from the ones digit. This has been suggested as an approach for LLMs.
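As a rough sketch of that digit-reversal idea (a hypothetical preprocessing pass, not anything from the BLT paper), a regex substitution could flip every run of digits so the model reads and writes numbers starting from the ones place:

    import re

    def reverse_digits(text: str) -> str:
        # Flip each run of digits so the ones digit comes first,
        # e.g. "12334" -> "43321". Non-digit text is left untouched.
        return re.sub(r"\d+", lambda m: m.group(0)[::-1], text)

    print(reverse_digits("12334 + 7 = 12341"))  # -> "43321 + 7 = 14321"

The point of the reversal is that carries propagate from the ones digit upward, so an autoregressive model generating least-significant digit first can compute each digit before it has to commit to the more significant ones.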
