Hacker News

rikimaru0345 · yesterday at 2:40 PM · 3 replies

OK, I've read the paper, and now I wonder: why did they stop at the most interesting part?

They did all that work to figure out that learning "base conversion" is the hard part for transformers. Great! But then why not take that last remaining step and investigate why that specifically is hard for transformers, and how the transformer architecture could be modified so that it becomes less hard / more natural / more "intuitive" for the network to learn?
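For concreteness, here is a minimal sketch (not from the paper) of what base conversion looks like as an explicit algorithm; the bases and most-significant-first digit order below are my own illustrative choices, not necessarily the paper's encoding:

    # Illustrative only: bases and digit ordering are assumptions,
    # not necessarily the paper's setup.

    def to_digits(n: int, base: int) -> list[int]:
        """Most-significant-first digits of n in the given base."""
        if n == 0:
            return [0]
        digits = []
        while n > 0:
            digits.append(n % base)
            n //= base
        return digits[::-1]

    def convert(digits: list[int], src_base: int, dst_base: int) -> list[int]:
        """Read a digit sequence in src_base, re-emit it in dst_base."""
        n = 0
        for d in digits:
            n = n * src_base + d
        return to_digits(n, dst_base)

    print(convert([1, 0, 1, 1, 0], 2, 3))  # 10110 in base 2 is 22, i.e. [2, 1, 1] in base 3

Framed as a sequence-to-sequence task, a model would have to learn something equivalent to this accumulate-and-redivide loop from input/output examples alone, which is part of what makes the question interesting.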


Replies

fcharton · yesterday at 8:56 PM

Author here. The paper is about the Collatz sequence: how experiments with a transformer can point at interesting facts about a complex mathematical phenomenon, and how, in supervised math transformers, model predictions and errors can be explained (this part is a follow-up to a similar paper about GCD). From an ML research perspective, the interesting (and surprising) takeaway is the particular way the long Collatz function is learned: "one loop at a time".
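For readers unfamiliar with it, here is a minimal sketch of the textbook Collatz map; the "long" variant and the exact input/output encoding used in the paper may differ, so treat this only as the standard definition:

    # Textbook Collatz map only; the paper's "long" variant and its
    # encoding may differ from this sketch.

    def collatz_step(n: int) -> int:
        """One step: halve if even, else 3n + 1."""
        return n // 2 if n % 2 == 0 else 3 * n + 1

    def collatz_sequence(n: int) -> list[int]:
        """Iterate until reaching 1 (conjecturally, this always happens)."""
        seq = [n]
        while n != 1:
            n = collatz_step(n)
            seq.append(n)
        return seq

    print(collatz_sequence(27)[:8])  # [27, 82, 41, 124, 62, 31, 94, 47]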

To me, base conversion is a side quest. We just wanted to rule it out as an explanation for the model's behavior. It may be worth further investigation, but it won't be by us. Another (less important) reason is paper length: if you want to submit to peer-reviewed outlets, you need to keep the page count under a certain limit.

embedding-shape · yesterday at 2:57 PM

Why release one paper when you can release two? It's easier to get citations if you spread your efforts, and if you're lucky, someone will need to reference both of them.

A more serious answer might be that it was simply outside the scope of what they set out to do, and they didn't want to fall into scope creep, which is easier said than done.

fiveMoreCents · yesterday at 7:48 PM

cuz you don't sell nonsense in one piece. it used to be "repeat a lie often enough" ... now lies are split into pieces ...

you'll see more of all that in the next few years.

but if you wanna stay in awe, at your age and further down the road, don't ask questions like you just asked.

be patient and lean into the split.

brains/minds have been FUBARed. all that remains is buying into the fake, all the way down to faking it when your own children get swooped into it all.

"transformers" "know" and "tell" ... and people's favorite cartoon characters will soon run hedge funds but the rest of the world won't get their piece ... this has all gone too far and to shit for no reason.