Hacker News

dwa3592 · today at 3:38 PM · 1 reply

I assumed they had to train; otherwise, how else would they get "inside" a transformer?

The article also smells a bit off to me: it sounds revolutionary while offering no details or clear explanation.


Replies

D-Machine · today at 7:00 PM

There is no training in the usual sense of the term, i.e. no gradient descent and no differentiable loss function. The authors use deceptive language early on to make it sound that way, but near the end they make clear that their model, as is, isn't actually differentiable, and that it might in theory still work if made differentiable. But they don't actually know.

But IMO this is BS, because I don't see how one would obtain or generate training data, or how one would define a continuous loss function that scores partially correct / plausible outputs (is a "partially correct" program or algorithm even coherent, conceptually?).
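To illustrate the loss-function point: a minimal toy sketch (all names and the toy problem are mine, not from the article) contrasting a smooth loss, which gives gradient descent a slope to follow, with a discrete "exact match" score over program-like output, which is flat almost everywhere and so offers no descent signal.

```python
# Toy illustration: differentiable vs. non-differentiable loss.
# The "model" is a single weight w producing the value w * 3.0.

def continuous_loss(w):
    # Smooth surrogate: squared error against a target of 6.0.
    return (w * 3.0 - 6.0) ** 2

def exact_match_loss(w):
    # Discrete score, like checking a program's output: 0 if the
    # rounded result is exactly right, else 1. Piecewise constant.
    return 0.0 if round(w * 3.0) == 6 else 1.0

def finite_diff_grad(f, w, eps=1e-6):
    # Central finite difference as a stand-in for a gradient.
    return (f(w + eps) - f(w - eps)) / (2 * eps)

# The smooth loss has a usable slope at w = 1.0 (about -18)...
print(finite_diff_grad(continuous_loss, 1.0))
# ...while the exact-match score is locally flat: zero gradient.
print(finite_diff_grad(exact_match_loss, 1.0))  # 0.0
```

This is exactly why people reach for continuous surrogates when training on discrete artifacts like code; whether any such surrogate is even coherent here is the open question above.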