Hacker News

dwa3592, today at 2:27 PM

If you understood the article, please correct my understanding:

They created a new training dataset that also includes computations solved step by step (multiplying two numbers, or playing Sudoku) and then trained a transformer on it. As a result, the model performs the computation (e.g. multiplying two numbers) "inside" itself instead of calling a calculator (or Python)?

++ And they also figured out how to make attention faster?


Replies

YeGoblynQueenne, today at 2:57 PM

I can't see anything about "training a transformer". I'm trying to understand if e.g. the Sudoku solver was learned from examples (in which case, what examples?) or whether it was manually coded and then "compiled" into weights.
