Hacker News

gavinray · today at 7:01 PM · 1 reply

The article states that they trained a WASM interpreter and that programs are represented as WASM bytecode.


Replies

D-Machine · today at 7:33 PM

Nope. They encoded (i.e., hand-compiled) a simple VM / WASM interpreter into the transformer weights; there is no training. You'd be forgiven for this misreading, as they deliberately mislead early on by implying their model is (in principle) trainable, but later admit that their actual model is not differentiable and only claim that a differentiable approximation "should" still work, despite giving no information about what loss function or training data could score partially correct or incomplete program outputs.
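
The distinction here, between weights you can write down by hand and weights you could actually train, comes down to whether the operations involved have useful gradients. A toy sketch (my own illustration, not from the paper): a hard argmax-style selection can be "compiled in" directly but is a step function with zero gradient almost everywhere, while a low-temperature softmax is the kind of differentiable approximation that "should" behave the same.

```python
import math

def hard_select(scores):
    # Non-differentiable: a step function that picks the max entry.
    # Weights realizing this can be written down by hand ("compiled in"),
    # but the gradient is zero almost everywhere, so it cannot be trained.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == best else 0.0 for i in range(len(scores))]

def soft_select(scores, temperature=0.1):
    # Differentiable approximation: softmax approaches the hard argmax
    # as temperature -> 0. This is the sense in which a smooth stand-in
    # "should" still work, while admitting gradients for training.
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [0.1, 2.0, 0.3]
print(hard_select(scores))        # exact one-hot selection
print(soft_select(scores, 0.05))  # nearly one-hot, but smooth
```

Even granting that such an approximation converges to the hand-built model's behavior, the commenter's point stands: approximating a fixed program in a smooth family says nothing about what loss or data would let you *learn* it.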