Nope, they encoded (effectively hand-compiled) a simple VM / WASM interpreter directly into the transformer weights; there is no training. You'd be forgiven for this misreading: early on they imply that their model is (in principle) trainable, but later they admit that the actual construction is not differentiable, and merely assert that a differentiable approximation "should" still work, with no discussion of what loss function or training data could score partially correct or incomplete program outputs.
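The non-differentiability objection is easy to make concrete. A toy sketch (my own illustration, not from the paper): a branch hard-coded into weights behaves like a step function, whose gradient is zero almost everywhere, so gradient descent gets no signal; a sigmoid relaxation restores gradients but no longer computes the exact branch the interpreter needs.

```python
import math

def hard_step(x):
    # A "compiled-in" exact branch: 1 if x >= 0 else 0.
    # Its derivative is 0 everywhere except the discontinuity.
    return 1.0 if x >= 0 else 0.0

def soft_step(x, temp=0.1):
    # Differentiable relaxation (sigmoid with temperature).
    # Gives gradients, but only approximates the branch's output.
    return 1.0 / (1.0 + math.exp(-x / temp))

def numeric_grad(f, x, eps=1e-6):
    # Central-difference estimate of df/dx.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Away from the discontinuity the exact branch gives no learning signal:
print(numeric_grad(hard_step, 0.5))   # 0.0
# The relaxation gives a nonzero gradient, at the cost of exactness:
print(numeric_grad(soft_step, 0.5))   # small positive value
print(hard_step(0.5), soft_step(0.5))
```

Replacing every discrete interpreter operation with a relaxation like this is exactly the unproven step: the relaxed model is trainable in principle, but it is no longer the exact VM they encoded, and nothing in the paper shows the two would converge.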