Hacker News

D-Machine · today at 3:46 PM

Did you read the post you are responding to? It says:

> What's the benefit? Is it speed? Where are the benchmarks? Is it that you can backprop through this computation? Do you do so?

The correct parsing of this is: "What's the benefit? [...] Is it [the benefit] that you can backprop through this computation? Do you do so?"

There are no details about training, nor about the (almost certainly novel) loss function that would be needed to handle partial or imperfect outputs, so it is extremely hard to believe any kind of gradient-based training procedure was used to set the weight values here.


Replies

radarsat1 · today at 6:01 PM

> There are no details about training

My understanding was that they are not training at all, which would explain that: they are compiling an interpreter down to a VM that has the shape of a transformer.

I.e., they are calculating the transformer weights needed to execute the operations of the machine they are generating code for.
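As a rough illustration of the idea (not the post's actual construction): attention weights can be written down by hand so that a single attention layer deterministically performs a fixed operation, with no gradient descent involved. The toy below uses one-hot positional encodings and a hand-built key projection `W_K` (all names are mine, purely illustrative) so that every position attends to its left neighbour and copies that token.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

T = 5
ids = [3, 1, 4, 0, 2]
tokens = np.eye(T)[ids]   # one-hot token embeddings, shape (T, T)
pos = np.eye(T)           # one-hot positional encodings

scale = 100.0             # large scale -> effectively hard attention

# Hand-set key projection: position j's key matches position j+1's query,
# so each position attends to its left neighbour (position 0 to itself).
# These weights are *calculated*, never trained.
W_K = np.zeros((T, T))
for j in range(T - 1):
    W_K[j, j + 1] = 1.0
W_K[0, 0] = 1.0

Q = scale * pos           # query at position i is scale * e_i
K = pos @ W_K             # key at position j "points at" position j+1
V = tokens                # values carry the token identity

attn = softmax(Q @ K.T)   # near-one-hot attention pattern
out = attn @ V            # each row ~ the token one step to the left

print(out.argmax(axis=1))  # prints [3 3 1 4 0]
```

The same spirit, scaled up, is what "compiling to transformer weights" means: the operations of the target machine are encoded as fixed attention patterns and projections, so the weights are an output of a compiler rather than of an optimizer.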
