logoalt Hacker News

SPascareli13today at 4:27 PM0 repliesview on HN

If the model is trained to be a interpreter, then that means that the loss should reach 0 for it to be fully trained?

Also, if it's execution is purely deterministic, you probably don't need non linearity in the layers, right?