fc417fc802 • today at 4:06 AM • 0 replies • view on HN

I'm also puzzled by that statement. The issue with training is (as I understand it) one of precision and the associated numerical stability. You need enough bits in order for backprop to function correctly.
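To make the "enough bits" point concrete, here is a minimal sketch (my own illustration, not taken from the thread) of what happens when a small update is added to a weight stored in a narrow float type. The helper name `apply_updates` and the update size `1e-4` are made up for the example; it assumes NumPy is available.

```python
import numpy as np

def apply_updates(dtype, steps=1000, update=1e-4):
    """Repeatedly add a small constant update to a weight stored in `dtype`."""
    w = dtype(1.0)
    step = dtype(update)
    for _ in range(steps):
        # Round the result back to the storage type after every step,
        # the way an optimizer would if the weights lived in that type.
        w = dtype(w + step)
    return float(w)

w16 = apply_updates(np.float16)  # spacing between floats near 1.0 is about 1e-3
w32 = apply_updates(np.float32)  # spacing between floats near 1.0 is about 1e-7

print("float16:", w16)
print("float32:", w32)
```

In float16 each update is smaller than half the spacing between representable values near 1.0, so every addition rounds back to 1.0 and the weight never moves. In float32 the same updates accumulate to roughly 1.1. Inference only has to survive one forward pass of rounding; training has to accumulate many tiny increments like this.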

Of course, there are techniques such as quantization-aware training, but I don't understand why a datatype would work for inference but not for that.
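For anyone unfamiliar with the term, a rough sketch of one common form of quantization-aware training looks like the following: the forward pass uses a quantized copy of the weight, while the gradient is applied to a higher-precision copy using a straight-through estimator. Everything here (the `fake_quant` helper, the 0.25 grid, the toy regression problem) is a hypothetical setup chosen for illustration, not any particular library's API.

```python
import numpy as np

def fake_quant(w, scale=0.25):
    """Snap w to the nearest multiple of `scale` (a toy uniform grid)."""
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=256)
y = 0.75 * x          # the ideal weight, 0.75, lies on the grid

w = 0.1               # higher-precision "shadow" weight that receives updates
lr = 0.1
for _ in range(200):
    wq = fake_quant(w)                        # forward pass sees the quantized weight
    grad_wq = np.mean(2 * (wq * x - y) * x)   # d(loss)/d(wq) for mean squared error
    # Straight-through estimator: pretend d(wq)/d(w) = 1, since the true
    # derivative of rounding is zero almost everywhere.
    w -= lr * grad_wq

print("shadow weight:", w)
print("quantized weight:", fake_quant(w))
```

Note that in this sketch the narrow datatype is only used in the forward pass; the updates still accumulate in a wider type, which is why it doesn't directly answer the question of training natively in the narrow format.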

You can also abandon backprop entirely, but that comes with a whole host of tradeoffs. And again, why would it work for inference but not for whatever alternative training regime you selected?
