Hacker News

082349872349872 · 10/12/2024 · 1 reply

As the origin is special, instead of training in a linear space, what would training in an affine space do?


Replies

whatshisface · 10/12/2024

I think they are training in an affine space, but I see what you're saying. The initialization of the bias must be breaking the symmetry in a way that makes the origin special. Of course, to some degree that's unavoidable, since we have to initialize from distributions with compact support.
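A minimal sketch of the reply's point. It assumes a plain tanh MLP with Glorot-style uniform weights (a compactly supported distribution centered at zero) and a constant bias init. These choices and every function name here (`init_layer`, `affine`, `mlp`, `build`, `is_odd_at`) are illustrative, not from the thread. With zero bias, the freshly initialized network maps the origin to itself and is odd, f(-x) = -f(x), so the origin is singled out before any training. A nonzero constant bias removes both properties.

```python
import math
import random


def init_layer(n_in, n_out, bias_init, rng):
    """Illustrative initializer: weights ~ Uniform[-a, a], a compactly supported
    distribution centered at zero (Glorot-style bound); bias set to a constant."""
    a = math.sqrt(6.0 / (n_in + n_out))
    weights = [[rng.uniform(-a, a) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [bias_init] * n_out


def affine(x, weights, bias):
    """y = W x + b: an affine map, which is linear only when b is zero."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def mlp(x, layers):
    """Affine layers with tanh (an odd function) between them, none after the last."""
    for i, (weights, bias) in enumerate(layers):
        x = affine(x, weights, bias)
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


def build(sizes, bias_init, seed=0):
    """Build an untrained network whose layer widths are given by `sizes`."""
    rng = random.Random(seed)
    return [init_layer(n_in, n_out, bias_init, rng) for n_in, n_out in zip(sizes, sizes[1:])]


def is_odd_at(layers, x, tol=1e-12):
    """Check f(-x) == -f(x) up to floating-point tolerance."""
    fx = mlp(x, layers)
    f_neg = mlp([-v for v in x], layers)
    return all(abs(a + b) <= tol for a, b in zip(fx, f_neg))


zero_bias = build([3, 8, 8, 2], bias_init=0.0)
shifted_bias = build([3, 8, 8, 2], bias_init=0.1)
x = [0.3, -1.2, 0.7]

print(mlp([0.0, 0.0, 0.0], zero_bias))     # -> [0.0, 0.0]: the origin is a fixed point
print(is_odd_at(zero_bias, x))             # -> True: point symmetry about the origin
print(mlp([0.0, 0.0, 0.0], shifted_bias))  # nonzero: the origin is no longer pinned
print(is_odd_at(shifted_bias, x))
```

With ReLU instead of tanh, the zero-bias network is no longer odd. It still maps the origin to itself and is positively homogeneous, f(cx) = c·f(x) for c > 0, so the origin remains special at initialization.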