whatshisface • 10/12/2024

I think they are training in an affine space, but I see what you're saying. The initialization of the bias must be breaking the symmetry in a way that makes the origin special. Of course, to some degree that's unavoidable, since we have to initialize from distributions with compact support.
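A rough sketch of one way to read this, not anything taken from the work under discussion: for a single affine unit `w·x + b` with weights and bias drawn from zero-centered uniform distributions (compact support), the spread of the pre-activation across random initializations is smallest at `x = 0` and grows with `‖x‖`. The init is therefore not translation-invariant, and the origin is singled out. The function name, the half-widths `w_half` and `b_half`, and the test points are all hypothetical choices made for illustration.

```python
import random
import statistics


def preactivation_variance(x, n_draws=20000, w_half=1.0, b_half=0.1, seed=0):
    """Variance of w.x + b over random initializations of one affine unit.

    Assumption for illustration: w_i ~ Uniform(-w_half, w_half) and
    b ~ Uniform(-b_half, b_half), i.e. zero-centered, compactly supported.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_draws):
        w = [rng.uniform(-w_half, w_half) for _ in x]
        b = rng.uniform(-b_half, b_half)
        samples.append(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return statistics.pvariance(samples)


origin = [0.0, 0.0, 0.0, 0.0]
shifted = [2.0, 2.0, 2.0, 2.0]  # same layer, evaluated away from the origin

var_origin = preactivation_variance(origin)    # only the bias contributes here
var_shifted = preactivation_variance(shifted)  # weights contribute ||x||^2 * w_half^2 / 3

print(var_origin, var_shifted)
```

Under these assumptions the expected variance is `b_half**2 / 3` at the origin and `‖x‖² · w_half**2 / 3 + b_half**2 / 3` elsewhere, so any zero-centered init of this kind picks out `x = 0` as the point where the units start out most alike.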

alt Hacker News