in-silico • yesterday at 7:16 PM

Why can't you just leave H_res as the identity matrix (or just not use it at all)? In that case, the model is basically a ResNet again and you don't need to worry about exploding/vanishing gradients from H_res.

I would think that H_post and H_pre could cover the lost expressiveness.
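
To make the gradient concern concrete, here is a toy sketch of what happens along the residual path when H_res is applied at every layer. It assumes the same H_res at each layer and uses a scaled identity as a hypothetical stand-in for a learned matrix; the sizes and the helper names (`scaled_identity`, `path_norm`) are made up for illustration. It says nothing about whether H_pre and H_post actually recover the lost expressiveness.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def scaled_identity(n, c):
    """Return c * I, a hypothetical stand-in for a learned H_res."""
    return [[c if i == j else 0.0 for j in range(n)] for i in range(n)]


def path_norm(h_res, depth):
    """Frobenius norm of the product of `depth` copies of h_res."""
    n = len(h_res)
    prod = scaled_identity(n, 1.0)
    for _ in range(depth):
        prod = matmul(h_res, prod)
    return sum(x * x for row in prod for x in row) ** 0.5


# Illustrative sizes only (assumption): 4 residual streams, 64 layers.
n_streams, depth = 4, 64
identity_norm = path_norm(scaled_identity(n_streams, 1.0), depth)
growing_norm = path_norm(scaled_identity(n_streams, 1.1), depth)
shrinking_norm = path_norm(scaled_identity(n_streams, 0.9), depth)
print(identity_norm, growing_norm, shrinking_norm)
```

With H_res fixed to the identity the product stays the identity no matter how deep the stack is, which is the ResNet-like behaviour. A matrix that is only slightly off in either direction compounds over depth, so the norm grows or shrinks geometrically.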