Highly recommend this as well. Does a great job of helping you build intuition for why things like gradient descent and normalization work. Also gets into the weeds on training dynamics and how to ensure they are behaving properly