square roots are expensive
they are negligible, especially when the post was written when ops were not fused. The extra memory you need to store the extra tensors when you use the original version is more expensive
they are negligible, especially when the post was written when ops were not fused. The extra memory you need to store the extra tensors when you use the original version is more expensive