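You don't need exact gradients, since gradient descent is self-correcting: as long as each update still points roughly downhill, the loss keeps decreasing and later steps compensate for earlier errors (which can also make gradient calculation bugs hard to find!). One approach that relies on inexact gradients is to use predicted "synthetic gradients": a small auxiliary model predicts a layer's gradient, so that layer can update its weights without waiting for the backward pass to reach it.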
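A minimal sketch of the synthetic-gradient idea, under assumptions chosen purely for illustration: a two-layer scalar linear model, a toy target `t = 3x`, and a linear gradient predictor `g_hat = a*h + b`. All names (`w1`, `w2`, `a`, `b`, `lr`, `lr_sg`, `train_with_synthetic_gradients`) are hypothetical. Layer 1 updates right after its forward pass using the *predicted* `dL/dh`. When layer 2 later computes the true `dL/dh`, that value is used only as a regression target for the predictor.

```python
# Toy synthetic-gradient sketch (hypothetical setup, not a reference implementation):
#   h = w1 * x  (layer 1),  y = w2 * h  (layer 2),  loss = 0.5 * (y - t)^2
# Layer 1 is updated with a predicted gradient g_hat = a * h + b instead of
# waiting for the true dL/dh from layer 2's backward pass.


def mse(w1, w2, data):
    """Mean of 0.5 * (y - t)^2 over the dataset for the two-layer model."""
    return sum(0.5 * (w2 * w1 * x - t) ** 2 for x, t in data) / len(data)


def train_with_synthetic_gradients(epochs=2000, lr=0.01, lr_sg=0.1):
    # Illustrative regression task: target t = 3x.
    data = [(x, 3.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
    w1, w2 = 0.5, 0.5  # model weights
    a, b = 0.0, 0.0    # parameters of the synthetic-gradient predictor
    initial = mse(w1, w2, data)

    for _ in range(epochs):
        for x, t in data:
            # Layer 1: forward pass, then an immediate update using the
            # predicted gradient (no waiting for layer 2's backward pass).
            h = w1 * x
            g_hat = a * h + b
            w1 -= lr * g_hat * x  # dL/dw1 approximated as g_hat * dh/dw1

            # Layer 2: forward and backward pass with the exact gradient.
            err = w2 * h - t      # dL/dy
            g_true = err * w2     # exact dL/dh, using w2 before its update
            w2 -= lr * err * h    # dL/dw2 = err * h

            # When the true gradient arrives, regress the predictor onto it.
            diff = g_hat - g_true
            a -= lr_sg * diff * h
            b -= lr_sg * diff

    return initial, mse(w1, w2, data)


initial_loss, final_loss = train_with_synthetic_gradients()
print(f"loss: {initial_loss:.4f} -> {final_loss:.6g}")
```

Here the last layer still receives exact gradients and the predictor keeps tracking the true `dL/dh`, so the loss can fall even though layer 1 never sees a true gradient. In a real pipeline, the benefit is that layer 1's update no longer has to wait for the downstream backward pass.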