Do you actually understand why evolution methods are beneficial?
SGD generates a stronger learning signal, is more efficient, and scales better. Using it end-to-end makes it stronger yet.
Yet somehow mixing in a weaker, blunter evolution stage improves the result?