GPU compute units are not that simple. The main difference from CPUs is that they generally use a combination of wide SIMD and wide SMT (many hardware threads per core) to hide latency, as opposed to the power-intensive out-of-order execution used by CPUs. Running tasks that can't take advantage of either SIMD or SMT on GPU compute units might be a bit wasteful.
Also, you'd need to add extra hardware for various OS support functions (privilege levels, address translation via an MMU) that are currently missing from the GPU. But the idea is otherwise sound; you can think of the proposed 'Mill' CPU architecture as one variety of it.
> GPU compute units are not that simple
Perhaps I should have phrased it differently. CPU and GPU cores are designed for different types of workloads. The rest of your comment seems similar to what I was imagining.
Still, I don't think that enhancing the GPU cores with CPU capabilities (out-of-order execution, privilege rings, an MMU, etc., to use your examples) is the best idea. You may end up with the advantages of neither and the disadvantages of both. I was suggesting that you could instead have a few dedicated CPU cores distributed among the numerous GPU cores. Finding the right ratio of GPU to CPU cores may be the key to achieving the best performance on such a system.