Turns out the best GPU optimization is just being too scared of graphics drivers to do the fancy stuff, 10-15x faster and you can actually debug it.