Optimizing some critical code section and beating the compiler gives you a rush. But the last time I managed that was around 8 years ago. Compilers have gotten really good.
Only higher-level stuff like data reordering and latency hiding is left. At least for me. And even some of that can be automated with profile-guided optimization.
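
To illustrate what data reordering can look like, here is a minimal sketch in C. The particle structs and function names are made up for the example and not taken from any real codebase. It switches from an array-of-structures layout to a structure-of-arrays layout, which is the kind of change a C compiler generally won't make for you because it alters the data layout the rest of the program sees.

```c
#include <stddef.h>

/* Hypothetical layout, array-of-structures (AoS):
 * all fields of one particle sit next to each other. */
struct particle_aos {
    float x, y, z;
    float mass;
};

/* Same data as structure-of-arrays (SoA):
 * each field lives in its own contiguous array. */
struct particles_soa {
    float *x, *y, *z;
    float *mass;
    size_t count;
};

/* AoS: only x is needed, but y, z and mass share the
 * cache lines and get fetched along with it. */
float sum_x_aos(const struct particle_aos *p, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += p[i].x;
    return sum;
}

/* SoA: the loop walks one dense float array, so every
 * byte brought into cache is actually used. */
float sum_x_soa(const struct particles_soa *p)
{
    float sum = 0.0f;
    for (size_t i = 0; i < p->count; i++)
        sum += p->x[i];
    return sum;
}
```

Both functions compute the same result. Whether the SoA version is actually faster depends on the access pattern and the hardware, so measure before committing to it.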