Does anyone actually use the fancy C++ memory model in their CUDA code? I thought most people just used the intrinsics and called it a day.