> For example, std::time::Instant is implemented on the GPU using a device timer
The code is running on the GPU there. It looks like remote calls are only for IO; the compiled stdlib generally runs on the GPU. (Going just from the post, I haven't looked at any details.)
I'm surprised the article doesn't provide a bigger list of calls that run on the GPU, plus further examples of what needs CPU interop.
Which is a generally valid implementation of IO. For instance, on the Nintendo Wii, the support processor ran its own little microkernel OS and exposed an IO API that looked like a remote filesystem (including Plan 9-esque network sockets as filesystem devices).
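
To make the split discussed here concrete, below is a rough Rust sketch, not the article's actual implementation. It models a "device" that answers time queries locally and forwards writes to a "host" over a queue. Everything in it is hypothetical: `Device`, `HostRequest`, `now_ticks`, `write`, and `spawn_host` are invented names, a plain counter stands in for the device timer, and a thread plus `std::sync::mpsc` channels stand in for the GPU-to-CPU transport.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical request that a device-side stdlib shim forwards to the host.
enum HostRequest {
    /// Write bytes to a host file descriptor; the host replies with the byte count.
    Write { fd: u32, bytes: Vec<u8>, reply: Sender<usize> },
}

/// Stand-in for the device side: a local tick counter plus a queue to the host.
struct Device {
    ticks: u64,
    to_host: Sender<HostRequest>,
}

impl Device {
    /// Time query answered locally (models a device timer, no host round trip).
    fn now_ticks(&mut self) -> u64 {
        self.ticks += 1;
        self.ticks
    }

    /// IO is forwarded to the host, and the caller blocks until the host replies.
    fn write(&self, fd: u32, bytes: &[u8]) -> usize {
        let (reply, response) = channel();
        self.to_host
            .send(HostRequest::Write { fd, bytes: bytes.to_vec(), reply })
            .expect("host thread has exited");
        response.recv().expect("host dropped the reply channel")
    }
}

/// Starts a host thread that services requests until the device is dropped.
/// Joining the handle returns a log of every write the host performed.
fn spawn_host() -> (Device, thread::JoinHandle<Vec<(u32, Vec<u8>)>>) {
    let (to_host, from_device) = channel();
    let host = thread::spawn(move || {
        let mut log = Vec::new();
        // The loop ends once every sender (i.e. the Device) has been dropped.
        for request in from_device {
            match request {
                HostRequest::Write { fd, bytes, reply } => {
                    let written = bytes.len();
                    log.push((fd, bytes));
                    // Ignore the error if the device stopped waiting for the reply.
                    let _ = reply.send(written);
                }
            }
        }
        log
    });
    (Device { ticks: 0, to_host }, host)
}
```

Calling `spawn_host()`, then `now_ticks()` and `write(1, b"hi")` on the returned `Device`, exercises both paths: the timer call never touches the channel, while the write round-trips through the host thread. The remote-filesystem style of API mentioned for the Wii would amount to a richer set of `HostRequest` variants (open, read, close) over the same kind of queue.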