Yeah, GPUdirect should allow you to dma straight to a storage device.
I wonder... what if the m.2 storage was actually DRAM? You probably don't need persistence for spilling a model off the GPU. How would it fare vs just adding more host memory? The m.2 ram would be less flexible, but would keep the system ram free for the CPU.
Yeah a ramdisk would probably work wonders. It's a shame Intel optane didn't became a standard, those type of workflows would be amazing for it.