Uhhh no, it's a huge net loss, because the cost of sending it to the GPU and back greatly exceeds the cost of just doing it then and there on the CPU; even on an iGPU the kernel launch latency etc. will kill it, and that's assuming the kernel build is free. Not to mention this is doing pow calls (!!), which is so ridiculous it makes me wonder if this was a kneejerk AI prompt.
Another post in this thread mentioned V8 sped this up by removing a buffer copy; this is adding two buffer copies, each about an order of magnitude slower.
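To put a rough shape on that argument, here's a back-of-envelope cost model. It's a sketch only: the function names are mine, and every number in it is a made-up placeholder rather than a measurement of any real hardware. The point is that the round trip pays a fixed launch cost plus two copies, so if a copy isn't more than twice as fast as the CPU simply doing the work, the GPU can't win at any buffer size.

```python
# Back-of-envelope model: CPU in place vs. GPU round trip.
# All parameters are hypothetical placeholders, NOT measurements.

def cpu_time_s(n_bytes, cpu_bytes_per_s):
    """Time to process the buffer where it already lives."""
    return n_bytes / cpu_bytes_per_s


def gpu_time_s(n_bytes, launch_latency_s, copy_bytes_per_s, gpu_bytes_per_s):
    """Fixed launch cost + copy in + compute + copy out (kernel build assumed free)."""
    copies = 2 * n_bytes / copy_bytes_per_s
    compute = n_bytes / gpu_bytes_per_s
    return launch_latency_s + copies + compute


def gpu_can_ever_win(copy_bytes_per_s, cpu_bytes_per_s):
    """The two copies alone cost 2n/copy_bw; the CPU costs n/cpu_bw.
    Unless copy_bw > 2 * cpu_bw, the copies already eat the whole budget."""
    return copy_bytes_per_s > 2 * cpu_bytes_per_s


# Placeholder figures, chosen only to illustrate the shape of the problem.
LAUNCH_S = 10e-6   # assumed fixed launch overhead
COPY_BW = 10e9     # assumed copy throughput per direction, bytes/s
CPU_BW = 8e9       # assumed CPU throughput for this operation, bytes/s
GPU_BW = 200e9     # assumed GPU throughput once the data is resident

for n in (1_000, 1_000_000, 1_000_000_000):
    cpu = cpu_time_s(n, CPU_BW)
    gpu = gpu_time_s(n, LAUNCH_S, COPY_BW, GPU_BW)
    print(f"{n:>13} bytes: cpu {cpu:.6f}s  gpu {gpu:.6f}s")
```

With these placeholder figures the GPU loses at every size, no matter how fast the kernel itself is. Plug in your own measured numbers before drawing conclusions about a specific machine.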
Come on guys...