Hacker News

david-gpu today at 7:49 AM

> Modern GPUs will go: huh, it sure would be cool if we just shifted the threads about to produce two non-divergent warps, and bam, divergence solved at the hardware level

Could you kindly share a source for this? Shader Execution Reordering (SER) is available for ray tracing, but it is not a general-purpose feature that can be used in generic compute shaders.

> Divergent threads can have a better throughput than you'd expect on a modern GPU, as they get more capable at handling this. Divergence isn't bad, it's just something you have to manage - and hardware architectures are rapidly improving here

I would strongly advise against this. GPUs are highly efficient when neighboring threads within a warp access neighboring data and follow largely the same code path. Even across warps, data locality is highly desirable.
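
To make the "same code path" point concrete, here is a toy cost model, not a description of any real GPU. It assumes a warp of 32 threads and assumes that a warp executes each distinct branch taken by its threads as a separate serialized pass. The names `warp_passes` and `total_passes` are made up for illustration.

```python
WARP_SIZE = 32  # assumed warp width for this toy model


def warp_passes(branch_ids):
    """Serialized passes for one warp: one per distinct branch its threads take."""
    return len(set(branch_ids))


def total_passes(thread_branches, warp_size=WARP_SIZE):
    """Sum the passes over consecutive warps of `warp_size` threads."""
    return sum(
        warp_passes(thread_branches[i:i + warp_size])
        for i in range(0, len(thread_branches), warp_size)
    )


# 64 threads (two warps). Neighboring threads alternate between two branches.
interleaved = [t % 2 for t in range(64)]
# Same amount of work, but neighboring threads take the same branch.
grouped = [t // 32 for t in range(64)]

print(total_passes(interleaved))  # 4: each warp runs both branches
print(total_passes(grouped))      # 2: each warp runs a single branch
```

Under these assumptions the interleaved layout costs twice as many passes for the same work. The model ignores memory access patterns entirely, which in practice matter at least as much.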


Replies

20k today at 3:17 PM

> I would strongly advise against this. GPUs are highly efficient when neighboring threads within a warp access neighboring data and follow largely the same code path. Even across warps, data locality is highly desirable.

It's a bit like saying writing code at all is bad, though. Divergence isn't desirable, but neither is running any code at all. Sometimes you need it to solve a problem.

Not supporting divergence at all is a huge mistake IMO. It isn't good, but sometimes it's necessary.
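
One way to "manage" divergence in software, when the branch each work item will take is known up front, is to reorder the work so that items taking the same branch sit next to each other. The sketch below is a toy model only: it assumes 32-wide warps and one serialized pass per distinct branch in a warp, and `count_passes` and `regroup` are hypothetical helper names. It also ignores the cost of the reordering itself and any memory locality lost by shuffling items.

```python
def count_passes(branches, warp_size=32):
    """Toy cost: each warp pays one pass per distinct branch among its threads."""
    return sum(
        len(set(branches[i:i + warp_size]))
        for i in range(0, len(branches), warp_size)
    )


def regroup(branches):
    """Return work-item indices ordered so items on the same branch are adjacent."""
    # sorted() is stable, so items on the same branch keep their relative order.
    return sorted(range(len(branches)), key=lambda i: branches[i])


# 64 work items whose branch alternates item by item.
branches = [i % 2 for i in range(64)]

order = regroup(branches)
regrouped = [branches[i] for i in order]

print(count_passes(branches))   # 4: both warps see both branches
print(count_passes(regrouped))  # 2: each warp sees one branch
```

Whether this pays off on real hardware depends on how expensive the reordering is compared with the divergent work it removes.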

> Could you kindly share a source for this? Shader Execution Reordering (SER) is available for ray tracing, but it is not a general-purpose feature that can be used in generic compute shaders.

https://docs.nvidia.com/cuda/cuda-programming-guide/03-advan...

My understanding is that this is fully transparent to the programmer; it's just more advanced scheduling for threads. SER is something different entirely.

Nvidia are a bit vague here, so you have to go digging into patents if you want more information on how it works.