Hacker News

20k · today at 3:21 PM

According to Nvidia, threads have individual program counters, and have had them for nearly 10 years (since Volta):

https://docs.nvidia.com/cuda/cuda-programming-guide/03-advan...

> the GPU maintains execution state per thread, including a program counter and call stack, and can yield execution at a per-thread granularity

Divergence isn't good, but sometimes it's necessary, and not supporting it in a programming model is a mistake. There are some problems you simply can't solve without it, and in some cases you will absolutely get better performance by using divergence.

People often try to avoid divergence by writing an algorithm that effectively does what Pascal and earlier GPUs did in hardware: unconditionally do the work of every branch on every thread. That can give worse performance than just writing the branch, because hardware scheduling of divergent warps is much better these days.