Hacker News

wmf · yesterday at 9:21 PM

Yes, they should score well on Linpack as long as they use Ozaki emulation.
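Some background on the term: the Ozaki scheme splits each FP64 operand into several lower-precision slices whose pairwise products can be computed exactly in lower precision. It then adds the slice products back together in FP64. In the variants used for FP64 emulation on GPUs, those exact products run on fast low-precision matrix units. Below is a simplified sketch in plain Python. It assumes an integer-slice variant: 7-bit signed digits, so each slice fits in INT8, and one power-of-two scale per row of A and per column of B. The names `ozaki_style_matmul` and `to_digit_slices` and the `slices`/`width` parameters are hypothetical, and real implementations differ in how they choose slices and scales.

```python
import math


def to_digit_slices(vals: list[float], e: int, slices: int, width: int) -> list[list[int]]:
    """Write values sharing the scale 2**e as fixed-point integers, split into
    `slices` signed digits of `width` bits (width=7 keeps each digit in int8 range)."""
    bits = width * slices
    mask = (1 << width) - 1
    digits = [[0] * len(vals) for _ in range(slices)]
    for p, v in enumerate(vals):
        n = int(math.ldexp(v, bits - e))  # |n| < 2**bits; bits below that are truncated
        sign, mag = (-1 if n < 0 else 1), abs(n)
        for t in range(slices):  # t = 0 is the most significant digit
            digits[t][p] = sign * ((mag >> (width * (slices - 1 - t))) & mask)
    return digits


def ozaki_style_matmul(A: list[list[float]], B: list[list[float]],
                       slices: int = 8, width: int = 7) -> list[list[float]]:
    m, k, n = len(A), len(B), len(B[0])
    bits = width * slices
    cols = [[B[p][j] for p in range(k)] for j in range(n)]
    # One shared power-of-two scale per row of A and per column of B.
    ea = [math.frexp(max(abs(x) for x in row))[1] for row in A]
    eb = [math.frexp(max(abs(x) for x in col))[1] for col in cols]
    As = [to_digit_slices(A[i], ea[i], slices, width) for i in range(m)]
    Bs = [to_digit_slices(cols[j], eb[j], slices, width) for j in range(n)]
    C = [[0.0] * n for _ in range(m)]
    for t in range(slices):
        for u in range(slices - t):  # keep only the high-order pairs, t + u < slices
            for i in range(m):
                for j in range(n):
                    # Exact integer dot product: stands in for an INT8 x INT8 -> INT32 GEMM.
                    dot = sum(x * y for x, y in zip(As[i][t], Bs[j][u]))
                    # Undo digit positions and row/column scales, then accumulate in FP64.
                    C[i][j] += math.ldexp(dot, width * (2 * slices - 2 - t - u)
                                          + ea[i] + eb[j] - 2 * bits)
    return C


A = [[0.1, 0.2, 0.3], [1.7, 2.9, 3.3]]
B = [[1.1, 0.5], [2.3, 0.7], [0.9, 1.3]]
print(ozaki_style_matmul(A, B))  # agrees with a plain FP64 matmul to ~1e-15 relative
```

With 8 slices, this does 36 integer slice GEMMs instead of one FP64 GEMM. That extra work is traded against the much higher throughput of the low-precision units. Entries much smaller than the largest one in their row or column lose precision, because they share that row's or column's scale.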


Replies

adrian_b · today at 5:02 AM

No, that is too slow.

Most claims about the cost of emulating FP64 on GPUs are wrong, because they assume that only the significand of floating-point numbers must be extended.

In reality, it is even more important to extend the exponent. FP32's 8-bit exponent caps finite values at about 3.4 × 10^38, versus about 1.8 × 10^308 for FP64. With it, overflows would be much too frequent in scientific and technical computations to accomplish anything.
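Here is a small sketch that makes the exponent point concrete. It simulates FP32 in Python by rounding each intermediate result to single precision with `struct`, which is an assumption standing in for real FP32 hardware. `f32` and the two norm functions are hypothetical helpers. A naive 2-D vector norm of ordinary-looking inputs overflows in FP32 even though the answer itself fits, while FP64 has no trouble:

```python
import math
import struct


def f32(x: float) -> float:
    """Round an FP64 value to the nearest FP32 value, overflowing to +/-inf like FP32 hardware."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:  # struct refuses finite values beyond the FP32 range
        return math.copysign(math.inf, x)


def naive_norm_fp64(x: float, y: float) -> float:
    return math.sqrt(x * x + y * y)


def naive_norm_fp32(x: float, y: float) -> float:
    # Round every intermediate to FP32, as a kernel working purely in FP32 would.
    x, y = f32(x), f32(y)
    s = f32(f32(x * x) + f32(y * y))
    return f32(math.sqrt(s))


print(naive_norm_fp64(3e20, 4e20))  # about 5e20
print(naive_norm_fp32(3e20, 4e20))  # inf: the intermediate 9e40 does not fit in FP32
print(f32(5e20))                    # finite: the final answer itself fits in FP32
```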

The minimum FP64 emulation on FP32-capable GPUs requires 3 numbers per emulated FP64 value. Two FP32 numbers hold the significand, and the exponent is held either in a third FP32 number or in an Int32, if that works better on the target GPU. An emulated FP64 operation is likely to be at least 20 times slower than an FP32 operation.
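Here is a minimal sketch of that three-number layout, limited to multiplication. It assumes two FP32 words for the significand (a "double-float" pair) and a plain Python integer standing in for the Int32 exponent. FP32 arithmetic is simulated by rounding every result to single precision through `struct`. The names `EmuF64`, `from_float`, `to_float` and `mul` are hypothetical. The product uses Dekker's error-free multiplication without FMA, which is one common technique, not necessarily what any GPU library does. The pair carries somewhat fewer than FP64's 53 significand bits.

```python
import math
import struct
from dataclasses import dataclass


def f32(x: float) -> float:
    # Round to the nearest FP32 value. For +, - and * on FP32 inputs, computing
    # in FP64 and rounding once gives the correctly rounded FP32 result.
    return struct.unpack("<f", struct.pack("<f", x))[0]


@dataclass
class EmuF64:
    """Hypothetical emulated FP64: value = (hi + lo) * 2**exp, with hi, lo FP32."""
    hi: float
    lo: float
    exp: int  # stands in for an Int32 exponent


def fast_two_sum(a: float, b: float) -> tuple[float, float]:
    # Error-free FP32 sum, valid when |a| >= |b|: a + b == s + err exactly.
    s = f32(a + b)
    return s, f32(b - f32(s - a))


def split(a: float) -> tuple[float, float]:
    # Veltkamp split of a 24-bit FP32 significand into two 12-bit halves.
    t = f32(4097.0 * a)  # 4097 = 2**12 + 1
    hi = f32(t - f32(t - a))
    return hi, f32(a - hi)


def two_prod(a: float, b: float) -> tuple[float, float]:
    # Dekker's error-free product: a * b == p + err exactly, using only FP32 ops.
    p = f32(a * b)
    ah, al = split(a)
    bh, bl = split(b)
    err = f32(f32(ah * bh) - p)
    err = f32(err + f32(ah * bl))
    err = f32(err + f32(al * bh))
    err = f32(err + f32(al * bl))
    return p, err


def normalize(hi: float, lo: float, exp: int) -> EmuF64:
    hi, lo = fast_two_sum(hi, lo)
    if hi == 0.0:
        return EmuF64(0.0, 0.0, 0)
    m, e = math.frexp(hi)  # move hi's binary exponent into the integer field
    return EmuF64(m, f32(math.ldexp(lo, -e)), exp + e)


def from_float(x: float) -> EmuF64:
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    hi = f32(m)
    return normalize(hi, f32(m - hi), e)


def to_float(v: EmuF64) -> float:
    return math.ldexp(v.hi + v.lo, v.exp)


def mul(a: EmuF64, b: EmuF64) -> EmuF64:
    p, err = two_prod(a.hi, b.hi)
    # Cross terms; a.lo * b.lo lies below the kept precision and is dropped.
    err = f32(err + f32(f32(a.hi * b.lo) + f32(a.lo * b.hi)))
    # Exponents add as integers, so FP32's exponent range never limits the result.
    return normalize(p, err, a.exp + b.exp)


print(to_float(mul(from_float(1e200), from_float(1e100))))  # close to 1e300
```

One emulated `mul` here costs 24 FP32 arithmetic operations: 17 in `two_prod`, 4 for the cross terms and 3 in `fast_two_sum`. Integer exponent bookkeeping comes on top of that. This gives a feel for why each emulated operation costs many native ones. With hardware FMA, `two_prod` shrinks to two operations.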

That is much faster than the 1:64 FP64-to-FP32 throughput ratio provided in hardware by an NVIDIA GPU. But even on the fastest FP32 GPUs, it is too slow to compete with CPUs in a professional setting.
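The comparison is simple arithmetic. The sketch below assumes a hypothetical GPU with an 80 TFLOPS FP32 peak and treats "at least 20 times slower" as a best case for emulation. `fp64_throughput` is a hypothetical helper.

```python
def fp64_throughput(fp32_tflops: float, hw_ratio: float = 64.0, emu_slowdown: float = 20.0):
    """Return (hardware FP64 TFLOPS, best-case emulated FP64 TFLOPS) for a given FP32 peak."""
    hardware = fp32_tflops / hw_ratio      # native FP64 units at 1:64
    emulated = fp32_tflops / emu_slowdown  # upper bound, since emulation is >= 20x slower
    return hardware, emulated


hw, emu = fp64_throughput(80.0)  # hypothetical 80 TFLOPS FP32 GPU
print(hw, emu)  # 1.25 4.0
```

Whatever the FP32 peak, emulation is at most 64 / 20 = 3.2 times faster than the 1:64 hardware path. Whether that beats a CPU depends on the CPU's FP64 throughput, which this sketch does not model.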

FP64 emulation on a GPU can be useful only in a home computer, which may have a rather weak CPU. There, using the GPU to increase FP64 throughput costs nothing extra, so the emulation can be worthwhile.