> For the longest time (and for good reasons), floating point operations were considered unsafe f...

jmgao • today at 10:02 AM • 2 replies • view on HN

> For the longest time (and for good reasons), floating point operations were considered unsafe for deterministic purposes. That is still true to some extent, but the picture is more nuanced than that. I have since learned a lot about floating point determinism, and these days I know it is mostly safe if you know how to navigate around the pitfalls.

If you're only concerned about identical binaries on x86, it's not too bad because AMD and Intel tend to have intentionally identical implementations of most floating point operations, with the exception of a few of the approximate reciprocal SSE instructions (rcpps, rsqrtps, etc). Modern x86 instructions tend to have their exact results strictly defined to avoid this kind of inconsistency: https://software.intel.com/en-us/articles/reference-implemen...

If you want this to work across ARM and x86 (or even multiple ARM vendors), you are screwed, and need to restrict yourself to using only the basic arithmetic operations and reimplement everything else yourself.

Replies

dzdt • today at 12:21 PM

At least in the early 2000s, Bloomberg had strict requirements about this. Their financial terminal has a ton of math calculations. The requirement was that they always had live servers running with two different hardware platforms with different operating systems and different CPU architectures and different build chains. The math had to agree to the same bitwise results. They had to turn off almost all compiler optimisations to achieve this, and you had to handle lots of corner cases in code: can't trust NaN or Infinity or underflow to be portable.

They could transparently load balance a user from one different backend platform to the other with zero visible difference to the user.

turtledragonfly • today at 11:43 AM

I'm sure you know, but for others reading: even on the same architecture, there is more to floating point determinism than just running the same "x = a + b" code on each system. There's also the state of the FPU (eg: rounding modes) that can affect results.

On older versions of DirectX (maybe even in some modern Windows APIs?) there were cases where it would internally change the FPU mode, causing chaos for callers trying to use floats deterministically[1].

[1] https://gafferongames.com/post/floating_point_determinism/ (see the Elijah quote, especially)

alt Hacker News

Replies