Hacker News

AndriyKunitsyn, today at 8:30 PM

NaN that is not equal to itself _even if it's the same variable_ is not a Python oddity, it's an IEEE 754 oddity.
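
A minimal Python sketch of that behavior, assuming CPython's built-in `float` (the variable names are only for illustration):

```python
import math

x = float("nan")            # one variable, compared against itself

same_object = x is x        # identity still holds
equal_to_self = x == x      # IEEE 754 comparison is "unordered", so this is False
differs_from_self = x != x  # and the inequality is True

print(same_object, equal_to_self, differs_from_self)
print(math.isnan(x))        # the reliable way to test for NaN
```

Identity and equality diverge here, which is why `math.isnan` is the usual test rather than `x == x`.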


Replies

riskassessment, today at 8:52 PM

Nor is that inequality an oddity at all. If you think NaN should equal NaN, that probably stems from the belief that NaN is a single entity, which misunderstands its purpose. Rather, NaN signifies some specific number that is not representable as a floating-point value. Two specific numbers that cannot be represented are not necessarily equal, because they may have resulted from different calculations!

I'll add that, if I recall correctly, in R the statement NaN == NaN evaluates to NA, which basically means "it is not known whether these numbers equal each other", and that is a more reasonable result than False.

paulddraper, today at 9:41 PM

It's an IEEE-754 oddity that Python chose to adopt for its equality.

IEEE-754 does remainder(5, 3) = -1, whereas Python does 5 % 3 = 2.
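
Both results can be reproduced in Python itself. This sketch assumes Python 3.7 or later, where `math.remainder` implements the IEEE 754-style remainder:

```python
import math

ieee_style = math.remainder(5, 3)  # quotient rounded to nearest (2), so 5 - 2*3
python_mod = 5 % 3                 # quotient floored (1), so 5 - 1*3

print(ieee_style, python_mod)
```

The first value is -1.0 and the second is 2, matching the two results quoted above.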

There's no reason to expect exact equivalence between operators.

adrian_b, today at 9:15 PM

It is not an IEEE 754 oddity. It is the correct mathematical behavior.

When you define an order relation on a set, the order may be either a total order or a partial order.

In a totally ordered set, there are 3 values for a comparison operation: equal, less and greater. In a partially ordered set, there are 4 values for a comparison operation: equal, less, greater and unordered.

For a totally ordered set you can define 6 relational operators (6 = 2^3 - 2, where you subtract 2 for the always false and always true predicates), while for a partially ordered set you can define 14 relational operators (14 = 2^4 - 2).
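
The two counts can be checked by enumeration. In this rough sketch a relational operator is modeled as the set of comparison outcomes for which it returns true; the function name `nontrivial_predicates` is made up for illustration:

```python
from itertools import combinations

def nontrivial_predicates(outcomes):
    """Every subset of outcomes except the empty set (always false)
    and the full set (always true)."""
    return [
        frozenset(subset)
        for size in range(1, len(outcomes))
        for subset in combinations(outcomes, size)
    ]

total = nontrivial_predicates(["less", "equal", "greater"])
partial = nontrivial_predicates(["less", "equal", "greater", "unordered"])

print(len(total), len(partial))
```

For example, `frozenset({"greater", "equal", "unordered"})` is the "not less" operator of the partial order.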

For some weird reason, many programmers have not been taught properly about partially ordered sets, and most programming languages do not define the 14 relational operators needed for partially ordered sets, but only the 6 relational operators that are sufficient for a totally ordered set.

It is easy to write all 14 relational operators by combinations of the symbols for not, less, greater and equal, so parsing this in a programming language would be easy.

This lack of awareness about partial order relations, and the lack of support for them in most programming languages, is very bad, because practical applications very frequently need partial orders instead of total orders.

For the floating-point numbers, the IEEE standard specifies 2 choices. You can either use them as a totally-ordered set, or as a partially-ordered set.

When you encounter NaNs as a programmer, it is because you have chosen to have partially ordered FP numbers, so you are not allowed to complain that this behavior is odd when you have chosen it. Most programmers do not make this choice consciously, because they just use the default configuration of the standard library, but if the default does not do what they want and they have not changed the default settings, it is still their fault.

If you do not want NaNs, you must not mask the invalid operation exception. This is actually what the IEEE standard recommends as the default behavior, but lazy programmers do not want to handle exceptions, so most libraries choose to mask all exceptions in their default configurations.

When invalid operations generate exceptions, there are no NaNs and the FP numbers are totally ordered, so the 6 relational operators behave as naive programmers expect them to behave.

If you do not want to handle the invalid operation exception and you mask it, there is no other option for the CPU than to use a special value that reports an invalid operation, and which is indeed not-a-number. With not-numbers added to the set of FP numbers, the set becomes a partially-ordered set and all relational operators must be interpreted accordingly.
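
The difference between masking and unmasking can be sketched in Python. As far as I know, the binary `float` type has no standard-library switch for the IEEE exception masks, so this example uses the `decimal` module as a stand-in: its context has a trap for `InvalidOperation` that can be disabled (masked) or enabled (unmasked). The helper name `zero_over_zero` is hypothetical.

```python
from decimal import Decimal, InvalidOperation, localcontext

def zero_over_zero(trap_invalid):
    """Compute 0/0 with the invalid-operation trap enabled or disabled."""
    with localcontext() as ctx:
        ctx.traps[InvalidOperation] = trap_invalid
        return Decimal(0) / Decimal(0)

# Masked: the invalid operation quietly yields a NaN.
masked = zero_over_zero(trap_invalid=False)
print(masked.is_nan())

# Unmasked: the same operation raises, so no NaN ever appears.
try:
    zero_over_zero(trap_invalid=True)
    raised = False
except InvalidOperation:
    raised = True
print(raised)
```

With the trap enabled, the error surfaces at the operation that caused it instead of propagating as a NaN into later comparisons.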

If you use something like C/C++, with only 6 relational operators, then before any comparison you must test whether either operand is a NaN, because otherwise the relational operators do not do what you expect them to do.
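
The same pre-check pattern, sketched in Python rather than C/C++. `checked_less` is a hypothetical helper, and raising `ValueError` is only one possible policy for the unordered case:

```python
import math

def checked_less(x, y):
    """Return x < y, refusing to answer when either operand is NaN."""
    if math.isnan(x) or math.isnan(y):
        raise ValueError("comparison is unordered: NaN operand")
    return x < y

print(checked_less(1.0, 2.0))

try:
    checked_less(float("nan"), 2.0)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```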

In a language with 14 relational operators, you do not need to check for NaNs, but you must choose the relational operator carefully, because for a partially ordered set not-less, for example, is not the same as greater-or-equal (not-less is the same as greater-or-equal-or-unordered).
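
To make the not-less versus greater-or-equal distinction concrete, here is a small Python sketch built on a four-way comparison. The names `compare`, `not_less` and `greater_or_equal` are made up for illustration and are not part of any standard library:

```python
import math

def compare(x, y):
    """Four-way comparison: 'less', 'equal', 'greater' or 'unordered'."""
    if math.isnan(x) or math.isnan(y):
        return "unordered"
    if x < y:
        return "less"
    if x > y:
        return "greater"
    return "equal"

def not_less(x, y):
    # Equivalent to greater-or-equal-or-unordered.
    return compare(x, y) != "less"

def greater_or_equal(x, y):
    return compare(x, y) in ("greater", "equal")

nan = float("nan")
print(not_less(nan, 1.0), greater_or_equal(nan, 1.0))
print(not (nan < 1.0), nan >= 1.0)  # the built-in operators agree
```

With a NaN operand the two predicates disagree: not-less is true while greater-or-equal is false. For ordinary numbers they always give the same answer.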

If you do not expect to do invalid operations frequently, it may be simpler to unmask the exception, so that you will never have to do any test for NaN detection.
