[Numpy-discussion] Complex nan ordering
Pauli Virtanen
pav@iki...
Sun Jul 18 18:00:26 CDT 2010
Sun, 18 Jul 2010 15:57:47 -0600, Charles R Harris wrote:
> On Sun, Jul 18, 2010 at 3:36 PM, Pauli Virtanen <pav@iki.fi> wrote:
[clip]
>> I suggest the following, aping the way the real nan works:
>>
>> - (z, nan), (nan, z), (nan, nan), where z is any fp value, are all
>> equivalent representations of "cnan", as far as comparisons, sort
>> order, etc are concerned.
>>
>> - The ordering between (z, nan), (nan, z), (nan, nan) is undefined. This
>> means e.g. that maximum([cnan_1, cnan_2]) can return either cnan_1 or
>> cnan_2 if both are some cnans.
>
> The sort and cmp order was defined in 1.4.0, see the release notes.
> (z,z), (z, nan), (nan, z), (nan, nan) are in correct order and there are
> tests to enforce this. Sort and searchsorted need to work together.
Ok, now we're diving into an obscure corner that hopefully many people
don't care about :)
There are several issues here:
1) We should not use lexical order in comparison operations,
since this contradicts real-valued nan arithmetic.
Currently (and in 1.4) we do some weird sort of mixture,
which seems inconsistent.
2) maximum/minimum should propagate nans, fmax/fmin should not
3) sort/searchsorted, and amax/argmax need to play together
4) as long as 1)-3) are valid, I don't think anybody cares
what exactly we mean by a "complex nan", as long as
np.isnan("complex nan") == True
The fact that there happen to be several different representations
of a complex nan should not be important.
***
1)
Unless we want to define
(complex(nan, 0) > complex(0, 0)) == True
we cannot strictly follow the lexical order in comparisons. And if we
define it like this, we contradict real-valued nan arithmetic, which IMHO
is quite bad.
Here, it would make sense to me to lump all the different complex nans
into a single "cnan", as far as the arithmetic comparison operations are
concerned. Then,
z OP cnan == False
for all comparison operations.
In 1.4.1 we have
>>> import numpy as np
>>> np.__version__
'1.4.1'
>>> x = np.complex64(complex(np.nan, 1))
>>> y = np.complex64(complex(0, 1))
>>> x >= y
False
>>> x < y
False
>>> x = np.complex64(complex(1, np.nan))
>>> y = np.complex64(complex(0, 1))
>>> x >= y
True
>>> x < y
False
which seems an obscure mix of real-valued nan arithmetic and lexical
ordering -- I don't think it's the correct choice...
Of course, the practical importance of this decision approaches zero, but
it would be nice to be consistent.
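
A minimal sketch of the "lump all cnans together" rule, in plain Python rather than NumPy internals. The helper names `cless` and `cgreater_equal` are hypothetical and exist only for illustration; the lexical fallback for non-nan values is an assumption about what the ordered case would look like.

```python
import cmath


def cless(a, b):
    """Hypothetical a < b for complex values.

    Any operand with a nan in either component counts as "cnan",
    and every comparison involving a cnan is False.
    """
    if cmath.isnan(a) or cmath.isnan(b):
        return False
    # Assumed fallback: lexical order on (real, imag) for non-nan values.
    return (a.real, a.imag) < (b.real, b.imag)


def cgreater_equal(a, b):
    """Hypothetical a >= b with the same cnan rule."""
    if cmath.isnan(a) or cmath.isnan(b):
        return False
    return (a.real, a.imag) >= (b.real, b.imag)
```

With this rule both `x >= y` and `x < y` come out False for either kind of cnan, which mirrors how real-valued nan comparisons behave.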
***
2)
For maximum/amax, strict lexical order contradicts nan propagation:
maximum(1+nan*j, 2+0j) == 2+0j ???
I don't see why we should follow the lexical order when both arguments
are nans. The implementation will be faster if we don't.
Also, this way argmax (which should be nan-propagating) can stop looking
once it finds the first nan -- and it does not need to care if later on
in the array there would be a "greater" nan.
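
The early-exit idea can be sketched in plain Python as follows. `cmaximum` and `cargmax` are hypothetical names, not NumPy functions, and the lexical comparison for non-nan values is an assumption made for the sake of the example.

```python
import cmath


def cmaximum(a, b):
    """nan-propagating maximum of two complex values (illustrative only)."""
    # If either argument is a cnan, return it; which cnan is returned
    # is deliberately left unspecified when both are cnans.
    if cmath.isnan(a):
        return a
    if cmath.isnan(b):
        return b
    return a if (a.real, a.imag) >= (b.real, b.imag) else b


def cargmax(values):
    """Index of the maximum, or of the first cnan if there is one."""
    if not values:
        raise ValueError("cargmax of an empty sequence")
    best = 0
    for i, z in enumerate(values):
        if cmath.isnan(z):
            return i  # first cnan found: no need to look any further
        if (z.real, z.imag) > (values[best].real, values[best].imag):
            best = i
    return best
```

Because the loop returns at the first cnan, it never has to compare two cnans against each other.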
***
3)
For sort/searchsorted we have a technical reason to do something more,
and there the strict lexical order seems the correct decision.
For `argmax`, as noted in 2), it is possible to stay compatible with `amax`
when lumping cnans in maximum -- just return the first cnan.
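
For illustration, here is a plain-Python sort key that reproduces the grouping quoted earlier in the thread: (z, z), then (z, nan), then (nan, z), then (nan, nan). The names `csort_key` and `csearchsorted` are hypothetical, and ordering values within each group by their non-nan part is an assumption of this sketch.

```python
import bisect
import math


def csort_key(z):
    """Total-order key for complex values, with nans sorted last."""
    re_nan = math.isnan(z.real)
    im_nan = math.isnan(z.imag)
    if not re_nan and not im_nan:
        return (0, z.real, z.imag)  # ordinary values: lexical order
    if not re_nan:
        return (1, z.real, 0.0)     # (z, nan): order by real part
    if not im_nan:
        return (2, z.imag, 0.0)     # (nan, z): order by imaginary part
    return (3, 0.0, 0.0)            # (nan, nan): always last


def csearchsorted(sorted_values, x):
    """Leftmost insertion index of x, using the same key as the sort."""
    keys = [csort_key(v) for v in sorted_values]
    return bisect.bisect_left(keys, csort_key(x))
```

Sorting with `sorted(values, key=csort_key)` and searching with `csearchsorted` share one key, which is the sense in which sort and searchsorted have to work together.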
***
4)
As far as np.isnan is concerned,
>>> np.isnan(complex(0, np.nan))
True
>>> np.isnan(complex(np.nan, 0))
True
>>> np.isnan(complex(np.nan, np.nan))
True
So I think nobody should care which complex nan a function such as
maximum or amax returns.
We can of course give up some performance to look for the "greatest" nan
in these cases, but I do not think that it would be very worthwhile.
--
Pauli Virtanen