[Numpy-discussion] Complex nan ordering
Charles R Harris
charlesr.harris@gmail....
Sun Jul 18 19:39:22 CDT 2010
On Sun, Jul 18, 2010 at 5:00 PM, Pauli Virtanen <pav@iki.fi> wrote:
> Sun, 18 Jul 2010 15:57:47 -0600, Charles R Harris wrote:
> > On Sun, Jul 18, 2010 at 3:36 PM, Pauli Virtanen <pav@iki.fi> wrote:
> [clip]
> >> I suggest the following, aping the way the real nan works:
> >>
> >> - (z, nan), (nan, z), (nan, nan), where z is any fp value, are all
> >> equivalent representations of "cnan", as far as comparisons, sort
> >> order, etc are concerned.
> >
> > - The ordering between (z, nan), (nan, z), (nan, nan) is undefined. This
> >> means e.g. that maximum([cnan_1, cnan_2]) can return either cnan_1 or
> >> cnan_2 if both are some cnans.
> >
> > The sort and cmp order was defined in 1.4.0, see the release notes.
> > (z,z), (z, nan), (nan, z), (nan, nan) are in correct order and there are
> > tests to enforce this. Sort and searchsorted need to work together.
>
> Ok, now we're diving into an obscure corner that hopefully many people
> don't care about :)
>
> There are several issues here:
>
> 1) We should not use lexical order in comparison operations,
> since this contradicts real-valued nan arithmetic.
>
>
How so? Nans sort to the end for reals and also to the end for complex. The
sort order for complex isn't strictly a lexical extension of the reals; it's
a bit closer to what you are talking about: *all* complex numbers containing
nans sort higher than "real" complex numbers. The need was to separate the
nan-containing numbers from the real numbers. But within each of the "real"
and "nan" regions the numbers are sorted lexically.
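To make the ordering concrete, here is a minimal pure-Python sketch of a sort key that reproduces the order described above: (z, z), then (z, nan), then (nan, z), then (nan, nan), with lexical order inside each group. The name complex_sort_key is made up for illustration, and this is an emulation under those assumptions, not the comparison routine NumPy actually uses.

```python
import math

def complex_sort_key(z: complex):
    """Hypothetical key emulating the described complex sort order."""
    re_nan = math.isnan(z.real)
    im_nan = math.isnan(z.imag)
    # Group 0: (z, z); 1: (z, nan); 2: (nan, z); 3: (nan, nan).
    group = 2 * re_nan + im_nan
    # Replace nan parts with 0.0 so the tuples stay comparable; inside a
    # group the nan positions are identical, so this does not affect order.
    return (group, 0.0 if re_nan else z.real, 0.0 if im_nan else z.imag)

nan = float("nan")
values = [
    complex(nan, 0.0),
    complex(1.0, 1.0),
    complex(0.0, nan),
    complex(nan, nan),
    complex(0.0, 2.0),
]
ordered = sorted(values, key=complex_sort_key)
```

The two finite values come out first in lexical order, followed by the nan-containing values grouped by where the nan sits.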
> Currently (and in 1.4) we do some weird sort of mixture,
> which seems inconsistent.
>
> 2) maximum/minimum should propagate nans, fmax/fmin should not
>
>
So they do at this time.
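For reference, the difference between the two families can be sketched for real scalars as below. The function names propagating_max and ignoring_max are hypothetical stand-ins for the maximum-style and fmax-style behaviour; this is a plain Python illustration, not the ufunc code.

```python
import math

def propagating_max(a: float, b: float) -> float:
    """maximum-style: a nan in either argument wins."""
    if math.isnan(a):
        return a
    if math.isnan(b):
        return b
    return a if a >= b else b

def ignoring_max(a: float, b: float) -> float:
    """fmax-style: a nan is skipped unless both arguments are nan."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a >= b else b

nan = float("nan")
propagated = propagating_max(1.0, nan)  # nan
ignored = ignoring_max(1.0, nan)        # 1.0
```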
> 3) sort/searchsorted, and amax/argmax need to play together
>
>
Then I think amax/amin should conform to the sort order. If we are going to
compare nans, then they should sit somewhere in a strict order; they
can't both be largest and smallest. The choice of where to put them is
somewhat arbitrary, but they need to go somewhere consistent.
> 4) as long as 1)-3) are valid, I don't think anybody cares
> what exactly we mean by a "complex nan", as long as
>
> np.isnan("complex nan") == True
>
>
But that has nothing to do with sorting order, it's just a broad
classification like positive numbers. In this case it is nan containing
complex numbers.
> The fact that there happen to be several different representations
> of a complex nan should not be important.
>
>
Why not? Suppose you want to search for certain combinations?
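A small sketch of the two views, using only the standard library: cmath.isnan gives the broad "contains a nan" classification, while a helper (here called nan_pattern, a made-up name) keeps the representations apart for anyone who wants to search for a particular combination.

```python
import cmath
import math

def nan_pattern(z: complex):
    """Hypothetical helper: which parts of z are nan, as (real, imag)."""
    return (math.isnan(z.real), math.isnan(z.imag))

nan = float("nan")
candidates = [
    complex(0.0, nan),
    complex(nan, 0.0),
    complex(nan, nan),
    complex(1.0, 2.0),
]

# Broad classification: True when either part is nan.
flags = [cmath.isnan(z) for z in candidates]

# Finer classification: select only values whose real part alone is nan.
real_only = [z for z in candidates if nan_pattern(z) == (True, False)]
```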
> ***
>
> 1)
>
> Unless we want to define
>
> (complex(nan, 0) > complex(0, 0)) == True
>
>
Looks reasonable to me. That is what the sort order does.
> we cannot strictly follow the lexical order in comparisons. And if we
> define it like this, we contradict real-valued nan arithmetic, which IMHO
> is quite bad.
>
>
As mentioned above, the sorting order for complex isn't strictly lexical.
Whether it is reasonable to extend the sorting order to the usual
comparisons is a different question. I didn't do it for a reason, but maybe
now is the time to "sort" things out.
> Here, it would make sense to me to lump all the different complex nans
> into a single "cnan", as far as the arithmetic comparison operations are
> concerned. Then,
>
> z OP cnan == False
>
> for all comparison operations.
>
> In 1.4.1 we have
>
> >>> import numpy as np
> >>> np.__version__
> '1.4.1'
> >>> x = np.complex64(complex(np.nan, 1))
> >>> y = np.complex64(complex(0, 1))
> >>> x >= y
> False
> >>> x < y
> False
> >>> x = np.complex64(complex(1, np.nan))
> >>> y = np.complex64(complex(0, 1))
> >>> x >= y
> True
> >>> x < y
> False
>
> which seems an obscure mix of real-valued nan arithmetic and lexical
> ordering -- I don't think it's the correct choice...
>
> Of course, the practical importance of this decision approaches zero, but
> it would be nice to be consistent.
>
>
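The "lump everything into one cnan" proposal for the ordering comparisons could be sketched as follows. cnan_compare is a hypothetical name, the lexical fallback for finite values is an assumption, and this is plain Python rather than anything NumPy ships.

```python
import cmath
import operator

def cnan_compare(op, a: complex, b: complex) -> bool:
    """Apply an ordering operator; any nan-containing operand gives False."""
    if cmath.isnan(a) or cmath.isnan(b):
        return False
    # Assumed fallback: lexical comparison on (real, imag) for finite values.
    return op((a.real, a.imag), (b.real, b.imag))

nan = float("nan")
y = complex(0.0, 1.0)

# Both placements of the nan now behave the same way.
results = [
    cnan_compare(operator.ge, complex(nan, 1.0), y),
    cnan_compare(operator.lt, complex(nan, 1.0), y),
    cnan_compare(operator.ge, complex(1.0, nan), y),
    cnan_compare(operator.lt, complex(1.0, nan), y),
]
```

Under this rule the `x >= y` case with the nan in the imaginary part would give False instead of the True shown in the 1.4.1 session.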
> ***
>
> 2)
>
> For maximum/amax, strict lexical order contradicts nan propagation:
>
> maximum(1+nan*j, 2+0j) == 2+0j ???
>
>
But that isn't what the sort order yields. A complex number containing nans
in any position will always sort greater than (z,z), it is only in
comparisons between two numbers containing nans that the lexical order comes
back into play.
> I don't see why we should follow the lexical order when both arguments
> are nans. The implementation will be faster if we don't.
>
>
I'm actually a bit curious about the speed.
> Also, this way argmax (which should be nan-propagating) can stop looking
> once it finds the first nan -- and it does not need to care if later on
> in the array there would be a "greater" nan.
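The early-exit idea can be sketched in plain Python as below. first_nan_argmax is a hypothetical name, and the lexical comparison used for finite values is an assumption made for the sake of the example.

```python
import cmath

def first_nan_argmax(values):
    """Index of the first nan-containing value, else of the lexical maximum."""
    if not values:
        raise ValueError("argmax of an empty sequence")
    best_index = 0
    for i, z in enumerate(values):
        if cmath.isnan(z):
            # Stop scanning: later nans are never inspected.
            return i
        best = values[best_index]
        if (z.real, z.imag) > (best.real, best.imag):
            best_index = i
    return best_index

nan = float("nan")
with_nans = [complex(1, 0), complex(nan, 0), complex(5, 0), complex(nan, nan)]
without_nans = [complex(1, 0), complex(3, 1), complex(3, 0)]
```

The trailing (nan, nan) in with_nans is never examined, which is the saving being argued for.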
>
> ***
>
> 3)
>
> For sort/searchsorted we have a technical reason to do something more,
> and there the strict lexical order seems the correct decision.
>
>
Exactly.
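A rough illustration of why the two have to agree: a binary search only finds the right slot if it compares with the same total order that produced the sorted array. The sketch below uses the standard bisect module with a made-up key function (complex_sort_key) that emulates the documented order; it is not NumPy's searchsorted.

```python
import bisect
import math

def complex_sort_key(z: complex):
    """Hypothetical key: finite values first, then (z, nan), (nan, z), (nan, nan)."""
    re_nan = math.isnan(z.real)
    im_nan = math.isnan(z.imag)
    group = 2 * re_nan + im_nan
    return (group, 0.0 if re_nan else z.real, 0.0 if im_nan else z.imag)

def search_sorted(sorted_keys, z: complex) -> int:
    """Leftmost insertion index of z among keys sorted by complex_sort_key."""
    return bisect.bisect_left(sorted_keys, complex_sort_key(z))

nan = float("nan")
data = sorted(
    [complex(nan, 0), complex(1, 1), complex(0, nan), complex(nan, nan), complex(0, 2)],
    key=complex_sort_key,
)
keys = [complex_sort_key(z) for z in data]

slot_for_nan_real = search_sorted(keys, complex(nan, 0))
slot_for_finite = search_sorted(keys, complex(1, 0))
```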
> For `argmax` it is possible to be compatible with `amax` when lumping
> cnans in maximum -- just return the first cnan.
>
>
I don't have a problem distinguishing sort order from normal comparison
order, the notes explicitly label it a sorting order. I think we just need
to be clear if we make a distinction and choose what is best for each.
> ***
>
> 4)
>
> As far as np.isnan is concerned,
>
> >>> np.isnan(complex(0, nan))
> True
> >>> np.isnan(complex(nan, 0))
> True
> >>> np.isnan(complex(nan, nan))
> True
>
> So I think nobody should care which complex nan a function such as
> maximum or amax returns.
>
>
Sure. As long as it is clear that sorting will lead to different results.
> We can of course give up some performance to look for the "greatest" nan
> in these cases, but I do not think that it would be very worthwhile.
>
>
Well, the sort comparison function was optimized on the assumption that nans
are not the common case. At least I think it was, it is rather complex ;)
Chuck