[Numpy-discussion] Apropos ticket #913
Wed Mar 4 23:37:44 CST 2009
Charles R Harris wrote:
> On Wed, Mar 4, 2009 at 9:09 PM, David Cournapeau wrote:
> Charles R Harris wrote:
> > On Wed, Mar 4, 2009 at 1:57 PM, Pauli Virtanen wrote:
> > Wed, 04 Mar 2009 13:18:55 -0700, Charles R Harris wrote:
> > [clip]
> > > There are python max/min and their behaviour depends on the
> > > scalar type. I haven't looked at the numpy scalars to see
> > > precisely what they do.
> > >
> > > Numpy max/min are aliases for amax/amin defined when the core is
> > > imported. The functions amax/amin in turn map to the array
> > > max/min which call the maximum.reduce/minimum.reduce ufuncs, so
> > > they all propagate nans, i.e., if the array contains a nan, nan
> > > will be the return value.
> > >
> > > The nonpropagating comparisons are the ufuncs fmax/fmin and
> > > there are no corresponding array methods. I think fmax/fmin
> > > should be fmaximum/fminimum before the release of 1.3 and the
> > > names reserved for the reduced versions to match the names
> > > I'll do that if there are no objections.
> > Aren't the nonpropagating versions of `amax` and `amin` called
> > `nanmax` and `nanmin`? But these are functions, not array methods.
> > What does the `f` in the beginning of `fmax` and `fmin` stand for?
> > The functions fmax/fmin are C standard library names, I assume the f
> > stands for floating like the f in fabs. Nanmax and nanmin work by
> > replacing nans with a fill value and then performing the specified
> > operation. For instance, nanmin replaces nans with inf. In contrast,
> > the functions fmax and fmin are real ufuncs and return nan when
> > the inputs are nans, return the non-nan value when only one of the
> > inputs is a nan, and do the normal comparisons when both inputs
> > are valid.
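
To make the two approaches described above concrete, here is a minimal
sketch in plain Python. The helper names fmax_sketch and nanmin_by_fill
are hypothetical, chosen for illustration only; they model the rules as
stated in this thread and are not the NumPy implementation.

```python
import math

def fmax_sketch(a, b):
    """Scalar model of the fmax rule: nan only when both inputs are nan."""
    if math.isnan(a):
        return b              # b may itself be nan, which covers the both-nan case
    if math.isnan(b):
        return a              # exactly one nan: return the non-nan value
    return a if a >= b else b  # both valid: normal comparison

def nanmin_by_fill(values):
    """Model of the fill approach: replace nans with +inf, then take min."""
    filled = [math.inf if math.isnan(v) else v for v in values]
    return min(filled)

nan = float("nan")
print(fmax_sketch(1.0, nan))            # non-nan operand wins
print(fmax_sketch(nan, nan))            # nan, since both inputs are nan
print(nanmin_by_fill([3.0, nan, 1.0]))  # nans never win against finite values
```

One visible difference in this sketch: with all-nan input the fill
approach yields inf, whereas the fmax-style rule yields nan.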
> Thanks for the clarification. I agree fmax/fmin is better because of
> the C convention.
> Better in what way? I was suggesting renaming them to
> fmaximum/fminimum but am perfectly happy with the current names if you
> feel fmax/fmin are better because of the c connection.
Oops, I read the opposite of what you meant :) My rationale for the name
fmax/fmin is that their behavior is a bit surprising for people not used
to C, so having a different name than C would only add to the confusion.
It is obviously not a strong rationale.
> One thing that still bothers me a bit is the return value of fmax/fmin
> when comparing two complex nan values. A complex number is a nan
> whenever the real or imaginary part is nan, and currently the
> functions return such a number but originally they returned a complex
> number with both parts set to nan. The current implementation was a
> compromise that kept the code simple while never explicitly using a
> nan value, i.e., the nan came from one of the inputs. I avoided the
> explicit use of a nan value because the NAN macro was possibly
> unreliable at the time. I'm open to thoughts on what the behavior
> should be.
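
For reference, the two candidate return values discussed above can be
told apart like this. This is a small illustration using only the
Python standard library; the variable names are made up for the example
and nothing here reflects how the ufunc itself is written.

```python
import cmath
import math

nan = float("nan")

# A nan taken from an input: only one part is nan, the other is preserved.
partial = complex(nan, 2.0)

# An explicitly constructed nan: both parts are nan.
full = complex(nan, nan)

# Both count as nan under the "real or imaginary part is nan" definition.
print(cmath.isnan(partial), cmath.isnan(full))

# Only the first still carries a usable imaginary part.
print(partial.imag, math.isnan(full.imag))
```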
Is it a problem if only one part (real or imaginary) is nan? We should
have a reliable NAN macro - this should be part of the npymath library,
IMO. I will look into it.
> We should clearly document the differences between those
> functions, though.
> You mean the differences with nanmax/nanmin?
max (undefined behavior with nan) vs fmax (same semantics as C
counterpart) vs nanmax (ignore nan). In particular, I think it would be
helpful to document the differences with matlab and R, and suggestions
on how to replace which function from those environments with numpy
equivalent code. I can do this.
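
As a starting point for such documentation, a short comparison could
look like the following. It assumes a reasonably recent NumPy, where
max propagates nan as described earlier in the thread; the variable
names are illustrative only.

```python
import math
import numpy as np

a = np.array([1.0, np.nan, 3.0])

propagating = np.max(a)      # maximum.reduce under the hood: nan propagates
c_like = np.fmax.reduce(a)   # C fmax semantics: nan only if every input is nan
ignoring = np.nanmax(a)      # nans are skipped

print(propagating, c_like, ignoring)

# The Python builtin max depends on argument order when a nan is present,
# because every comparison against nan is False.
builtin_nan_first = max(float("nan"), 1.0)
builtin_nan_last = max(1.0, float("nan"))
print(builtin_nan_first, builtin_nan_last)
```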
> Would you have time to implement something similar for sort (sort is
> important for correct and relatively efficient support of nanmedian,
> I think)? If not, that's ok, we'll do without for 1.3
> I would rather take more time for the sort functions.
Sure. My own experience is that this kind of code handling nan is
difficult to get right. We especially need a relatively good set of
tests, because of compiler/platform specificities.
> I'm also not convinced that would solve the median problem. If 60% of
> the entries were nans would nan be the median? If not we would have to
> find where the nans began or ended and that would most likely need
> searchsorted to be fixed also.
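
The "find where the nans begin" step can be sketched in plain Python as
follows. It assumes a sort that places nans after all valid values; the
function name nanmedian_sketch is hypothetical and this is not the
numpy or scipy implementation.

```python
import math

def nanmedian_sketch(values):
    """Median of the non-nan entries, using a sort that puts nans last."""
    # Sort key: valid numbers first (False < True), nans grouped at the end.
    ordered = sorted(values, key=lambda v: (math.isnan(v), v))
    # Locate the boundary between valid entries and the trailing nans.
    n_valid = sum(1 for v in ordered if not math.isnan(v))
    if n_valid == 0:
        return float("nan")   # nothing but nans
    valid = ordered[:n_valid]
    mid = n_valid // 2
    if n_valid % 2:
        return valid[mid]
    return (valid[mid - 1] + valid[mid]) / 2.0

nan = float("nan")
# 60% of the entries are nan; the median is taken over the rest.
print(nanmedian_sketch([nan, 4.0, nan, 2.0, nan]))
```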
I meant nanmedian, sorry. The current implementation is slow and/or
buggy (I should check the related tickets, though, maybe it was a scipy