[Numpy-discussion] Medians that ignore values
David Cournapeau
david@ar.media.kyoto-u.ac...
Sun Sep 21 01:49:45 CDT 2008
Anne Archibald wrote:
>
> If users are concerned about performance, it's worth noting that on
> some machines nans force a fallback to software floating-point
> handling, with a corresponding very large performance hit. This
> includes some but not all x86 (and I think x86-64) CPUs. How this
> compares to the performance of masked arrays is not clear.
I spent some time on this. In particular, for max/min, I used the
following core loop (which always returns nan if a nan is in the array):
    /* nan + x and x + nan are nan, where x can be anything:
     * normal, denormal, nan, infinite.
     * (Caveat: inf + (-inf) is nan as well.)
     */
    tmp = *((@typ@ *)i1) + *((@typ@ *)i2);
    if (isnan(tmp)) {
        *((@typ@ *)op) = tmp;
    } else {
        *((@typ@ *)op) = *((@typ@ *)i1) @OP@ *((@typ@ *)i2)
                         ? *((@typ@ *)i1) : *((@typ@ *)i2);
    }
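A standalone C sketch of the same idea, outside numpy's @typ@/@OP@ template machinery, may make the trick easier to try out. It assumes `double` inputs, and the names `nanmax_pair` and `nanmax_reduce` are hypothetical, not numpy functions. One edge case matters: under IEEE 754, inf + (-inf) is also nan, so testing the sum alone would turn max(inf, -inf) into nan. The sketch keeps the single isnan on the sum as the fast path and only inspects the operands when that test fires.

```c
#include <math.h>
#include <stddef.h>

/* Maximum of two doubles that returns nan when either input is nan. */
double nanmax_pair(double x, double y)
{
    /* The sum is nan if x or y is nan, but also if x and y are
     * opposite infinities, so the operands are checked only when
     * the (rare) nan case is hit. */
    double sum = x + y;
    if (isnan(sum) && (isnan(x) || isnan(y))) {
        return sum;
    }
    return x > y ? x : y;
}

/* Reduce an array of n >= 1 doubles; any nan makes the result nan. */
double nanmax_reduce(const double *v, size_t n)
{
    double acc = v[0];
    for (size_t i = 1; i < n; i++) {
        acc = nanmax_pair(acc, v[i]);
    }
    return acc;
}
```

Because the operand checks run only when the sum is nan, the common all-finite case still pays for just one isnan per element.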
For large arrays (on my CPU, around 10000 items and up), this loop is
3x slower than the original one. I think the main cost is the isnan.
Since 3x is quite expensive, I tested isnan a bit on Linux, and it is
surprisingly slow. If I use my own trivial #define isnan(x) ((x) !=
(x)), it is twice as fast as the glibc isnan, and then max/min are as
fast as before, except that they now work :)
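The self-comparison test works because nan is the only IEEE 754 value that compares unequal to itself. Below is a minimal C sketch assuming IEEE 754 doubles; it uses the hypothetical name `MY_ISNAN` rather than redefining `isnan`, since `<math.h>` already defines that macro. Two caveats: the argument is evaluated twice, and compilers may fold the test to false under flags such as GCC's `-ffast-math`, which assumes no nans occur.

```c
#include <math.h>

/* nan is the only value for which x != x holds, so no library call
 * is needed. The argument is evaluated twice: avoid side effects. */
#define MY_ISNAN(x) ((x) != (x))

/* Hypothetical helper: returns 1 when the self-comparison test agrees
 * with the standard isnan macro for the given value. */
int my_isnan_agrees(double x)
{
    return MY_ISNAN(x) == (isnan(x) != 0);
}
```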
The isnan result is surprising, because the whole point of having an
isnan macro is that it can be computed without branching. I checked, and
numpy does use the isnan macro, not the function (glibc provides both).
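To illustrate the "without branching" point, here is one way an isnan can be computed with integer operations only. It is a sketch assuming the IEEE 754 binary64 layout for `double` (1 sign bit, 11 exponent bits, 52 mantissa bits); `bit_isnan` is a hypothetical name, and this is not claimed to be how glibc implements its macro. A nan has all exponent bits set and a nonzero mantissa, so once the sign bit is cleared its bit pattern is strictly greater than that of +infinity.

```c
#include <math.h>    /* NAN, INFINITY for callers */
#include <stdint.h>
#include <string.h>

int bit_isnan(double x)
{
    uint64_t bits;
    /* Copy the raw bits; memcpy avoids strict-aliasing problems. */
    memcpy(&bits, &x, sizeof bits);
    /* Clear the sign bit so -nan and nan are treated alike. */
    bits &= UINT64_C(0x7FFFFFFFFFFFFFFF);
    /* 0x7FF0000000000000 is +infinity: all exponent bits set, zero
     * mantissa. Anything larger has a nonzero mantissa, i.e. is a nan. */
    return bits > UINT64_C(0x7FF0000000000000);
}
```

Compilers typically lower the final comparison to a flag-setting instruction rather than a jump, which is the property one would expect from an isnan macro.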
cheers,
David