A reimplementation of MaskedArray
Eric Firing
efiring at hawaii.edu
Thu Nov 9 13:27:22 CST 2006
On my Pentium M, multiplication with NaNs does look slow, but a masked
array is not a clear win: compared with NaNs it ranges from slightly
faster (with only one value masked) to nearly twice as slow (with all
values masked):
In [15]:Timer("a.prod()", "import numpy as np; aa = np.ones(4096); a =
np.ma.masked_greater(aa,0)").timeit(10000)
Out[15]:9.4012830257415771
In [16]:Timer("a.prod()", "import numpy as np; a = np.ones(4096); a[0]=
np.nan").timeit(10000)
Out[16]:5.5737850666046143
In [17]:Timer("a.prod()", "import numpy as np; a =
np.ones(4096)").timeit(10000)
Out[17]:0.40796804428100586
In [18]:Timer("a.prod()", "import numpy as np; aa = np.ones(4096); aa[0]
= 2; a = np.ma.masked_greater(aa,1)").timeit(10000)
Out[18]:4.1544749736785889
In [19]:Timer("a.prod()", "import numpy as np; a = np.ones(4096); a[:]=
np.nan").timeit(10000)
Out[19]:5.8589630126953125
For transcendentals, neither NaNs nor masks make much difference,
although masks are slightly faster than NaNs:
In [20]:Timer("np.sin(a)", "import numpy as np; a = np.ones(4096); a[:]=
np.nan").timeit(10000)
Out[20]:4.5575671195983887
In [21]:Timer("np.ma.sin(a)", "import numpy as np; aa = np.ones(4096); a
= np.ma.masked_greater(aa,0)").timeit(10000)
Out[21]:4.4125270843505859
In [22]:Timer("b=np.sin(a)", "import numpy as np; a =
np.ones(4096)").timeit(10000)
Out[22]:3.5793929100036621
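
For anyone who wants to repeat these comparisons on other hardware, the
timings above can be collected into one script. This is only a minimal
sketch, not the code that produced the numbers in this message: the
helper names make_cases and time_calls are made up for illustration,
and np.full is used here as a shorthand for filling an array with NaN.
Absolute times will depend on the CPU and the NumPy version.

```python
import timeit

import numpy as np


def make_cases(n=4096):
    """Build the arrays compared above: plain, NaN-filled, and masked."""
    one_nan = np.ones(n)
    one_nan[0] = np.nan
    bumped = np.ones(n)
    bumped[0] = 2  # the single element that masked_greater(bumped, 1) hides
    return {
        "plain": np.ones(n),
        "one NaN": one_nan,
        "all NaN": np.full(n, np.nan),
        "one masked": np.ma.masked_greater(bumped, 1),
        "all masked": np.ma.masked_greater(np.ones(n), 0),
    }


def time_calls(cases, number=10000):
    """Return {name: (prod_seconds, sin_seconds)} for each case."""
    results = {}
    for name, arr in cases.items():
        # np.ma.sin keeps the mask; np.sin is used for plain ndarrays.
        sin = np.ma.sin if isinstance(arr, np.ma.MaskedArray) else np.sin
        t_prod = timeit.timeit(arr.prod, number=number)
        t_sin = timeit.timeit(lambda: sin(arr), number=number)
        results[name] = (t_prod, t_sin)
    return results


if __name__ == "__main__":
    for name, (t_prod, t_sin) in time_calls(make_cases()).items():
        print(f"{name:>10}  prod: {t_prod:8.4f} s   sin: {t_sin:8.4f} s")
```

Passing a callable to timeit.timeit avoids the setup-string form used
above, so each case is built once and reused for both measurements.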
Eric
Tim Hochberg wrote:
> A. M. Archibald wrote:
>> On 08/11/06, Tim Hochberg <tim.hochberg at ieee.org> wrote:
>>
>>
>>> It has always been my experience (on various flavors of Pentium) that
>>> operating on NANs is extremely slow. Does anyone know on what hardware
>>> NANs are *not* slow? Of course it's always possible I just never notice
>>> NANs on hardware where they aren't slow.
>>>
>> On an opteron machine I have access to, they appear to be no slower
>> (and even faster for some transcendental functions) than ordinary
>> floats:
>>
>> In [13]: a=zeros(1000000)
>>
>> In [14]: %time for i in xrange(1000): a += 1.1
>> CPU times: user 6.87 s, sys: 0.00 s, total: 6.87 s
>> Wall time: 6.87
>>
>> In [15]: a *= NaN
>>
>> In [16]: %time for i in xrange(1000): a += 1.1
>> CPU times: user 6.86 s, sys: 0.00 s, total: 6.86 s
>> Wall time: 6.87
>>
>> On my Pentium M, they are indeed significantly slower (three times? I
>> didn't really do enough testing to say how much). I am actually rather
>> offended by this unfair discrimination against a citizen in good
>> standing of the IEEE floating point community.
>>
> If they're only 3x slower you're doing better than I am. On my core duo
> box they appear to be nearly 20x slower for both addition and
> multiplication. This more or less matches what I recall from earlier boxes.
>
> >>> Timer("a.prod()", "import numpy as np; a = np.ones(4096); a[0]
> = np.nan").timeit(10000)
> 5.6495738983585397
> >>> Timer("a.prod()", "import numpy as np; a =
> np.ones(4096)").timeit(10000)
> 0.34439041833525152
> >>> Timer("a.sum()", "import numpy as np; a = np.ones(4096); a[0] =
> np.nan").timeit(10000)
> 6.0985655998179027
> >>> Timer("a.sum()", "import numpy as np; a =
> np.ones(4096)").timeit(10000)
> 0.32354363473564263
>
> I've been told that operations on NANs are slow because they aren't
> always implemented in the FPU hardware. Instead they are trapped and
> implemented in software or firmware or something or other. That may well be
> bogus 42nd hand information though.
>
> -tim
>
>
