[Numpy-discussion] Masking an array with another array

josef.pktd@gmai... josef.pktd@gmai...
Wed Apr 22 22:43:28 CDT 2009


On Wed, Apr 22, 2009 at 10:45 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
>
> On Apr 22, 2009, at 9:03 PM, josef.pktd@gmail.com wrote:
>>
>> I prefer broadcasting to list comprehension in numpy:
>
> Pretty neat! I still don't have the broadcasting reflex. Now, any idea
> which one is more efficient in terms of speed? In terms of temporaries?
>

I have used similar broadcasting for working with categorical data
series and for creating dummy variables for regression, so I have
already played with this for some time.
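
A minimal sketch of the dummy-variable idea, assuming a small
integer-coded series. The data and the names x, levels and dummies are
made up for illustration and are not from any real data set:

```python
import numpy as np

# Hypothetical integer-coded categorical series (illustrative data only)
x = np.array([2, 0, 1, 2, 1, 0])

# Sorted distinct category labels
levels = np.unique(x)

# Broadcasting: (6, 1) == (3,) gives a (6, 3) boolean array with one
# column per category, which is then cast to 0/1 integers
dummies = (x[:, np.newaxis] == levels).astype(int)

print(levels)
print(dummies)
```

Each row has exactly one 1, in the column of that observation's
category.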

In this case, I would expect the memory consumption to be essentially
the same: you have a list of arrays and I have a 2d array, unless numpy
needs an additional conversion to array in np.logical_or.reduce, which
seems plausible but I don't know.

The main point that Sturla convinced me of in the discussion on
kendalltau is that if b is large, say 500 or 1000, then building the
full intermediate boolean array kills both memory and speed
performance compared to a Python for loop, and is very bad compared to
a Cython loop.
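
To give a rough idea of the size of that intermediate, here is a small
sketch using the same array sizes as the timing script, with len(b) =
400. The names full and mask are only illustrative:

```python
import numpy as np

a = np.tile(np.arange(10), 1000)            # 10000 elements
b = np.tile(np.array([2, 3, 5, 8]), 100)    # len(b) = 400

# Full (len(a), len(b)) boolean intermediate built by broadcasting
full = a[:, np.newaxis] == b

# Final result: one boolean per element of a
mask = full.any(axis=1)

# A boolean array uses one byte per element
print(full.shape, full.nbytes)
print(mask.shape, mask.nbytes)
```

The intermediate holds len(a) * len(b) booleans (about 4 MB here),
while the result that is actually needed holds only len(a) of them
(about 10 kB).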

In this example my version is at least twice as fast for len(b) = 4.
Your version does not scale well at all to larger b: yours takes 7
times as long as mine for len(b) = 400, which, I guess, would mean that
you have an extra copying step.

I added the for loop and it is always the fastest, even more so for
short b. I hope it's correct, I have never used an in-place logical
operator.
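
One way to check the loop is to compare all three versions against a
reference result. This is only a sketch: it assumes a NumPy recent
enough to provide np.isin (1.13 or later), and the names m_broadcast,
m_reduce and expected are illustrative:

```python
import numpy as np

a = np.tile(np.arange(10), 1000)
b = np.tile(np.array([2, 3, 5, 8]), 10)

# Broadcasting version
m_broadcast = (a[:, np.newaxis] == b).any(axis=1)

# List comprehension plus logical_or.reduce version
m_reduce = np.logical_or.reduce([a == i for i in b])

# Loop with in-place OR: z starts as a fresh boolean array,
# so |= only modifies z and never touches a
z = a == b[0]
for value in b[1:]:
    z |= (a == value)

# Reference result (assumes np.isin is available)
expected = np.isin(a, b)

print(np.array_equal(m_broadcast, expected))
print(np.array_equal(m_reduce, expected))
print(np.array_equal(z, expected))
```

Since a == b[0] creates a new array, the in-place operator is safe
here.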

Josef

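import numpy as np
from time import time as time_

a = np.array(list(range(10))*1000)   # 10000 elements, values 0..9 repeated
blen = 10#100
b = np.array([2,3,5,8]*blen)         # values to look for, len(b) = 4*blen


print("shape b", b.shape)
# broadcasting: builds the full (len(a), len(b)) boolean array
t = time_()
for _ in range(100):
    (a[:,np.newaxis]==b).any(1)
print(time_() - t)

# list comprehension plus logical_or.reduce
t = time_()
for _ in range(100):
    np.logical_or.reduce([a==i for i in b])
print(time_() - t)


# for loop with in-place logical or
t = time_()
for _ in range(100):
    z = a == b[0]
    for ii in range(1,len(b)):
        z |= (a == b[ii])
print(time_() - t)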

#shape b (80,)
#0.110000133514
#0.266000032425

#shape b (80,)
#0.827999830246
#5.2650001049

#shape b (400,)
#4.60899996758
#28.4370000362

#shape b (400,)
#3.89100003242
#27.5

#shape b (400,)
#3.89099979401
#27.3289999962
#3.51599979401   #for loop

#shape b (40,)
#0.453999996185
#2.54600000381
#0.359999895096   #for loop

#shape b (4,)
#0.108999967575
#0.28200006485
#0.0309998989105   #for loop

