[Numpy-discussion] Masking an array with another array

josef.pktd@gmai... josef.pktd@gmai...
Wed Apr 22 22:43:28 CDT 2009

On Wed, Apr 22, 2009 at 10:45 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
> On Apr 22, 2009, at 9:03 PM, josef.pktd@gmail.com wrote:
>> I prefer broadcasting to list comprehension in numpy:
> Pretty neat! I still don't have the broadcasting reflex. Now, any idea
> which one is more efficient in terms of speed? In terms of temporaries?

I have used similar broadcasting for working with categorical data
series and for creating dummy variables for regression, so I have
already played with this for some time.
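
As a rough illustration of the dummy-variable use mentioned above, here is
a minimal sketch. The array x and its values are made up for the example;
only the broadcasting comparison itself is the point.

```python
import numpy as np

# Hypothetical categorical series; the values are invented for illustration.
x = np.array([2, 0, 1, 2, 1])
levels = np.unique(x)            # sorted distinct categories: [0, 1, 2]

# Broadcasting shape (5, 1) against shape (3,) gives a (5, 3) boolean array;
# column j is True where x equals levels[j].
dummies = (x[:, None] == levels).astype(int)
print(dummies)
```

Each row then has exactly one 1, in the column of its category.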

In this case, I would expect the memory consumption to be essentially
the same: you have a list of arrays and I have a 2d array, unless numpy
needs an additional conversion to array in np.logical_or.reduce, which
seems plausible, but I don't know.

The main point that Sturla convinced me of in the discussion on
kendalltau is that if b is large, say 500 or 1000 elements, then
building the full intermediate boolean array kills both memory and
speed performance compared to a python for loop, and is very bad
compared to a cython loop.
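
To put a number on the size of that intermediate array, here is a small
sketch. The broadcasting expression is an assumed form of the 2d
comparison, and the sizes are chosen to match the len(b) = 400 case.

```python
import numpy as np

a = np.array(list(range(10)) * 1000)   # 10000 elements
b = np.array([2, 3, 5, 8] * 100)       # len(b) = 400

# Full 2d boolean intermediate: one byte per (a[i], b[j]) pair.
full = a[:, None] == b

# A for loop only needs a 1d accumulator (plus one 1d temporary per step).
acc = a == b[0]

full_bytes = full.nbytes
acc_bytes = acc.nbytes
print(full.shape, full_bytes)
print(acc.shape, acc_bytes)
```

The 2d array needs len(a) * len(b) bytes, while the loop never holds more
than a few arrays of len(a) bytes at a time.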

In this example my version is at least twice as fast for len(b) = 4.
Your version does not scale well at all to larger b: yours takes 7
times as long as mine for len(b) = 400, which, I guess, would mean that
you have an extra copying step.

I added the for loop and it is always the fastest, even more so for
short b. I hope it's correct; I have never used an in-place logical
operator before.


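import numpy as np
from time import time as time_

a = np.array(list(range(10))*1000)
blen = 10#100
b = np.array([2,3,5,8]*blen)

print("shape b", b.shape)

# broadcasting version; the loop body is missing from this copy, so the
# expression below is an assumed reconstruction (2d boolean array, then any)
t = time_()
for _ in range(100):
    (a[:,None] == b).any(1)
print(time_() - t)

# list comprehension version
t = time_()
for _ in range(100):
    np.logical_or.reduce([a==i for i in b])
print(time_() - t)

# for loop with in-place logical or on a 1d accumulator
t = time_()
for _ in range(100):
    z = a == b[0]
    for ii in range(1,len(b)):
        z |= (a == b[ii])
print(time_() - t)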

#timings in seconds for 100 iterations; only the for-loop results are listed

#shape b (400,)
#3.51599979401   #for loop

#shape b (40,)
#0.359999895096   #for loop

#shape b (4,)
#0.0309998989105   #for loop
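
On the question of whether the in-place for loop is correct, one way to
check is to compare its result with the other variants. This is only a
sketch: the broadcasting expression is an assumed form, and the check
covers this one input, not every case.

```python
import numpy as np

a = np.array(list(range(10)) * 1000)
b = np.array([2, 3, 5, 8] * 10)

# Broadcasting variant (assumed form: compare every a[i] with every b[j]).
m_broadcast = (a[:, None] == b).any(axis=1)

# List-comprehension variant from the timing script.
m_listcomp = np.logical_or.reduce([a == i for i in b])

# In-place loop: z is a fresh boolean array, so |= never modifies a or b.
z = a == b[0]
for ii in range(1, len(b)):
    z |= (a == b[ii])

print(np.array_equal(m_broadcast, m_listcomp), np.array_equal(m_listcomp, z))
```

If all three masks agree, the loop can be used as a drop-in replacement
for the other two in the timing comparison.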
