[Numpy-discussion] Masking an array with another array
josef.pktd@gmai...
Wed Apr 22 22:43:28 CDT 2009
On Wed, Apr 22, 2009 at 10:45 PM, Pierre GM <pgmdevlist@gmail.com> wrote:
>
> On Apr 22, 2009, at 9:03 PM, josef.pktd@gmail.com wrote:
>>
>> I prefer broadcasting to list comprehension in numpy:
>
> Pretty neat! I still don't have the broadcasting reflex. Now, any idea
> which one is more efficient in terms of speed? In terms of temporaries?
>
I have used similar broadcasting when working with categorical data series
and when creating dummy variables for regression, so I have already
played with this for some time.
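As an illustration of that use, here is a minimal sketch (not code from this thread; `dummy_matrix` is a hypothetical helper name). Comparing a column view of the data against the row of distinct categories broadcasts to a (len(x), n_categories) array, which becomes a 0/1 dummy matrix with one column per category:

```python
import numpy as np

def dummy_matrix(x):
    """Hypothetical helper: return (levels, dummies).

    dummies[i, j] is 1 if x[i] == levels[j], else 0. The comparison of the
    (n, 1) column view against the (k,) row of levels broadcasts to (n, k).
    """
    x = np.asarray(x)
    levels = np.unique(x)                              # sorted distinct categories
    dummies = (x[:, np.newaxis] == levels).astype(int)
    return levels, dummies

levels, d = dummy_matrix(["b", "a", "c", "a"])
print(levels)   # ['a' 'b' 'c']
print(d)
```

Each row has exactly one 1, in the column of that observation's category.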
In this case, I would expect the memory consumption to be
essentially the same: you have a list of arrays and I have a 2d array.
The exception would be if numpy needs an additional conversion to an array
inside np.logical_or.reduce, which seems plausible, but I don't know.
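A quick way to compare the sizes of the temporaries is sketched below. It assumes, as guessed above, that `np.logical_or.reduce` converts the list into one 2-D array before reducing; `a` and `b` reuse the data from the benchmark in this post (with `blen = 10`).

```python
import numpy as np

a = np.tile(np.arange(10), 1000)   # 0..9 repeated 1000 times: 10000 elements
b = np.array([2, 3, 5, 8] * 10)    # 40 elements

# Broadcasting: a single (len(a), len(b)) boolean temporary.
full = a[:, np.newaxis] == b
# List comprehension: len(b) separate boolean arrays of length len(a).
parts = [a == i for i in b]
# Converting the list to one array (assumed to happen inside reduce)
# allocates another block of the same total size.
stacked = np.asarray(parts)

print(full.nbytes, sum(p.nbytes for p in parts), stacked.nbytes)
# 400000 400000 400000
```

If that conversion does happen, the list version briefly holds about twice the memory of the broadcasting version.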
The main point Sturla convinced me of in the discussion on
kendalltau is that if b is large (500 or 1000 elements), building the full
intermediate boolean array kills both memory use and speed compared to a
Python for loop, and is very bad compared to a Cython loop.
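One way to avoid the len(a) x len(b) temporary altogether, not discussed in this thread, is to sort b once and binary-search it with `np.searchsorted`. The sketch below assumes numeric, sortable data; `isin_sorted` is a hypothetical name. Later NumPy releases also provide `np.isin` (and the older `np.in1d`) for this membership test, which is worth timing alongside the versions here.

```python
import numpy as np

def isin_sorted(a, b):
    """Hypothetical helper: boolean mask, True where a[i] occurs in b.

    Sort-based sketch: extra memory is O(len(a) + len(b)) instead of the
    O(len(a) * len(b)) boolean temporary created by broadcasting.
    """
    a = np.asarray(a)
    sb = np.sort(np.asarray(b))
    if sb.size == 0:
        return np.zeros(a.shape, dtype=bool)
    # Position of the first element of sb that is >= each a[i].
    idx = np.searchsorted(sb, a)
    # Values above max(b) get idx == len(sb); clip to a valid index
    # (the comparison below is then False for them anyway).
    idx = np.clip(idx, 0, sb.size - 1)
    return sb[idx] == a

a = np.tile(np.arange(10), 1000)
b = np.array([2, 3, 5, 8] * 100)   # len(b) = 400
mask = isin_sorted(a, b)
print(mask[:10].tolist())
# [False, False, True, True, False, True, False, False, True, False]
```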
In this example my version is at least twice as fast for len(b) = 4,
and your version does not scale well at all to larger b: it takes 7
times as long as mine for len(b) = 400, which I guess means that
you have an extra copying step.
I added the for loop, and it is always the fastest, especially for short
b. I hope it's correct; I have never used an in-place logical operator before.
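A small sanity check of the in-place version is sketched below, using the same `a` and `b` as the benchmark in this post (with `blen = 10`). It compares the loop's result with the other two approaches.

```python
import numpy as np

a = np.tile(np.arange(10), 1000)
b = np.array([2, 3, 5, 8] * 10)

broadcast = (a[:, np.newaxis] == b).any(1)
listcomp = np.logical_or.reduce([a == i for i in b])

# On boolean arrays, z |= x computes an element-wise OR into z's existing
# buffer (for bool dtype, the same result as np.logical_or(z, x, out=z)).
z = a == b[0]
for value in b[1:]:
    z |= (a == value)

print(np.array_equal(z, broadcast), np.array_equal(z, listcomp))
# True True
```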
Josef
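import numpy as np
from time import time as time_

a = np.tile(np.arange(10), 1000)   # 0..9 repeated 1000 times: 10000 elements
blen = 10  # 100
b = np.array([2, 3, 5, 8] * blen)
print("shape b", b.shape)

# 1) broadcasting: builds a (len(a), len(b)) boolean array
t = time_()
for _ in range(100):
    (a[:, np.newaxis] == b).any(1)
print(time_() - t)

# 2) list comprehension reduced with logical_or
t = time_()
for _ in range(100):
    np.logical_or.reduce([a == i for i in b])
print(time_() - t)

# 3) plain loop with an in-place OR
t = time_()
for _ in range(100):
    z = a == b[0]
    for ii in range(1, len(b)):
        z |= (a == b[ii])
print(time_() - t)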
# Timings in seconds for 100 repetitions, from several runs
# (the for-loop column was only added in the later runs):
#
# shape b   broadcasting     list comp       for loop
# (80,)     0.110000133514   0.266000032425
# (80,)     0.827999830246   5.2650001049
# (400,)    4.60899996758    28.4370000362
# (400,)    3.89100003242    27.5
# (400,)    3.89099979401    27.3289999962   3.51599979401
# (40,)     0.453999996185   2.54600000381   0.359999895096
# (4,)      0.108999967575   0.28200006485   0.0309998989105