[Numpy-discussion] A surprising result from benchmarking

Kevin Jacobs <jacobs@bioinformed.com> bioinformed@gmail....
Sun Mar 11 00:43:56 CST 2007


The inefficiency comes from the generic iteration and the construction of int
objects needed by the builtin sum function.  Using the native numarray sum
method on each row is much, much faster, and summing over the axis directly is
faster still:

# method 1a: call the array's own sum method on each row
t1=time.time()
highEnough=myMat>0.6
greaterPerLine=[x.sum() for x in highEnough]
elapsed1=time.time()-t1
print("method 1a took %f seconds"%elapsed1)

# method 1b: sum along axis 1 in a single call, with no Python-level loop
t1=time.time()
highEnough=myMat>0.6
greaterPerLine=highEnough.sum(axis=1)
elapsed1=time.time()-t1
print("method 1b took %f seconds"%elapsed1)

method 1 took 1.503523 seconds
method 2 took 0.163641 seconds
method 1a took 0.006665 seconds
method 1b took 0.004070 seconds
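
For anyone who wants to reproduce the comparison, here is a minimal
self-contained sketch.  It assumes a current NumPy and Python 3 rather than
the 2007 setup, and the function names, the seeded generator, and the timed()
helper are made up for illustration; absolute timings will differ from the
numbers above.

```python
import time

import numpy as np


def count_builtin(mat, threshold=0.6):
    # Builtin sum iterates each row element by element in Python.
    return [sum(row) for row in (mat > threshold)]


def count_row_method(mat, threshold=0.6):
    # The array's own sum method runs the inner loop in compiled code.
    return [row.sum() for row in (mat > threshold)]


def count_axis(mat, threshold=0.6):
    # One call that sums along axis 1, with no Python-level loop at all.
    return (mat > threshold).sum(axis=1)


def timed(func, mat):
    # Hypothetical helper: returns the result and the elapsed wall time.
    start = time.perf_counter()
    result = func(mat)
    return result, time.perf_counter() - start


rng = np.random.default_rng(0)  # seeded so that runs are repeatable
my_mat = rng.standard_normal((500, 500))

results = {}
for func in (count_builtin, count_row_method, count_axis):
    result, elapsed = timed(func, my_mat)
    results[func.__name__] = np.asarray(result)
    print("%s took %f seconds" % (func.__name__, elapsed))
```

All three functions should return the same per-row counts; only the place
where the loop runs changes.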

-Kevin


On 3/11/07, Dan Becker <dbecker@alum.dartmouth.org> wrote:
>
> As soon as I posted that I realized it's due to the type conversions from True
> to 1.  For some reason, this
>
> ---
> myMat=scipy.randn(500,500)
> t1=time.time()
> highEnough=(myMat>0.6)+0
> greaterPerLine=[sum(x) for x in highEnough]
> elapsed1=time.time()-t1
> print("method 1 took %f seconds"%elapsed1)
> ---
>
> remedies that to some extent.  It is only 20% slower than the map.  Still, there
> must be some way for me to make the clean way faster than
>
> greaterPerLine2=map(lambda(x):len(filter(lambda(y):y>0.6,x)),myMat)
>
> I appreciate any advice on how to do that.
>
> Thanks again,
> Dan