[Numpy-discussion] Performance of the array protocol

Francesc Altet faltet at carabos.com
Tue Nov 1 09:20:16 CST 2005

On Tue, 01 Nov 2005 at 09:11 -0700, Travis Oliphant wrote:
> If you are going to be copying the data anyway, then there may be no 
> advantage to the array protocol (in fact because it has to look up 
> several attributes of the input object it can be slower).  When you use 
> Numeric.array(na) it makes a copy of the data by default.
> The idea is to be able to use the array protocol to not have to make 
> copies of the data.

Yes, I don't want to do a copy. And, in fact, I want to use moderately
large array conversions (10**4 ~ 10**6 elements).

> Try using  num = Numeric.array(na,copy=0)  in your first timing runs and 
> see what that provides.

Good! Using copy=0 and larger arrays (10**5 elements) I'm getting now:

>>> t1_2=timeit.Timer("num=Numeric.array(na,copy=0)", "import numarray; import Numeric; na=numarray.arange(100000)")
>>> t1_2.repeat(3,1000)
[0.064317941665649414, 0.060917854309082031, 0.07666015625]
>>> t2_2=timeit.Timer("num=Numeric.fromstring(na._data,typecode=na.typecode())", "import numarray; import Numeric; na=numarray.arange(100000)")
>>> t2_2.repeat(3,1000)
[4.8582658767700195, 4.8404099941253662, 4.8652839660644531]

So, the implementation of the array protocol in the numarray --> Numeric
direction is performing astonishingly well :-)

For the record, using the array protocol with the default copy gives:

>>> t1=timeit.Timer("num=Numeric.array(na)", "import numarray; import Numeric; na=numarray.arange(100000)")
>>> t1.repeat(3,1000)
[5.014805793762207, 4.9959368705749512, 5.0420081615447998]

i.e. almost as fast as the fromstring() method, which is very good as well.

BTW, I'm wondering whether copy=False should be the default instead of
copy=True. IMO, many people will want to use the array protocol just to
get easy access to the data, and making a copy behind the scenes just
for this could be a performance killer, especially for large objects.
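
To illustrate why the default matters, here is a minimal sketch that uses
only the standard library. It is an assumption for illustration, not the
array protocol itself: array.array stands in for the source array, and
memoryview stands in for a consumer that wraps the existing buffer. The
helper names convert_with_copy and convert_without_copy are hypothetical.

```python
import timeit
from array import array

# Stand-in for a large numeric array (stdlib only, not Numeric/numarray).
src = array("l", range(100000))

def convert_with_copy(a):
    # Allocates a new buffer and copies every element into it.
    return array(a.typecode, a)

def convert_without_copy(a):
    # Wraps the existing buffer; no element data is copied.
    return memoryview(a)

copied = convert_with_copy(src)
shared = convert_without_copy(src)

# Mutating the source shows which result shares memory with it:
# copied[0] keeps the old value, shared[0] sees the new one.
src[0] = 42

t_copy = min(timeit.repeat(lambda: convert_with_copy(src), repeat=3, number=100))
t_view = min(timeit.repeat(lambda: convert_without_copy(src), repeat=3, number=100))
print("copy: %.6f s, no copy: %.6f s" % (t_copy, t_view))
```

The copying version does work proportional to the number of elements,
while the wrapping version does not, so the gap should widen as the
arrays grow.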

>> Conversely, for Numeric --> numarray:
> Again, you are making copies of the data.  I'm not sure how numarray 
> handles the array protocol on consumption of the interface, so I can't 
> comment further.

Mmmm, I've tried disabling the copy, but unfortunately I can't get the
same figures as above:

>>> t3=timeit.Timer("na=numarray.array(num,copy=0)", "import numarray; import Numeric; num=Numeric.arange(100000)")
>>> t3.repeat(3,10)
[1.6356601715087891, 1.6529910564422607, 1.6299269199371338]
>>> t4=timeit.Timer("na=numarray.array(buffer(num),type=num.typecode(),shape=num.shape)", "import numarray; import Numeric; num=Numeric.arange(100000)")
>>> t4.repeat(3,1000)
[0.045578956604003906, 0.043890953063964844, 0.043296098709106445]

so, for the Numeric --> numarray direction, the array protocol is more
than three orders of magnitude slower per call than the buffer-based
conversion (note the fewer iterations in the first repeat loop). Maybe
Todd can comment more on this.
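
Because the two runs use different iteration counts, the totals have to
be normalized per call before comparing them. A small sketch of that
arithmetic, using the best-of-three totals quoted above (the variable
names are only illustrative):

```python
import math

# Best-of-three totals and iteration counts from the timings above.
t3_total, t3_calls = 1.6299269199371338, 10       # numarray.array(num, copy=0)
t4_total, t4_calls = 0.043296098709106445, 1000   # numarray.array(buffer(num), ...)

# Normalize to seconds per call before comparing.
t3_per_call = t3_total / t3_calls
t4_per_call = t4_total / t4_calls

ratio = t3_per_call / t4_per_call
print("per call: %.3e s vs %.3e s" % (t3_per_call, t4_per_call))
print("slowdown: %.0fx (about 10**%.1f)" % (ratio, math.log10(ratio)))
```

The ratio comes out in the thousands, i.e. between three and four orders
of magnitude.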


>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
