[Numpy-discussion] Performance problems with strided arrays in NumPy
faltet at xot.carabos.com
Wed Apr 19 14:49:02 CDT 2006
On Tue, Apr 18, 2006 at 09:01:54PM -0600, Travis Oliphant wrote:
> faltet at xot.carabos.com wrote:
> The source of this slowness is the use in numarray of special-cases for
> certain-sized byte-copies.
>
> Apparently, it is *much* faster to do
>
> ((double *)dst)[0] = ((double *)src)[0]
>
> when you have aligned data than it is to do
>
> memmove(dst, src, sizeof(double))
Mmm.. very interesting.
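For readers who want to see the two approaches side by side, here is a minimal sketch of a strided copy of 8-byte elements, once with a per-element memmove() and once with a typed assignment. The function names and signatures are made up for this illustration and are not taken from numpy or numarray; the typed version assumes that both buffers and both strides are suitably aligned for double.

```c
#include <stddef.h>
#include <string.h>

/* Generic path: one memmove() call per 8-byte element.
   Works for any alignment, but pays a function-call cost per element
   unless the compiler inlines it. */
void copy_strided_memmove(char *dst, ptrdiff_t dst_stride,
                          const char *src, ptrdiff_t src_stride,
                          size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        memmove(dst, src, sizeof(double));
        src += src_stride;
        dst += dst_stride;
    }
}

/* Special-cased path: a single typed load/store per element.
   ASSUMPTION: dst, src and both strides are aligned for double. */
void copy_strided_typed(char *dst, ptrdiff_t dst_stride,
                        const char *src, ptrdiff_t src_stride,
                        size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        ((double *)dst)[0] = ((const double *)src)[0];
        src += src_stride;
        dst += dst_stride;
    }
}
```

Both functions produce the same result on aligned data; only the typed one is unsafe on unaligned buffers, which is why it has to stay a special case.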
> My timings for your benchmark with current SVN of NumPy are:
>
> NumPy: [0.021701812744140625, 0.021739959716796875, 0.021548032760620117]
> Numarray: [0.052516937255859375, 0.052685976028442383, 0.052355051040649414]
Well, on my machine, using the SVN version of numpy:
numpy: [0.0974161624908447, 0.0621590614318847, 0.0612149238586425]
numarray: [0.0658359527587890, 0.0623040199279785, 0.0627131462097167]
So numpy and numarray exhibit the same performance now (I'm curious why
you are actually getting better performance on your platform). However:
In [25]: stnac=timeit.Timer('b=a.copy()','import numarray as np; a=np.arange(1000000,dtype="complex128")[::10]')
In [26]: stnpc=timeit.Timer('b=a.copy()','import numpy as np; a=np.arange(1000000,dtype="complex128")[::10]')
In [27]: stnac.repeat(3,10)
Out[27]: [0.11303496360778809, 0.11540508270263672, 0.11556506156921387]
In [28]: stnpc.repeat(3,10)
Out[28]: [0.21353006362915039, 0.21468400955200195, 0.21390914916992188]
So it seems that you forgot to optimize the complex types. Fortunately,
the cure is easy; after applying the attached patch I'm getting:
In [3]: stnpc.repeat(3,10)
Out[3]: [0.10468602180480957, 0.10204982757568359, 0.10242295265197754]
so good performance for numpy when copying strided complex128 arrays is
achieved as well.
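As a rough, self-contained illustration of the idea behind the patch, the sketch below dispatches on the element size and treats a 16-byte element (such as complex128) as two consecutive doubles. The function name strided_copy_sketch and its signature are hypothetical, plain double stands in for the Float64 typedef, and the memmove() fallback in the default branch is an assumption of this sketch rather than a copy of the numpy source. As with any typed copy, it assumes the data and strides are aligned for double in the 8- and 16-byte cases.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, named for this sketch only. */
void strided_copy_sketch(char *dst, ptrdiff_t outstrides,
                         const char *src, ptrdiff_t instrides,
                         size_t N, size_t elsize)
{
    char *tout = dst;
    const char *tin = src;
    size_t i;

    switch (elsize) {
    case 16:
        /* e.g. complex128: copy real and imaginary parts as two doubles */
        for (i = 0; i < N; i++) {
            ((double *)tout)[0] = ((const double *)tin)[0];
            ((double *)tout)[1] = ((const double *)tin)[1];
            tin += instrides;
            tout += outstrides;
        }
        return;
    case 8:
        /* e.g. float64: a single typed assignment */
        for (i = 0; i < N; i++) {
            ((double *)tout)[0] = ((const double *)tin)[0];
            tin += instrides;
            tout += outstrides;
        }
        return;
    default:
        /* ASSUMPTION: any other size falls back to a generic byte copy */
        for (i = 0; i < N; i++) {
            memmove(tout, tin, elsize);
            tin += instrides;
            tout += outstrides;
        }
        return;
    }
}
```

Calling it with elsize 16 and an input stride of 32 bytes copies every other 16-byte element into a contiguous output, which is a scaled-down analogue of the a[::10].copy() case timed above.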
Thanks for looking into this!
Francesc
======================================================================
--- numpy/core/src/arrayobject.c (revision 2381)
+++ numpy/core/src/arrayobject.c (working copy)
@@ -629,6 +629,14 @@
     char *tout = dst;
     char *tin = src;
     switch(elsize) {
+    case 16:
+        for (i=0; i<N; i++) {
+            ((Float64 *)tout)[0] = ((Float64 *)tin)[0];
+            ((Float64 *)tout)[1] = ((Float64 *)tin)[1];
+            tin = tin + instrides;
+            tout = tout + outstrides;
+        }
+        return;
     case 8:
         for (i=0; i<N; i++) {
             ((Float64 *)tout)[0] = ((Float64 *)tin)[0];