[Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays
Thu Nov 8 07:13:19 CST 2007
Am Donnerstag, 08. November 2007 13:44:59 schrieb David Cournapeau:
> Hans Meine wrote:
> > I wonder why simple elementwise operations like "a * 2" or "a + 1" are
> > not performed in order of increasing memory addresses in order to exploit
> > CPU caches etc. - as it is now, their speed drops by a factor of around 3
> > simply by transpose()ing.
> Because it is not trivial to do so in all cases, I guess. It is a
> problem which comes back time to time on the ML, but AFAIK, nobody had a
> fix for it. Fundamentally, for many element-wise operations, either you
> have to implement the thing for every possible case, or you abstract it
> through an iterator, which gives you a decrease of performances in some
OK, so far I have not looked at the NumPy internals, but I would expect sth.
like the following:
[elementwise operation on array 'a']
1) ii = [i for i, s in sorted(enumerate(ia.strides), key = lambda (i, s): -s)]
2) aprime = a.transpose(ii) # aprime will be "as C-contiguous as it gets"
3) bprime = perform_operation(aprime) # fast elementwise operation
4) jj = [ii.index(i) for i in range(bprime.ndim)] # inverse ii
5) b = bprime.transpose(jj) # let result have the same order as the input
Apart from the fact that this is more or less pseudo-code and that the last
step brings a behaviour change (which is intended, but probably not very
welcome), am I overlooking a problem?
> > Similarly (but even less logical), copy() and even the
> > constructor are affected (yes, I understand that copy() creates
> > contiguous arrays, but shouldn't it respect/retain the order
> > nevertheless?):
> I don't see why it is illogical: when you do a copy, you don't preserve
> memory layout, and so a simple memcpy of the whole buffer is not possible.
As I wrote above, I don't think this is good. A fortran-order-contiguous
array is still contiguous, and not inferior in any way to C-order arrays.
So I actually expect copy() to return an array of unchanged order. The numpy
book only says "copy() - Return a copy of the array (which is always
single-segment, and ALIGNED).", which would be fulfilled also if the memory
order was preserved as by my proposed method from above.
Ciao, / /
/ / ANS
More information about the Numpy-discussion