# [Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays

Hans Meine meine@informatik.uni-hamburg...
Thu Nov 8 07:13:19 CST 2007

```Am Donnerstag, 08. November 2007 13:44:59 schrieb David Cournapeau:
> Hans Meine wrote:
> > I wonder why simple elementwise operations like "a * 2" or "a + 1" are
> > not performed in order of increasing memory addresses in order to exploit
> > CPU caches etc. - as it is now, their speed drops by a factor of around 3
> > simply by transpose()ing.
>
> Because it is not trivial to do so in all cases, I guess. It is a
> problem which comes back time to time on the ML, but AFAIK, nobody had a
> fix for it. Fundamentally, for many element-wise operations, either you
> have to implement the thing for every possible case, or you abstract it
> through an iterator, which gives you a decrease of performances in some
> cases.

OK, so far I have not looked at the NumPy internals, but I would expect sth.
like the following:

[elementwise operation on array 'a']
1) ii = [i for i, s in sorted(enumerate(ia.strides), key = lambda (i, s): -s)]
2) aprime = a.transpose(ii) # aprime will be "as C-contiguous as it gets"
3) bprime = perform_operation(aprime) # fast elementwise operation
4) jj = [ii.index(i) for i in range(bprime.ndim)] # inverse ii
5) b = bprime.transpose(jj) # let result have the same order as the input

Apart from the fact that this is more or less pseudo-code and that the last
step brings a behaviour change (which is intended, but probably not very
welcome), am I overlooking a problem?

> >  Similarly (but even less logical), copy() and even the
> > constructor are affected (yes, I understand that copy() creates
> > contiguous arrays, but shouldn't it respect/retain the order
> > nevertheless?):
>
> I don't see why it is illogical: when you do a copy, you don't preserve
> memory layout, and so a simple memcpy of the whole buffer is not possible.

As I wrote above, I don't think this is good.  A fortran-order-contiguous
array is still contiguous, and not inferior in any way to C-order arrays.
So I actually expect copy() to return an array of unchanged order.  The numpy
book only says "copy() - Return a copy of the array (which is always
single-segment, and ALIGNED).", which would be fulfilled also if the memory
order was preserved as by my proposed method from above.

--
Ciao, /  /
/--/
/  / ANS
```