[Numpy-discussion] Unnecessarily bad performance of elementwise operators with Fortran-arrays
Fri Nov 9 21:09:07 CST 2007
On Nov 10, 2007 11:23 AM, David Cournapeau <firstname.lastname@example.org> wrote:
> This would need some benchmarks, but I have always read that using
> pointer arithmetic should be avoided when speed matters (e.g. *a + n
> * sizeof(*a) compared to a[n]), because it becomes much more difficult
> for the compiler to optimize. Generally, if you can get to a function
> which does the thing the "obvious way", this is better. Of course, you
> have to separate the case where this is possible and where it is not.
> But such work would also be really helpful if/when we optimize some
> basic things with MMX/SSE and co, and I think the above is impossible
> to auto vectorize (gcc 4.3, not released yet, gives some really
> helpful analysis for that, and can tell you when it fails to
> auto-vectorize, and why).
Actually, gcc 4.2 already supports the option: -ftree-vectorizer-verbose=n.
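
To make the comparison concrete, here is a minimal sketch of the two loop styles under discussion. It is not NumPy's actual DOUBLE_add; the names strided_add and contiguous_add are hypothetical. The first loop advances byte pointers by per-operand strides, the second indexes plain double arrays the "obvious way".

```c
#include <assert.h>
#include <stddef.h>

/* Strided style: byte pointers advanced by per-operand strides (in bytes).
   Hypothetical stand-in for a generic elementwise inner loop. */
void strided_add(const char *a, const char *b, char *out, size_t n,
                 ptrdiff_t sa, ptrdiff_t sb, ptrdiff_t so)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; i++, a += sa, b += sb, out += so) {
        *(double *)out = *(const double *)a + *(const double *)b;
    }
}

/* Contiguous style: plain indexing over packed doubles.
   restrict promises the compiler that the three buffers do not overlap. */
void contiguous_add(const double *restrict a, const double *restrict b,
                    double *restrict out, size_t n)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Assuming the code is saved as loops.c (a hypothetical file name), something like "gcc -O2 -ftree-vectorize -ftree-vectorizer-verbose=2 -c loops.c" should print, for each loop, whether it was vectorized and why not if it wasn't. Both functions compute the same result when all three strides equal sizeof(double); only the form the compiler sees differs.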
> > > While this is
> > > a good idea (also probably quite some work), the real thing bugging me is
> > > that the above DOUBLE_add could (and should!) be called by the ufunc
> > > framework in such a way that it is equally efficient for C and Fortran
> > > arrays.
> > Yes, that's what I was talking about. There is actually a path through
> > the ufunc code where this loop is called only once. The requirement
> > right now is that all the arrays are C-contiguous, but this should be
> > relaxed so that all arrays need only share the same contiguousness (and the output-array
> > creation code changed to create Fortran-order arrays when the inputs are
> > all Fortran-order).
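
The single-call path described above works because an elementwise operation does not depend on index order, provided every operand and the output use the same memory layout. A minimal sketch in plain C, not the ufunc code itself; flat_add, f_index and fortran_add_demo are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* One flat pass over n packed doubles, with no knowledge of the array shape. */
void flat_add(const double *a, const double *b, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Offset of element (i, j) in a Fortran-order (column-major) array. */
size_t f_index(size_t i, size_t j, size_t rows)
{
    return i + j * rows;
}

/* Fills two 2x3 Fortran-order arrays, adds them with a single flat_add call,
   and returns element (i, j) of the Fortran-order result. */
double fortran_add_demo(size_t i, size_t j)
{
    enum { ROWS = 2, COLS = 3 };
    double a[ROWS * COLS], b[ROWS * COLS], out[ROWS * COLS];

    assert(i < ROWS && j < COLS);
    for (size_t r = 0; r < ROWS; r++) {
        for (size_t c = 0; c < COLS; c++) {
            a[f_index(r, c, ROWS)] = 10.0 * (double)r + (double)c; /* a(r, c) */
            b[f_index(r, c, ROWS)] = 100.0;                        /* b(r, c) */
        }
    }
    flat_add(a, b, out, ROWS * COLS); /* one call covers the whole array */
    return out[f_index(i, j, ROWS)];
}
```

Because the output buffer is read back with the same Fortran indexing as the inputs, out(i, j) equals a(i, j) + b(i, j) even though flat_add never sees the shape. This is also why the output array has to be created in Fortran order when the inputs are.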
> This was my other point. For some of my own C code, that's what I do:
> I have some function to detect whether the array can be treated
> sequentially, for cases where C vs F does not matter at all. I don't
> see any difficulty in providing such a facility in numpy, at least for the
> common operations we are talking about, but maybe I am missing
> something. I should take a look at the ufunc machinery.
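
A detection function of the kind described above could look roughly like the following. This is only a sketch under stated assumptions: the array is described by a shape and byte strides, every dimension has at least one element, and all function names are hypothetical rather than part of the NumPy C API.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the byte strides describe a packed row-major (C) layout.
   Dimensions of length 1 are skipped since their stride is never used. */
int is_c_contiguous(int ndim, const size_t *shape, const ptrdiff_t *strides,
                    size_t itemsize)
{
    ptrdiff_t expected = (ptrdiff_t)itemsize;
    for (int d = ndim - 1; d >= 0; d--) {
        assert(shape[d] > 0); /* assumption: no empty dimensions */
        if (shape[d] != 1 && strides[d] != expected) {
            return 0;
        }
        expected *= (ptrdiff_t)shape[d];
    }
    return 1;
}

/* Returns 1 if the byte strides describe a packed column-major (Fortran) layout. */
int is_f_contiguous(int ndim, const size_t *shape, const ptrdiff_t *strides,
                    size_t itemsize)
{
    ptrdiff_t expected = (ptrdiff_t)itemsize;
    for (int d = 0; d < ndim; d++) {
        assert(shape[d] > 0); /* assumption: no empty dimensions */
        if (shape[d] != 1 && strides[d] != expected) {
            return 0;
        }
        expected *= (ptrdiff_t)shape[d];
    }
    return 1;
}

/* An array can be walked as one flat buffer if it is packed in either order. */
int can_treat_sequentially(int ndim, const size_t *shape,
                           const ptrdiff_t *strides, size_t itemsize)
{
    return is_c_contiguous(ndim, shape, strides, itemsize)
        || is_f_contiguous(ndim, shape, strides, itemsize);
}
```

For an operation with several operands, each array passing this check is not enough on its own: they would also have to agree on the same order (all C or all Fortran) before a single flat loop is valid.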