[Numpy-discussion] Fwd: GPU Numpy
Thu Sep 10 03:41:10 CDT 2009
> The proper way to speed up "dot(a*b+c*sqrt(d), e)" is to get rid of
> temporary intermediates.
I implemented a patch that reduces the number of temporary
intermediates; in your example, from 4 to 2.
There is a big improvement in terms of memory footprint,
and some improvement in terms of speed (especially for
large matrices) but not as much as I expected.
In your example
> result = 0
> for i in range(n):
>     result += (a[i]*b[i] + c[i]*sqrt(d[i])) * e[i]
another big speedup could come from the fact that it
makes better use of the cache.
That is exactly why numexpr is faster in these cases.
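
To make the difference concrete, here is a rough pure-Python sketch of the three evaluation strategies. It is only an illustration, not the patch itself and not numexpr's actual code: the function names and the block size are made up, and plain lists stand in for arrays. The first version builds the four full-length temporaries, the second is the fused loop quoted above, and the third works on small blocks so the intermediates stay short, which is roughly the idea behind chunked evaluation.

```python
import math

def dot_with_temporaries(a, b, c, d, e):
    # Whole-array style: every step materialises a full-length temporary.
    t1 = [x * y for x, y in zip(a, b)]      # a*b
    t2 = [math.sqrt(x) for x in d]          # sqrt(d)
    t3 = [x * y for x, y in zip(c, t2)]     # c*sqrt(d)
    t4 = [x + y for x, y in zip(t1, t3)]    # a*b + c*sqrt(d)
    return sum(x * y for x, y in zip(t4, e))

def dot_fused(a, b, c, d, e):
    # Single pass over the data, no temporaries at all.
    result = 0.0
    for i in range(len(a)):
        result += (a[i] * b[i] + c[i] * math.sqrt(d[i])) * e[i]
    return result

def dot_blocked(a, b, c, d, e, block=2):
    # Block-wise evaluation: temporaries exist, but only `block` items long.
    # The block size here is a hypothetical value chosen for the example.
    result = 0.0
    for start in range(0, len(a), block):
        s = slice(start, start + block)
        ab = [x * y for x, y in zip(a[s], b[s])]
        cd = [x * math.sqrt(y) for x, y in zip(c[s], d[s])]
        result += sum((p + q) * z for p, q, z in zip(ab, cd, e[s]))
    return result

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = [1.0, 1.0, 1.0]
d = [4.0, 9.0, 16.0]
e = [1.0, 2.0, 3.0]
print(dot_with_temporaries(a, b, c, d, e))
print(dot_fused(a, b, c, d, e))
print(dot_blocked(a, b, c, d, e))
```

All three give the same result; they differ only in how much intermediate storage is live at once, and that is what decides whether the working set fits in cache.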
I hope one day numpy will be able to perform such