[Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran
Dag Sverre Seljebotn
Sun Jan 22 11:29:20 CST 2012
On 01/22/2012 04:55 AM, Ondřej Čertík wrote:
> I read the Mandelbrot code using NumPy at this page:
> but when I run it, it gives me integer overflows. As such, I have
> fixed the code, so that it doesn't overflow here:
> and I have also written an equivalent Fortran program.
> You can compare both source codes to see
> that it is pretty much a one-to-one translation.
> The main idea in the above gist is to take an
> algorithm written in NumPy, and translate
> it directly to Fortran, without any special
> optimizations. So the above is my first try
> in Fortran. You can plot the result
> using this simple script (you
> can also just click on this gist to
> see the image there):
> Here are my timings:
>               Python   Fortran   Speedup
> Calculation   12.749    00.784     16.3x
> Saving        01.904    01.456      1.3x
> Total         14.653    02.240      6.5x
> I save the matrices to disk in an ascii format,
> so it's quite slow in both cases. The pure computation
> is however 16x faster in Fortran (in gfortran,
> I didn't even try Intel Fortran, that will probably be
> even faster).
> As such, I wonder how the NumPy version could be sped up?
> I have compiled NumPy with Lapack+Blas from source.
This is a pretty well known weakness with NumPy. In the Python code at
least, each of c and z is about 15 MB, and the mask about 1 MB. That
doesn't fit in CPU cache, so each and every whole-array statement in the
loop streams that data in and out of CPU cache over the memory bus.
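To make that concrete, here is a minimal sketch of the vectorized iteration (my own reconstruction, not the actual gist code); note that every statement in the loop is a separate full pass over multi-megabyte arrays:

```python
import numpy as np

# Hypothetical reconstruction of the vectorized Mandelbrot step (not the
# actual gist code).  With n = 1000, c and z are ~16 MB of complex128
# each and the mask ~1 MB, so each statement below streams the whole
# array through the memory bus.
n = 1000
x, y = np.meshgrid(np.linspace(-2.0, 1.0, n), np.linspace(-1.5, 1.5, n))
c = x + 1j * y
z = np.zeros_like(c)
counts = np.zeros(c.shape, dtype=np.int32)

for _ in range(20):
    z = z * z + c              # full-array multiply and add
    mask = np.abs(z) <= 2.0    # full-array compare
    counts += mask             # full-array accumulate
    z[~mask] = 2.0             # clamp diverged points so z*z stays finite
```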
There's no quick fix -- you can try to reduce the working set so that it
fits in CPU cache, but then the Python overhead often comes into play.
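As an illustration of that trade-off (my own sketch, not from the thread), one could process the grid in row blocks small enough to stay in cache, using in-place ufunc calls to avoid full-size temporaries -- at the cost of Python-level overhead per block:

```python
import numpy as np

def step_blocked(z, c, counts, block=64):
    """One Mandelbrot iteration over row blocks (illustrative sketch).

    In-place ufunc calls (out=) avoid allocating full-size temporaries,
    and a small `block` keeps each slice's working set cache-sized --
    but the Python loop overhead grows as `block` shrinks.
    """
    for i in range(0, z.shape[0], block):
        zs, cs = z[i:i + block], c[i:i + block]
        np.multiply(zs, zs, out=zs)            # z = z*z, in place
        np.add(zs, cs, out=zs)                 # z += c, in place
        counts[i:i + block] += np.abs(zs) <= 2.0
```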
Solutions include numexpr and Theano -- and, as often as not, Cython or
plain Fortran.

It's a good example, thanks!
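For example (assuming the numexpr package is available -- this is just an illustration, not code from the thread), numexpr can fuse the arithmetic of one update into a single pass, blocking internally so the working set stays cache-sized and the intermediate z*z is never materialized as a full-size temporary:

```python
import numpy as np
import numexpr as ne

# Illustrative only: evaluate z*z + c in one fused, internally blocked
# pass instead of two separate whole-array ufunc calls.
n = 500
x, y = np.meshgrid(np.linspace(-2.0, 1.0, n), np.linspace(-1.5, 1.5, n))
c = x + 1j * y
z = np.zeros_like(c)

z = ne.evaluate("z*z + c")   # one memory pass over z and c
```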