[Numpy-discussion] strange performance on mac 2.5/2.6 32/64 bit
Robin
robince@gmail....
Tue Nov 3 12:14:26 CST 2009
After some more pootling about, I figured out that a lot of the performance
loss comes from using 32 bit integers by default when compiled as 64 bit.
I asked this question on stackoverflow:
http://stackoverflow.com/questions/1668899/fortran-32-bit-64-bit-performance-portability
Is there any way to use fortran with f2py from python in a way that
doesn't require the code to be changed depending on platform?
Or should I just pack it all in and use weave?
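
A rough way to see the mismatch from the Python side is to compare the
platform's C integer widths. This is only a sketch under stated assumptions:
that the Fortran default `integer` is the same size as a C `int` (true for
gfortran unless a flag such as -fdefault-integer-8 is used), that NumPy's
default integer maps to a C `long` (as it did in the NumPy versions discussed
here), and that f2py casts and copies an input array whose dtype does not match
the declared Fortran type. The function name `integer_widths` is made up for
illustration.

```python
import ctypes
import struct

def integer_widths():
    """Return byte widths of the C types relevant to the mismatch."""
    return {
        # 4 on a 32 bit interpreter, 8 on a 64 bit one
        "pointer": struct.calcsize("P"),
        # assumed to match the Fortran default `integer`
        "c_int": ctypes.sizeof(ctypes.c_int),
        # assumed to match NumPy's default integer in these versions
        "c_long": ctypes.sizeof(ctypes.c_long),
    }

widths = integer_widths()
print(widths)
if widths["c_long"] != widths["c_int"]:
    # Assumption: in this case an f2py wrapper has to convert the input
    # array on every call before the Fortran loop runs.
    print("default ints are wider than a default Fortran integer")
```

If the two widths differ, building the input with an explicit 32 bit dtype
(for example `astype(np.int32)`) would be one way to test whether the
conversion accounts for the slowdown.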
Robin
On Tue, Nov 3, 2009 at 4:29 PM, Robin <robince@gmail.com> wrote:
> Hi,
>
> I'm not sure if this is of much interest but it's been really puzzling
> me so I thought I'd ask.
>
> In an earlier post I described how I was surprised a simple f2py
> wrapped fortran bincount was 4x faster than np.bincount - but that
> differential only seemed to be on my mac; on moving to linux they both
> took more or less the same time. I'm trying to work out if it is worth
> moving some of my bottlenecks to fortran (most of which are np
> builtins). So far it looks like it is - but only on my mac and only
> 32bit (see below).
> The only explanation I could think of was that the gcc-4.0 used to build
> numpy on a mac didn't perform so well, so after upgrading to snow
> leopard I've been trying to look at this again. I was hoping I could
> get the equivalent performance on my mac, like on linux, which would
> result in the np c stuff being a couple of times faster.
>
> So far, with Python 2.6.3 in 64 bit - numpy seems to be significantly
> slower and my fortran code _much_ slower - even from the same
> compiler. Can anyone help me understand what is going on?
>
> I have only been able to build 32 bit numpy against 2.5.4 with apple
> gcc-4.0 and 64 bit numpy against 2.6.3 universal with gcc-4.2. I
> haven't been able to get a numpy I can import on 2.6.3 in 32 bit mode
> ( http://projects.scipy.org/numpy/ticket/1221 ).
>
> Here are the results for python.org 32 bit 2.5.4, numpy compiled with
> apple gcc 4.0, f2py using att gfortran 4.2:
> In [2]: timeit x = np.random.random_integers(0,1023,100000000).astype(int)
> 1 loops, best of 3: 2.86 s per loop
> In [3]: x = np.random.random_integers(0,1023,100000000).astype(int)
> In [4]: timeit np.bincount(x)
> 1 loops, best of 3: 435 ms per loop
> In [6]: timeit gf42.bincount(x,1024)
> 10 loops, best of 3: 129 ms per loop
> In [7]: np.__version__
> Out[7]: '1.4.0.dev7618'
>
> And for self-built (apple gcc 4.2) 64 bit 2.6.3, numpy compiled with
> apple gcc 4.2, f2py using the same att gfortran 4.2:
> In [3]: timeit x = np.random.random_integers(0,1023,100000000).astype(int)
> 1 loops, best of 3: 3.91 s per loop # 37% slower than 32bit
> In [4]: x = np.random.random_integers(0,1023,100000000).astype(int)
> In [5]: timeit np.bincount(x)
> 1 loops, best of 3: 582 ms per loop # 34 % slower than 32 bit
> In [8]: timeit gf42_64.bincount(x,1024)
> 1 loops, best of 3: 803 ms per loop # 522% slower than 32 bit
>
>
> So why is there this big difference in performance? I'd really like to
> know why the fortran compiled with the same compiler is so much slower
> in 64 bit mode. As far as I can tell the flags used are the same. Also,
> why is numpy slower? I was surprised that I was able to import the 64
> bit universal module built with f2py from 2.6 inside 32 bit 2.5 and
> there it was quick again - so it seems the x86_64 code generated by
> the fortran compiler is much slower (rather than any wrappers or
> such).
>
> I tried using some more recent gfortrans from macports - but could
> only use them to build modules against the 64 bit python/numpy since I
> couldn't find a way to get f2py to force 32 bit output. But the
> performance was more or less the same (always several times slower than the
> 32 bit att gfortran).
>
> Any advice appreciated.
>
> Cheers
>
> Robin
>
> --------
> subroutine bincount (x,c,n,m)
>   implicit none
>   integer, intent(in) :: n,m
>   integer, dimension(0:n-1), intent(in) :: x
>   integer, dimension(0:m-1), intent(out) :: c
>   integer :: i
>
>   c = 0
>   do i = 0, n-1
>     c(x(i)) = c(x(i)) + 1
>   end do
> end
>
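
For checking a wrapped build against known answers, the subroutine above can be
mirrored in plain Python. This is a minimal reference sketch, not a performance
comparison; `bincount_ref` is a made-up name, and unlike the Fortran loop it
rejects values outside 0..m-1 instead of indexing out of bounds.

```python
def bincount_ref(x, m):
    """Count occurrences of each value 0..m-1 in x (mirrors the Fortran loop)."""
    c = [0] * m                    # c = 0
    for v in x:                    # do i = 0, n-1
        if not 0 <= v < m:         # the Fortran version does no bounds check
            raise ValueError("value %d outside 0..%d" % (v, m - 1))
        c[v] += 1                  # c(x(i)) = c(x(i)) + 1
    return c

print(bincount_ref([0, 1, 1, 3], 4))  # [1, 2, 0, 1]
```

On a small input, the result can be compared with what the f2py module and
np.bincount return for the same data.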