[Numpy-discussion] building NumPy with Intel CC & MKL (solved!)

rex rex at nosyntax.com
Wed Jan 24 18:25:33 CST 2007


Christian Marquardt <christian at marquardt.sc> [2007-01-24 11:09]:
> 
> I'll try to explain... I hope it's not too basic.

Christian, at this point you could explain that shoes are not
interchangeable -- that they are built to be worn on the left foot or the
right foot -- and I'd be grateful for the explanation.

I've left much detail in what follows in the hope that the details may
help someone who is also having trouble using the Intel MKL.

> Python searches for its modules along the PYTHONPATH, i.e. a list
> of directories where it expects to find whatever it needs. This is the
> same way the Unix shell (or the DOS command.com) looks in the PATH
> to find programs or shell/batch scripts, or the dynamic loader uses
> LD_LIBRARY_PATH to find shared libraries.
> 
> >> >>>> import numpy
> >> >>>> print numpy
> >> > <module 'numpy' from '/usr/lib/python2.5/site-packages/numpy/__init__.pyc'>
> >> >
> >> > What am I to make of this? Is it the rpm numpy or is it the numpy I
> >> > built using the Intel compiler and MKL?
> 
> This tells you which directory your Python installation actually loaded
> numpy from: it used the numpy installed in the directory
> 
>    /usr/lib/python2.5/site-packages/numpy
> 
> By *convention* (as someone already pointed out before), the
> /usr/lib/python2.5/site-packages is the directory where the original
> system versions of python packages should be installed. In particular, the
> rpm version will very likely install its stuff there.

It did.
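
The same question ("which file would an import actually load?") can also be asked without importing the package. What follows is only a minimal sketch, assuming a Python 3 interpreter rather than the 2.5 used in this thread; module_origin is a hypothetical helper name, not part of numpy.

```python
import importlib.util


def module_origin(name):
    """Return the file an import of `name` would load, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # "json" ships with Python; "numpy" may or may not be installed here.
    for name in ("json", "numpy"):
        print(name, "->", module_origin(name))
```

The directory part of the printed path is what distinguishes a copy under /usr/lib/... from one under /usr/local/lib/... or $HOME.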
 
> When installing additional python modules or packages via a command like
> 
>    python setup.py install
> 
> the new packages will also be installed in that system directory. So if
> you have installed your Intel version of numpy with the above command, you
> might have overwritten the rpm stuff. There is a way to install in a
> different place; more on that below.

I'm 95% sure that command put numpy in /usr/local/lib/python2.5/site-packages.
It's possible I used --prefix=<something>, but I don't recall doing so.
 
> You now probably want to find out if the numpy version in /usr/lib/... is
> the Intel one or the original rpm one. To do this, you can check if the
> MKL and Intel libraries are actually loaded by the shared libraries within
> the numpy installation. You can use the command ldd which shows which
> shared libraries are loaded by executables or other shared libraries. For
> example, in my installation, the command
> 
>    ldd <wherever>/python2.5/site-packages/numpy/linalg/lapack_lite.so
> 
> gives the following output:
> 
>    MEDEA /opt/apps/lib/python2.5/site-packages/numpy>ldd ./linalg/lapack_lite.so
>         linux-gate.so.1 =>  (0xffffe000)
>         libmkl_lapack32.so => /opt/intel/mkl/8.1/lib/32/libmkl_lapack32.so (0x40124000)
>         libmkl_lapack64.so => /opt/intel/mkl/8.1/lib/32/libmkl_lapack64.so (0x403c8000)
>         libmkl.so => /opt/intel/mkl/8.1/lib/32/libmkl.so (0x40692000)
>         libvml.so => /opt/intel/mkl/8.1/lib/32/libvml.so (0x406f3000)
>         libguide.so => /opt/intel/mkl/8.1/lib/32/libguide.so (0x4072c000)
>         libpthread.so.0 => /lib/tls/libpthread.so.0 (0x40785000)
>         libimf.so => /opt/intel/fc/9.1/lib/libimf.so (0x40797000)
>         libm.so.6 => /lib/tls/libm.so.6 (0x409d5000)
>         libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x409f8000)
>         libirc.so => /opt/intel/fc/9.1/lib/libirc.so (0x40a00000)
>         libc.so.6 => /lib/tls/libc.so.6 (0x40a41000)
>         libdl.so.2 => /lib/libdl.so.2 (0x40b5b000)
>         /lib/ld-linux.so.2 (0x80000000)
> 
> Note that the MKL libraries are referenced at the beginning - just look at
> the path names! If the output for your lapack_lite.so also contains
> references to the MKL libs, you've got the Intel version in
> /usr/lib/python2.5/.... (and have probably overwritten the rpm version).
> If you do not get any reference to the MKL stuff, it's still the rpm
> version which does not use the MKL.
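
The ldd check described above can be scripted. This is a rough sketch, assuming Python 3 and a Linux system with ldd on the PATH; parse_ldd, uses_mkl and ldd are hypothetical helper names, and the test for "mkl" in the library name is only a heuristic based on the file names shown in this thread.

```python
import subprocess


def parse_ldd(text):
    """Map each library name in ldd output to its resolved path (None if unresolved)."""
    libs = {}
    for line in text.splitlines():
        if "=>" not in line:
            continue  # e.g. the bare "/lib/ld-linux.so.2 (0x...)" line
        name, _, rest = line.partition("=>")
        fields = rest.split()
        # A resolved entry looks like "/path/lib.so (0xADDR)"; "not found" or a
        # lone address (linux-gate) carries no path.
        path = fields[0] if fields and fields[0].startswith("/") else None
        libs[name.strip()] = path
    return libs


def uses_mkl(text):
    """True if any library name in the ldd output mentions 'mkl'."""
    return any("mkl" in name for name in parse_ldd(text))


def ldd(path):
    """Run the real ldd on `path` and return its output as text."""
    return subprocess.run(["ldd", path], capture_output=True, text=True).stdout
```

Calling uses_mkl(ldd(".../numpy/linalg/lapack_lite.so")) on each installation would then answer the rpm-or-Intel question in one step.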
> 
> Now, let's assume that you have the rpm version in /usr/lib/python2.5/....
> Maybe you'll want to reinstall the rpm to be sure that this is the case.
> 
> You now want to a) install your Intel version in some well-defined place,
> and b) make sure that your Python picks that version up when importing
> numpy.
> 
> To achieve a) one way is to reinstall numpy from the source as before, BUT
> with
> 
>    python setup.py install --prefix=<somewhere>
>                            ^^^^^^^^^^^^^^^^^^^^
> 
> <somewhere> is the path to some directory, e.g.
> 
>    python setup.py install --prefix=$HOME
> 
> The latter would install numpy into the directory
> 
>    $HOME/lib/python2.5/site-packages/numpy
> 
> Do an ls afterwards to check if numpy really arrived there. Instead of
> using the environment variable HOME, you can of course also use any other
> directory you like. I'll stick to HOME in the following.
> 
> For b), we have to tell python that modules are waiting for it to be
> picked up in $HOME/lib/python2.5/site-packages. You do that by setting the
> environment variable PYTHONPATH, as was also mentioned in this thread. In
> our example, you would do (for a bash or ksh)
> 
>    export PYTHONPATH=$HOME/lib/python2.5/site-packages
> 
> As long as this variable is set and exported (i.e., visible in the
> environment of every program you start), the next instance of Python
> you'll start will now begin searching for modules in PYTHONPATH whenever
> you do an import, and only fall back to the ones in the system wide
> installation if it doesn't find the required module in PYTHONPATH.
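
The search order described here can be observed directly by starting a fresh interpreter with PYTHONPATH set and printing its sys.path. A minimal sketch, assuming Python 3; child_sys_path is a hypothetical helper, and the temporary directory merely stands in for $HOME/lib/python2.5/site-packages.

```python
import os
import subprocess
import sys
import tempfile


def child_sys_path(pythonpath):
    """Start a fresh interpreter with PYTHONPATH set and return its sys.path."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


if __name__ == "__main__":
    extra = tempfile.mkdtemp()  # stand-in for a private site-packages directory
    for position, entry in enumerate(child_sys_path(extra)):
        marker = "<-- from PYTHONPATH" if entry == extra else ""
        print(position, repr(entry), marker)
```

The PYTHONPATH entry should show up ahead of the system-wide site-packages directory, which is why a module found there wins.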
> 
> So, after having set PYTHONPATH in your environment, start up python and
> import numpy. Do the 'print numpy' within python again and look at the
> output. Does it point to the installation directory of your Intel version?
> Great; you're done. If not, this means that something went wrong. It might
> be that you had a typo in the export command or the directory name; it
> might mean that you didn't export the PYTHONPATH before running python; it
> might be that the installation had failed for some reason. You just have
> to play around a bit and see what's going on... but it's not difficult.

It is when one cannot recall what one did yesterday. :( That's an
overstatement, but my recall is becoming unreliable.
 
> Now that you have two versions of numpy, you can (kind of) switch between
> them by making use of the PYTHONPATH. If you unset it ('unset
> PYTHONPATH'), the next python session you are starting in the same
> shell/window will use the original system version. Setting PYTHONPATH
> again and having it point to your local site-packages directory activates
> the stuff you've installed in there. You cannot switch between the two
> numpy versions in the same session; if you want to try the other, you'll
> have to start a new python and make sure that the PYTHONPATH is set up
> appropriately for what you want.
> 
> In the long run, and if you have decided which version to use, you can
> export PYTHONPATH in your $HOME/.profile and don't have to do that
> manually each time (which becomes quite cumbersome after a while, of
> course).
> 
> Common practice is probably that you install your favourite versions or
> builds of python modules in one place (i.e. using $HOME as --prefix), and
> set PYTHONPATH accordingly. It's not a good idea to overwrite the system
> wide installations, but again - that's purely a convention, nothing more.
> 
> Hope this helps a bit... Good luck!

Thank you for taking the time to write such a detailed explanation. If
only the documentation were so detailed...

I looked in /usr/lib/python2.5/site-packages/numpy and it was not
obvious whether the rpm version was there or the version I compiled. So I
did a 'find' from /:

find . -name "ctypeslib*"

One of the results was:

./usr/local/lib/python2.5/site-packages/numpy/ctypeslib.py

So the python setup.py command defaulted to /usr/local/... (a Good
Thing, IMHO).

I did:

export PYTHONPATH=/usr/local/lib/python2.5/site-packages
python
Python 2.5 (r25:51908, Nov 27 2006, 19:14:46)
[GCC 4.1.2 20061115 (prerelease) (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.5/site-packages/numpy/__init__.py", line 36, in <module>
    import core
  File "/usr/local/lib/python2.5/site-packages/numpy/core/__init__.py", line 5, in <module>
    import multiarray
ImportError: libsvml.so: cannot open shared object file: No such file or directory

So I checked:

~> ldd /usr/lib/python2.5/site-packages/numpy/linalg/lapack_lite.so
        linux-gate.so.1 =>  (0xffffe000)
        libpthread.so.0 => /lib/libpthread.so.0 (0xb7cc4000)
        libc.so.6 => /lib/libc.so.6 (0xb7b96000)
        /lib/ld-linux.so.2 (0x80000000)

~> ldd /usr/local/lib/python2.5/site-packages/numpy/linalg/lapack_lite.so
        linux-gate.so.1 =>  (0xffffe000)
        libmkl_lapack32.so => /opt/intel/mkl/8.1/lib/32/libmkl_lapack32.so (0xb7bd1000)
        libmkl_lapack64.so => /opt/intel/mkl/8.1/lib/32/libmkl_lapack64.so (0xb7907000)
        libmkl.so => /opt/intel/mkl/8.1/lib/32/libmkl.so (0xb78a6000)
        libvml.so => /opt/intel/mkl/8.1/lib/32/libvml.so (0xb786d000)
        libpthread.so.0 => /lib/libpthread.so.0 (0xb7830000)
        libsvml.so => not found
        libimf.so => not found
        libm.so.6 => /lib/libm.so.6 (0xb780a000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb77fe000)
        libirc.so => not found
        libc.so.6 => /lib/libc.so.6 (0xb76cf000)
        libdl.so.2 => /lib/libdl.so.2 (0xb76cb000)
        libguide.so => /opt/intel/mkl/8.1/lib/32/libguide.so (0xb7696000)
        /lib/ld-linux.so.2 (0x80000000)
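
The "not found" entries are the ones that matter here, and they are easy to overlook in a long listing. A small sketch that pulls them out of ldd output, assuming Python 3; unresolved is a hypothetical helper name and the sample text is abridged from the listing above.

```python
def unresolved(ldd_text):
    """Return the library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_text.splitlines():
        name, arrow, target = line.partition("=>")
        if arrow and target.strip() == "not found":
            missing.append(name.strip())
    return missing


sample = """\
        libvml.so => /opt/intel/mkl/8.1/lib/32/libvml.so (0xb786d000)
        libsvml.so => not found
        libimf.so => not found
        libirc.so => not found
        libc.so.6 => /lib/libc.so.6 (0xb76cf000)
"""
print(unresolved(sample))  # -> ['libsvml.so', 'libimf.so', 'libirc.so']
```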

At this point my fading brain managed to recall that a 'source' command had to be
issued to use the Intel compiler (icc).

~> source /opt/intel/cc/9.1.042/bin/iccvars.sh
~> python
Python 2.5 (r25:51908, Nov 27 2006, 19:14:46)
[GCC 4.1.2 20061115 (prerelease) (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> print numpy
<module 'numpy' from '/usr/local/lib/python2.5/site-packages/numpy/__init__.pyc'>
>>>

Ah, FINALLY!
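
My understanding is that iccvars.sh, among other things, puts the compiler's lib directory on LD_LIBRARY_PATH, which is why the three missing libraries resolve afterwards. A sketch for checking whether a given library is reachable through that variable, assuming Python 3; find_library is a hypothetical helper. The dynamic loader also consults other places (the ld.so cache, rpath entries), so a None here is a hint rather than proof.

```python
import os


def find_library(name, search_path):
    """Return the first directory in an os.pathsep-separated path holding `name`."""
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, name)):
            return directory
    return None


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    for lib in ("libsvml.so", "libimf.so", "libirc.so"):
        print(lib, "->", find_library(lib, path))
```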

So I did a speed test with a little Monte Carlo program I wrote:

''' A program that uses Monte Carlo to estimate how often the number of rare events
with a Poisson distribution will differ by a given amount.
'''
import numpy as n
from numpy.random import poisson
from time import time


lam = 4.0  # mu & var for Poisson distributed rands (they are equal in Poisson)
N = 10              #number of times to run the program
maxNumEvents = 20   #events larger than this are ignored
numPois = 100000    #number of pairs of outcomes to generate
freqA = 2           #number of times event A occurred
freqB = 6           #number of times event B occurred

print "#rands fraction [freqA,freqB]  fraction [lam,lam]  largest%  total[mean,mean]"
t0 = time()
for g in range(1):
    for h in range(N):
        suma = n.zeros((maxNumEvents+1,maxNumEvents+1), int)  #possible outcomes array
        count = poisson(lam, size =(numPois,2))  #generate array of pairs of Poissons
        for i in range(numPois):
            #if count[i,0] > maxNumEvents: continue
            #if count[i,1] > maxNumEvents: continue
            suma[count[i,0],count[i,1]] += 1
        d = n.sum(suma)
        print d, float(suma[freqA,freqB])/d, float(suma[lam,lam])/d , suma.max(), suma[lam,lam]
print 'time', time()-t0


Using the SUSE rpm:
python relative_risk.py
#rands fraction [2,6]  fraction [lam,lam]  largest%  total[mean,mean]
100000 0.01539 0.03869 3869 3869
100000 0.01534 0.03766 3907 3766
100000 0.01553 0.03841 3859 3841
100000 0.01496 0.03943 3943 3943
100000 0.01513 0.03829 3856 3829
100000 0.01485 0.03825 3993 3825
100000 0.01545 0.03716 3859 3716
100000 0.01526 0.03909 3919 3909
100000 0.01491 0.03826 3913 3826
100000 0.01478 0.03771 3782 3771
time 2.38847184181

Using the MKL version:
python relative_risk.py
#rands fraction [2,6]  fraction [lam,lam]  largest%  total[mean,mean]
100000 0.01502 0.03764 3895 3764
100000 0.01513 0.03841 3841 3841
100000 0.01511 0.03753 3810 3753
100000 0.01577 0.03766 3873 3766
100000 0.01541 0.0373 3963 3730
100000 0.01586 0.03862 3912 3862
100000 0.01552 0.03785 3870 3785
100000 0.01502 0.03854 3896 3854
100000 0.015 0.03803 3880 3803
100000 0.01515 0.03749 3855 3749
time 2.0455300808

So the rpm version only takes ~17% longer to run this program. I'm surprised
that there isn't a larger difference. Perhaps there will be in a
different type of program. BTW, the cpu is an Intel e6600 Core 2 Duo
overclocked to 3.06 GHz (it will run reliably at 3.24 GHz).
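
A likely reason for the small gap: most of the run time is spent in the interpreter-level loop over the pairs, and neither that loop nor the Poisson generator goes through BLAS/LAPACK, so the MKL is probably hardly exercised. A sketch of a comparison that separates the two kinds of work, assuming Python 3 and a recent NumPy (default_rng needs 1.17 or later); the function names and the matrix size are hypothetical, and no timings are claimed here.

```python
import time

import numpy as np


def time_call(fn, repeat=3):
    """Return the best wall-clock time of `repeat` calls to fn()."""
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best


def python_loop(num_pairs=100000, lam=4.0, max_events=20):
    """The tally from the program above: one interpreter-level step per pair."""
    rng = np.random.default_rng(0)
    counts = rng.poisson(lam, size=(num_pairs, 2))
    tally = np.zeros((max_events + 1, max_events + 1), dtype=int)
    for a, b in counts:
        if a <= max_events and b <= max_events:  # ignore out-of-range events
            tally[a, b] += 1
    return tally


def matrix_product(n=500):
    """A dense float64 product, which NumPy hands to whatever BLAS it links."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    return a @ a


if __name__ == "__main__":
    print("python loop   :", time_call(python_loop))
    print("matrix product:", time_call(matrix_product))
```

Running this under both installations should show whether the matrix product, rather than the loop, is where the two builds differ.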

I've added these lines to .bashrc:
source /opt/intel/cc/9.1.042/bin/iccvars.sh
export PYTHONPATH=/usr/local/lib/python2.5/site-packages:/usr/lib/python2.5
export INCLUDE=/opt/intel/mkl/8.1/include:$INCLUDE
export LD_LIBRARY_PATH=/usr/local/lib:/opt/intel/mkl/8.1/lib/32:$LD_LIBRARY_PATH

I don't understand why the 'site-packages' must be included, but without
it, numpy is loaded from /usr/lib/python/site-packages. Why does it look
in the subdirectories in one case, but not in the other? Oh well, it works.
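
As far as I can tell, the answer is that Python never searches subdirectories of a PYTHONPATH entry: a package must sit directly inside a listed directory. The system site-packages is found only because Python's own startup code (the site module) adds it to the search path automatically, presumably based on the interpreter's /usr prefix here, while /usr/local/... gets no such treatment and must be listed in full. A sketch demonstrating the "no recursion" rule, assuming Python 3; importable_from and demo_pkg are hypothetical names.

```python
import importlib.machinery
import os
import tempfile


def importable_from(name, directory):
    """True if `name` sits directly inside `directory` (subdirs are not searched)."""
    return importlib.machinery.PathFinder.find_spec(name, [directory]) is not None


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    pkg = os.path.join(root, "site-packages", "demo_pkg")  # throwaway package
    os.makedirs(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()

    print(importable_from("demo_pkg", root))                                 # False
    print(importable_from("demo_pkg", os.path.join(root, "site-packages")))  # True
```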


Thanks much for the detailed explanation. It's greatly appreciated. :)

Regards,

-rex 
-- 
I know so little, but i once knew it fluently...

