[SciPy-user] Python on Intel Xeon Dual Core Machine

Lorenzo Isella lorenzo.isella@gmail....
Wed Feb 6 13:31:12 CST 2008

Unfortunately I am very close to some deadlines and I had to go for the easiest fix, which was adding some RAM.
To be honest, I do not see a straightforward way to speed up the code.
Furthermore, my knowledge of Python, operating systems and computers in
general is in a different league from that of many people on this list.
So, I'll list some points I may come back to when I post again:
(1) Profiling with Python: I am learning how to do that. I think I am
getting somewhere following the online tutorial
(http://docs.python.org/lib/profile-instant.html).
As to your suggestion, I added print time.time() at the end of my code,
but I am puzzled.
My code starts with these lines

#! /usr/bin/env python

import scipy as s
from scipy import stats #I need this module for the linear fit
import numpy as n
import pylab as p
import rpy as r
#from rpy import r
#import distance_calc as d_calc

and the final statement print time.time() leads to:

Traceback (most recent call last):
  File "<stdin>", line 403, in ?
TypeError: 'numpy.ndarray' object is not callable

where line 403 is the one with the time statement. Should I not get some time statistics instead?
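
Just to make clear what I was aiming for, here is the minimal timing
sketch I had in mind (only a guess on my part: I suspect some array
variable in my script shadows the name "time", which would explain the
TypeError, but I am not sure; the script name below is made up):

import time as wall_clock   # imported under an alias so an existing "time" variable cannot clash

t0 = wall_clock.time()             # seconds since the epoch, as a float
# ... the expensive part of the script goes here ...
print(wall_clock.time() - t0)      # elapsed wall-clock seconds

For a per-function breakdown, the profiler from the tutorial above can
also be run on the whole script from the shell:

python -m cProfile my_script.py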

(2) Unless something really odd happens, there are two bottlenecks in my code:
(a) the calculation of a sort of "distance" [not exactly that] between
5000 particles (an O(5000x5000) operation). That is done by compiled
Fortran 90 code imported as a Python module via f2py.
(b) Once I have the distances between my particles, I use the igraph
library (http://cran.r-project.org/src/contrib/Descriptions/igraph.html)
to find the connected components. This R library is called via rpy.

If (a) and (b) cannot be parallelized, then I think this is hopeless.
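
For what it is worth, point (a) at least has the shape of an
embarrassingly parallel job: each block of rows of the 5000x5000 matrix
depends only on the particle positions. A rough sketch of what I mean
(my f2py routine is left out and replaced by a plain numpy placeholder,
so the function below is only illustrative):

import numpy as n

def distances_in_blocks(positions, block=500):
    # Fill the full npart x npart "distance" matrix in independent row
    # blocks; each block could in principle go to a separate worker.
    npart = positions.shape[0]
    dist = n.empty((npart, npart))
    for start in range(0, npart, block):
        stop = min(start + block, npart)
        # The f2py-wrapped routine would fill rows start:stop here; a
        # plain Euclidean distance keeps the sketch self-contained.
        diff = positions[start:stop, None, :] - positions[None, :, :]
        dist[start:stop] = n.sqrt((diff ** 2).sum(axis=-1))
    return dist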

(3) MKL: is the Intel Math Kernel Library what I am supposed to install
and tune for my multi-CPU machine? If so, is it a complicated business?
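If I understand correctly, I can at least check which BLAS/LAPACK my
current numpy build is linked against before trying anything:

import numpy
numpy.show_config()   # prints the BLAS/LAPACK libraries numpy was built against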

Many thanks


Date: Tue, 5 Feb 2008 23:09:07 -0500
From: "Anne Archibald" <peridot.faceted@gmail.com>
Subject: Re: [SciPy-user] Python on Intel Xeon Dual Core Machine
To: "SciPy Users List" <scipy-user@scipy.org>
Content-Type: text/plain; charset=UTF-8

On 05/02/2008, Lorenzo Isella <lorenzo.isella@gmail.com> wrote:

> > And thanks everybody for the many replies.
> > I partially solved the problem adding some extra RAM memory.
> > A rather primitive solution, but now my desktop does not use any swap memory and the code runs faster.
> > Unfortunately, the nature of the code does not easily lend itself to being split up into easier tasks.
> > However, apart from the parallel python homepage, what is your recommendation for a beginner who wants a smattering in parallel computing (I have in mind C and Python at the moment)?

Really the first thing to do is figure out what's actually taking the
time in your program.  The python profiler has its limitations, but
it's still worth using. Even just a few "print time.time()" statements
scattered through your code can tell you a lot.
difference. If memory is a problem - as it was in your case - and
you're swapping to disk, parallelizing your code may make things run
slower. (Swapping is, as you probably noticed, *incredibly* slow, so
anything that makes you do more of it, like trying to cram more stuff
in memory at once, is going to make things much slower.) Even if
you're already pretty sure you know which parts are slow, instrumenting
your code will tell you how much difference the various parallelization
tricks you try are making.

What kind of parallelizing you should do really depends on what's slow
in your program, and on what you can change. At a conceptual level,
some operations parallelize easily and others require much thought.
For example, if you're doing something ten times, and each time is
independent of the others, that can be easily parallelized (that's
what my little script handythread does). If you're doing something
more complicated - sorting a list, say - that requires complicated
sequencing, parallelizing it is going to be hard.

Start by thinking about the time-consuming tasks you identified above.
Does each task depend on the result of a previous task? If not, you
can run them concurrently, using something like handythread, python's
threading module, or parallel python.
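
For concreteness, here is a minimal sketch of that case using nothing
but the standard threading module (handythread wraps essentially this
pattern; the task names at the bottom are made up):

import threading

def run_concurrently(tasks):
    # Start each zero-argument callable in its own thread and wait for
    # all of them.  This only buys you anything if the tasks spend their
    # time outside the interpreter lock: I/O, array operations, or
    # external C/Fortran code.
    threads = [threading.Thread(target=task) for task in tasks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# e.g. run_concurrently([compute_distances, find_components])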

If they do depend on each other, start looking at each time-consuming
task in turn. Could it be parallelized? This can mean one of two
things: you could write code to make the task run in parallel, or you
could make python use something like a parallelized linear-algebra
library that automatically parallelizes (say) matrix multiplication
(this is what the people who suggest MKL have in mind). More
generally, could the task be made to run faster in other ways? If
you're reading text files, could you read binaries instead? If you're
calling an external program thousands of times, could you do the work
in python, or call the program only once with more input?
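
As a small example of the text-versus-binary point (the file names are
made up): if your numpy has numpy.save and numpy.load, converting a big
text table once and loading the binary file afterwards saves all the
parsing on later runs:

import numpy as np

data = np.loadtxt('particles.txt')   # slow: every number is parsed from text
np.save('particles.npy', data)       # one-time conversion to numpy's binary format

data = np.load('particles.npy')      # later runs: read the binary directly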

Parallel programming is a massive, complicated field, and many
high-powered software tools exist to take advantage of it.
Unfortunately, python has a limitation in this area: the Global
Interpreter Lock. Basically it means no two CPUs can be running python
code at the same time. This means that you get no speedup at all by
parallelizing your python code - with a few important exceptions:
while one thread is doing an array operation, other threads can run
python code, and while one thread is waiting for I/O (reading from
disk, for example), other threads can run python code. Parallel python
is a toolkit that can avoid this problem by running multiple python
interpreters (though I have little experience with it).
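
The structure of that multiple-interpreter approach looks roughly like
this (sketched with the multiprocessing module rather than parallel
python itself, whose API differs, so treat it purely as an illustration;
the worker function is made up):

from multiprocessing import Pool

def work_on_block(block):
    # stand-in for an independent, CPU-bound piece of work
    return sum(x * x for x in block)

if __name__ == '__main__':
    blocks = [range(i * 1000, (i + 1) * 1000) for i in range(8)]
    pool = Pool(processes=2)                   # one interpreter per core/CPU
    results = pool.map(work_on_block, blocks)  # no GIL contention between processes
    pool.close()
    pool.join()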

Generally, parallelization works best when you don't need to move much
data around. The fact that you're running short of memory suggests
that you may be moving a lot of data. Parallelization also always
requires some restructuring of your code, and more of it the more
efficiency you want.

