[SciPy-user] Python on Intel Xeon Dual Core Machine

Anne Archibald peridot.faceted@gmail....
Wed Feb 6 14:09:13 CST 2008


On 06/02/2008, Lorenzo Isella <lorenzo.isella@gmail.com> wrote:

> Unfortunately I am very close to some deadlines and I had to go for the easiest way of adding some RAM memory.

Don't knock it, it worked. All too often we waste countless hours
trying to tune the performance of our code when we should spend a few
dollars on hardware to make the code we've got run faster. Less
effort, fewer bugs, results sooner.

> So, I'll list some points I may come back to when I post again
> (1) profiling with python; I am learning how to do that. I think I am getting somewhere following the online tutorial (http://docs.python.org/lib/profile-instant.html)
> As to your suggestion, I added print time.time() at the end of my code but I am puzzled.

Ah. Well, just adding that line won't help. You have to import the
time module; then calling time.time() gives you a floating-point
number telling you what time it is right now. So a quick and dirty
alternative to sophisticated profiling looks like:

import time

t = time.time()
# do something potentially time-consuming
print "Operation took %g seconds" % (time.time()-t)

The weird error you get sounds like you have something else called
time somewhere in your code. You can get around that by doing

from time import time as what_time_is_it_now

or whatever name you like that doesn't conflict with a variable name
in your code.
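
Putting the two pieces together, here is a minimal sketch of a reusable timing helper. It is written in Python 3 syntax, and the names timed and busy_sum are made up for illustration; busy_sum just stands in for whatever operation you want to time.

```python
from time import time as what_time_is_it_now  # alias avoids clashing with a variable named "time"


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = what_time_is_it_now()
    result = func(*args)
    elapsed = what_time_is_it_now() - start
    return result, elapsed


def busy_sum(n):
    # Hypothetical stand-in for the potentially time-consuming operation.
    return sum(i * i for i in range(n))


total, seconds = timed(busy_sum, 100000)
print("Operation took %g seconds" % seconds)
```

Wrapping each suspect step this way gives you a rough per-step breakdown without touching the profiler.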

> 2) Unless something really odd happens, there are 2 bottlenecks in my code:
> (a) calculation of a sort of "distance" [not exactly that] between 5000 particles ( an O(5000x5000) operation)
> That is done by a Fortran 90 compiled code imported as Python module via f2py

It should be possible to accelerate this, depending on how it's
calculated. If you're just calculating it in a brute-force way
(supplying each pair to a function), then this can definitely be
parallelized; for example something like

distances = handythread.parallel_map(mydistance, ((M[i],M[i+1:]) for i
in xrange(n-1)))

where M is your list of points, and mydistance takes a single point
and an array of points and returns an array of distances between the
first point and the rest. You'll get back a "triangular" list of
arrays containing all the pairwise distances, and it'll get run on two (or however
many you ask for) processors. It may require you to modify the calling
interface of your F90 code.
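
To make the shape of that call concrete, here is a rough sketch in Python 3 that swaps in the standard library's ThreadPoolExecutor for handythread, so it is not the same code path. The pure-Python mydistance below is only a placeholder: threads give a real speedup only if the distance routine releases the interpreter lock, which I am assuming the compiled Fortran routine can be made to do.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def mydistance(args):
    """Distances from one point to each of the remaining points."""
    point, others = args
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(point, q)))
            for q in others]


def triangular_distances(M, threads=2):
    """Return a "triangular" list: row i holds d(i, i+1), ..., d(i, n-1)."""
    n = len(M)
    jobs = ((M[i], M[i + 1:]) for i in range(n - 1))
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves the order of the inputs.
        return list(pool.map(mydistance, jobs))


M = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]  # toy points, not real data
rows = triangular_distances(M)
print(rows)
```

Note that the work per row shrinks as i grows, so handing out many small jobs (one per point, as here) keeps both processors busy better than splitting the list in half.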

If the result is sparse, that is, almost all zeros (or
infinities), you should think about also making the Fortran code
return a sparse representation. Reducing memory use can drastically
accelerate code on modern processors (which are much much faster than
RAM can keep up with).
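
As an illustration of what a sparse representation could look like, here is a small Python 3 sketch that keeps only the pairs closer than a cutoff. The cutoff and the function name sparse_pairs are assumptions of mine; the idea is that anything farther apart counts as "infinite" and is simply not stored.

```python
import math


def sparse_pairs(points, cutoff):
    """Return {(i, j): distance} for i < j with distance <= cutoff."""
    close = {}
    for i in range(len(points) - 1):
        for j in range(i + 1, len(points)):
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(points[i], points[j])))
            if d <= cutoff:
                # Only "interesting" pairs are stored; the rest are implied.
                close[(i, j)] = d
    return close


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0)]  # toy data
pairs = sparse_pairs(points, cutoff=1.5)
print(pairs)
```

This still visits every pair, so it saves memory rather than arithmetic; the same filtering would have to happen inside the Fortran routine to get the benefit there.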

> (b)once I have the distances between my particles, the igraph library (http://cran.r-project.org/src/contrib/Descriptions/igraph.html) to find the connected components.
> This R library is called via rpy.

It's quite possible that rpy is slow. I don't know anything about it,
never used either it or R; I would look for code implemented in python
or C or Fortran. In fact, it looks like igraph has a python binding.
I'd try this, in case going through rpy is slowing you down.
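
For a sense of how little code the connected-components step needs, here is a sketch in plain Python 3 using union-find. This is a swapped-in technique, not igraph's implementation or API, and it assumes you can supply the graph as a list of (i, j) index pairs for particles that count as connected.

```python
def connected_components(n, edges):
    """Group nodes 0..n-1 into components given undirected (i, j) edges."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri  # merge the two components

    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


components = connected_components(5, [(0, 1), (2, 3), (3, 4)])
print(components)
```

Something like this avoids the round trip through rpy entirely, which makes it a cheap way to check whether that layer is what is slowing you down.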

Parallelizing igraph would involve rewriting the important algorithms
in a parallel fashion. This would be a challenge comparable to writing
igraph in the first place.

> If (a) and (b) cannot be parallelized, then this is hopeless I think.

If the slow step is producing the distances - and it sounds like it
might be - you will probably get a speedup by close to a factor of
two (or however many processors you have) by rearranging your code so
that pairwise distances can be computed in parallel.
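
How close to a factor of two depends on what fraction of the run time the distance step really takes. A standard back-of-the-envelope estimate is Amdahl's law; the sketch below (Python 3) just evaluates it, and the 90% figure is an assumed example, not a measurement of your code.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Overall speedup when only parallel_fraction of the run time scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


# Assumed example: distances take 90% of the time, two processors available.
estimate = amdahl_speedup(0.9, 2)
print("Estimated speedup: %.2fx" % estimate)
```

If the serial part (here, the igraph step) is large, extra processors buy much less, which is another reason to time each step first.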

> (3) MKL: is the intel math library at
> http://www.intel.com/support/performancetools/libraries/mkl/linux/
> what I am supposed to install and tune for my multi-cpu machine?
> If so, is it a complicated business?

That would be it. I've never done it, but I imagine Intel has gone to
some lengths to make it convenient. This will only help with
operations like matrix multiplication and inversion, none of which, by
the sound of it, are performance-critical. Find out what's slow before
going to the trouble.
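
One way to find out what's slow is the standard library's cProfile module. The sketch below is Python 3, and slow_step, fast_step and main are invented placeholders for the distance calculation, the graph step and your top-level script.

```python
import cProfile
import io
import pstats


def slow_step(n):
    # Placeholder for the expensive part of the program.
    return sum(i * i for i in range(n))


def fast_step(n):
    # Placeholder for a cheap part of the program.
    return n + 1


def main():
    return slow_step(200000) + fast_step(10)


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Collect the report as text, sorted by cumulative time per function.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The functions at the top of the cumulative listing are the ones worth optimizing or parallelizing; if none of them are linear algebra calls, a tuned math library is unlikely to help.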

Good luck,
Anne

