[IPython-user] Time taken to push. Adjusting MPI parameter for processor affinity. No effect.

mark starnes m.starnes05@imperial.ac...
Sun Nov 9 03:24:13 CST 2008

Hi Brian,

I see.  I changed from a push to a 'save to /tmp/filename, then all engines
read from /tmp/filename' arrangement, and got a factor of 1000 speed-up.
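
For anyone wanting to try the same arrangement, here is a minimal sketch of
the idea.  It is not the code I actually ran: the function names and the
use of pickle are assumptions for illustration, and it only works when every
engine can see the same filesystem (true on a single multi-processor box).

```python
import os
import pickle
import tempfile


def shared_path(name):
    """Build a path under the system temp directory (e.g. /tmp on Linux)."""
    return os.path.join(tempfile.gettempdir(), name)


def save_shared(obj, path):
    """Pickle obj to path, writing to a temporary name first.

    The rename at the end means a reader never sees a half-written file.
    """
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, path)


def load_shared(path):
    """Each engine would call this itself instead of receiving a push."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

The client calls save_shared() once, then asks each engine to run
load_shared() on the same path, so only a short string travels through the
controller rather than the whole list.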

I would be interested in knowing how to call MPI from within IPython,
could you point me to a simple example?  Is there an Mpi4Py
manual?  (I noticed the Mpi4Py site appears to be a regular
target for vandalism - how annoying).

The algorithm builds a list of finite element objects (a few tens of
thousands, each containing shape functions etc.).  The elements
then update themselves for the required analysis frequency, a system
stiffness matrix and vector are built, boundary conditions are applied, then
the system is solved.  I wanted to build the list on one processor,
push the list to all the others, then send different frequencies
to each processor.  The solutions would be pulled back on completion,
and new frequencies sent out, as required.  It's not fine-grained
parallelisation but it uses the multi-processor machine I have access to.
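
To make the scheme concrete, here is a rough sketch of the frequency farm.
It is only an illustration under loud assumptions: it uses the standard
library's process pool rather than IPython.kernel, the "model" is a made-up
list of (stiffness, mass) pairs, and the "solve" is a toy single-degree-of-
freedom response, not my finite element code.  Each worker builds its own
copy of the model once, so only frequencies and results are sent around.

```python
import math
from concurrent.futures import ProcessPoolExecutor

_MODEL = None  # per-process copy of the model, set by init_worker()


def build_model(n_elements):
    """Hypothetical stand-in for the element list: (stiffness, mass) pairs."""
    return [(1000.0 + i, 1.0) for i in range(n_elements)]


def init_worker(n_elements):
    """Runs once in each worker, so the model is built locally, not pushed."""
    global _MODEL
    _MODEL = build_model(n_elements)


def solve_at_frequency(freq_hz, force=1.0):
    """Toy undamped response u = f / (K - omega^2 * M) for one frequency."""
    omega = 2.0 * math.pi * freq_hz
    k_total = sum(k for k, _ in _MODEL)
    m_total = sum(m for _, m in _MODEL)
    return freq_hz, force / (k_total - omega ** 2 * m_total)


def parallel_sweep(freqs, n_elements, workers=4):
    """Farm the frequencies out; only floats cross process boundaries."""
    with ProcessPoolExecutor(max_workers=workers,
                             initializer=init_worker,
                             initargs=(n_elements,)) as pool:
        return dict(pool.map(solve_at_frequency, freqs))
```

In a script, parallel_sweep([1.0, 2.0, 5.0, 10.0], n_elements=10) should be
called from under an `if __name__ == "__main__":` guard so that worker
processes can import the module cleanly.  The toy solve divides by zero
exactly at resonance, which a real solver would handle with damping.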

Ideally, I would have liked to use a more finely grained approach:
chopping the list into segments spread over all the processors, then performing
the frequency update, building the system, applying the b/c, and solving (keeping
the system matrix spread around the processors).  However, I haven't got as far
as finding a matrix solver that can deal with the system matrix being spread
around this way.  This approach would result in the shared memory of each node
being used more efficiently, allowing for larger jobs.  If you know of a
solver that can do this, that would be very good news!
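
The "chopping the list into segments" step, at least, is easy to sketch.
The helper below is hypothetical (the name and the balancing rule are my
own choices) and covers only the partitioning, not the distributed solve,
which is the part I am still missing.

```python
def segment_bounds(n_items, n_parts):
    """Split range(n_items) into n_parts contiguous (start, stop) slices.

    Slice sizes differ by at most one; the first `extra` slices get the
    leftover items.
    """
    base, extra = divmod(n_items, n_parts)
    bounds = []
    start = 0
    for rank in range(n_parts):
        stop = start + base + (1 if rank < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds
```

Processor number `rank` would then own elements[start:stop] for its own
(start, stop) pair and assemble only the matching part of the system.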

As it is, I'll be using the save-to-disk approach and keeping, say,
four system matrices and performing four solutions on a four-processor
machine.  Not as elegant as the more finely grained approach, but
time's running out for the PhD!

Thanks for your time.


Brian Granger wrote:
> Mark,
> You are comparing apples (IPython.kernel) and oranges (PyPar+MPI).
> PyPar uses MPI which is a peer to peer message passing architecture.
> IPython uses a completely different architecture and doesn't use MPI
> for its implementation.  Thus, it _should_ be slower than MPI.  I will
> explain a bit more to clarify things...
> * IPython's architecture is not peer to peer.  Instead, all
> processes connect to a central process (the IPython controller) which
> manages the computation.  This architecture is required when you want
> to do things interactively and be able to disconnect/reconnect to a
> running parallel job.  Here is what things look like in IPython:
>
>             Client
>               |
>               |
>          Controller--------Engine
>               |    \
>               |     \
>            Engine   Engine
>
> All of these connections are handled using regular TCP sockets.
> * We would like to optimize push/pull and other operations in IPython.
> At the same time, if you are sending really large objects from the
> client to the engines, the best solution is for you to re-work your
> algorithm to avoid this data transfer.  The same is true of using
> PyPar+MPI.  Here are some tips:
> 1.  Build the large matrices on the engines in the first place.  The
> only thing you should send to the engines are i) the info required to
> build the matrix and ii) the code required to build the matrix.
> 2.  If the matrices are being read from data on disk, do that on the
> engines in parallel rather than in the client.
> 3.  If you need to send data efficiently between engines during a
> computation, use MPI.  IPython fully integrates with MPI (I can give
> you more info on how this works if you want).  Also, the best MPI
> implementation for IPython (by far) is Mpi4Py (mpi4py.scipy.org).
> * Think of how long it takes to download a 500 MB file.  500 MB is a
> lot of data no matter how you handle it.  It should be slow.  Granted,
> IPython currently is not optimized to handle such large objects, but
> even if it were, it would still be slow.
> * Can you outline/describe the algorithm you are using?  I am more
> than willing to help you figure out a way of optimizing it using
> IPython.
> Cheers,
> Brian
> On Sat, Nov 1, 2008 at 3:05 AM, mark starnes <m.starnes05@imperial.ac.uk> wrote:
>> Hi everyone,
>> A short update.  This multi-processor, shared-memory machine shows no performance change
>> with the code snippet above when I set 'processor affinity' to 1.  I got the tip from
>> http://www.open-mpi.org/faq/?category=tuning
>> and remain interested in the results anyone else gets, or tips for improving push times.
>> Best regards,
>> Mark.
