[IPython-user] Time taken to push. Measurements using cPickle, store to /tmp/, unpickle at engines.

mark starnes m.starnes05@imperial.ac...
Mon Nov 10 08:28:37 CST 2008

Hi everyone,

After Brian's comments, I timed the approach of cPickling the string object,
storing it to a file in /tmp/, loading it at the engines, unpickling and
tidying up /tmp/. This resulted in the following times:


Not the factor 1000 I first thought, but certainly a factor 100. Still, a
good improvement!
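
For anyone wanting to try the same thing, here is a minimal sketch of the
save-to-disk arrangement. It assumes Python 3's pickle module in place of
cPickle, and a filesystem shared between the client and the engines (as on a
single multi-processor machine). The helper names save_shared, load_shared
and tidy_up are hypothetical and not part of IPython.

```python
import os
import pickle
import tempfile


def save_shared(obj, directory=None):
    """Pickle obj once to a temporary file and return the file's path."""
    # mkstemp creates the file safely; directory=None uses the system temp dir.
    fd, path = tempfile.mkstemp(suffix=".pkl", dir=directory)
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_shared(path):
    """Intended to run on each engine: read the pickled object back from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def tidy_up(path):
    """Remove the temporary file once every engine has loaded it."""
    if os.path.exists(path):
        os.remove(path)
```

With this arrangement only the short path string has to travel to the
engines; each engine then calls load_shared(path) itself.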



mark starnes wrote:
> Hi Brian,
> I see.  I changed from a push to a 'save to /tmp/filename, all engines
> read from /tmp/filename' arrangement, and got a factor of 1000 speed up.
> Thanks!
> I would be interested in knowing how to call MPI from within IPython,
> could you point me to a simple example?  Is there an Mpi4Py
> manual?  (I noticed the Mpi4Py site appears to be a regular
> target for vandalism - how annoying).
> The algorithm builds a list of finite element objects (a few tens of
> thousands, each containing shape functions etc).  The elements
> then update themselves for the required analysis frequency, a system
> stiffness matrix and vector are built, boundary conditions applied, then
> the matrix and vector are solved.  I wanted to build the list on one processor,
> push the list to all the others, followed by different frequencies
> to each processor.  The solutions would be pulled back on completion,
> and new frequencies sent out, as required.  It's not fine grained
> parallelisation but it used the multi-processor machine I have access to.
> Optimally, I would have liked to have used a more finely grained approach:
> chopping the list into different segments over all the processors, performing
> the frequency update, building, applying the b/c, and solving (keeping the system
> matrix spread around the processors), but I haven't got as far as finding a
> matrix solver that would deal with the system matrix being spread around
> this way.  This approach would result in the shared memory of each node
> being used more efficiently, allowing for larger jobs.  If you know of a
> solver that can do this, that would be very good news!
> As it is, I'll be using the save to disk approach and keeping, say,
> four system matrices and performing four solutions on a four processor
> machine.  Not as elegant as the more finely grained approach but
> time's running out for the PhD!
> Thanks for your time.
> Mark.
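
The coarse-grained scheme described above (one independent solve per
frequency, results collected at the end) can be sketched as follows. This is
only an illustration under stated assumptions: the two-mass spring chain
stands in for the real finite element model, and every name here
(dynamic_stiffness, solve_2x2, solve_at_frequency, frequency_sweep, map_fn)
is hypothetical rather than taken from the original code.

```python
def dynamic_stiffness(k, m, omega):
    """Return the 2x2 matrix K - omega^2 M for a toy two-mass spring chain."""
    w2 = omega * omega
    return [[2.0 * k - w2 * m, -k],
            [-k, k - w2 * m]]


def solve_2x2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0.0:
        raise ZeroDivisionError("singular system (resonant frequency)")
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x0, x1]


def solve_at_frequency(omega, k=1.0, m=1.0, load=(0.0, 1.0)):
    """One independent task: update for omega, assemble, then solve."""
    return omega, solve_2x2(dynamic_stiffness(k, m, omega), list(load))


def frequency_sweep(frequencies, map_fn=map):
    """Farm the frequencies out through map_fn and collect {omega: solution}."""
    return dict(map_fn(solve_at_frequency, frequencies))
```

The built-in map runs the sweep serially. Because each frequency is
independent, any parallel map with the same calling convention, for example
multiprocessing.Pool(4).map, could be passed as map_fn instead.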
> Brian Granger wrote:
>> Mark,
>> You are comparing apples (IPython.kernel) and oranges (PyPar+MPI).
>> PyPar uses MPI which is a peer to peer message passing architecture.
>> IPython uses a completely different architecture and doesn't use MPI
>> for its implementation.  Thus, it _should_ be slower than MPI.  I will
>> explain a bit more to clarify things...
>> * IPython's architecture is not peer to peer.  Instead, all
>> processes connect to a central process (the IPython controller) which
>> manages the computation.  This architecture is required when you want
>> to do things interactively and be able to disconnect/reconnect to a
>> running parallel job.  Here is what things look like in IPython:
>>
>>        Client
>>          |
>>          |
>>      Controller--------Engine
>>          |    \
>>          |     \
>>        Engine   Engine
>>
>> All of these connections are handled using regular TCP sockets.
>> * We would like to optimize push/pull and other operations in IPython.
>>  At the same time, if you are sending really large objects from the
>> client to the engines, the best solution is for you to re-work your
>> algorithm to avoid this data transfer.  The same is true of using
>> PyPar+MPI.  Here are some tips:
>> 1.  Build the large matrices on the engines in the first place.  The
>> only thing you should send to the engines are i) the info required to
>> build the matrix and ii) the code required to build the matrix.
>> 2.  If the matrices are being read from data on disk, do that on the
>> engines in parallel rather than in the client.
>> 3.  If you need to send data efficiently between engines during a
>> computation, use MPI.  IPython fully integrates with MPI (I can give
>> you more info on how this works if you want).  Also, the best MPI
>> implementation for IPython (by far) is Mpi4Py (mpi4py.scipy.org).
>> * Think of how long it takes to download a 500 MB file.  500 MB is a
>> lot of data no matter how you handle it.  It should be slow.  Granted,
>> IPython currently is not optimized to handle such large objects, but
>> even if it were, it would still be slow.
>> * Can you outline/describe the algorithm you are using?  I am more
>> than willing to help you figure out a way of optimizing it using
>> IPython.
>> Cheers,
>> Brian
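
Brian's first tip (send only the information needed to build the matrix, not
the matrix itself) can be illustrated by comparing pickled sizes. This is a
rough sketch: the tridiagonal spring-chain matrix is a stand-in for a real
system matrix, and build_stiffness and transfer_sizes are hypothetical names
introduced for the example.

```python
import pickle


def build_stiffness(n, k=1.0):
    """Assemble a dense n x n tridiagonal matrix for a chain of equal springs
    fixed at both ends (2k on the diagonal, -k on the off-diagonals)."""
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 2.0 * k
        if i > 0:
            matrix[i][i - 1] = -k
            matrix[i - 1][i] = -k
    return matrix


def transfer_sizes(n, k=1.0):
    """Return (bytes to send the recipe, bytes to send the built matrix)."""
    recipe = {"n": n, "k": k}
    return len(pickle.dumps(recipe)), len(pickle.dumps(build_stiffness(n, k)))
```

Calling transfer_sizes(n) for increasing n shows the recipe staying at a few
tens of bytes while the dense matrix grows with n squared, which is why
building on the engines avoids the expensive transfer.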
>> On Sat, Nov 1, 2008 at 3:05 AM, mark starnes <m.starnes05@imperial.ac.uk> wrote:
>>> Hi everyone,
>>> A short update.  This multi-processor, shared memory machine shows no performance change
>>> with the code snippet above, when I set 'processor affinity' to 1.  I got the tip from,
>>> http://www.open-mpi.org/faq/?category=tuning
>>> and remain interested in the results anyone else gets / tips for improving push times.
>>> Best regards,
>>> Mark.
>>> _______________________________________________
>>> IPython-user mailing list
>>> IPython-user@scipy.org
>>> http://lists.ipython.scipy.org/mailman/listinfo/ipython-user
