[IPython-user] Time taken to push. Adjusting MPI parameter for processor affinity. No effect.

Brian Granger ellisonbg.net@gmail....
Tue Nov 11 13:26:12 CST 2008


> Rather, a speed-up when large objects (of order 1E6) are involved.  Smaller objects
> are faster without the expense of pickling and storing.....

I am not following this email.  Was there another part to it?

Brian

> BR,
>
> m.
>
> mark starnes wrote:
>> Hi Brian,
>>
>> I see.  I changed from a push to a 'save to /tmp/filename, all engines
>> read from /tmp/filename' arrangement, and got a factor-of-1000 speed-up.
>> Thanks!
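>>
>> For concreteness, the arrangement is roughly the following (a sketch only;
>> build_element_list and the file name are placeholders, and it assumes the
>> IPython.kernel MultiEngineClient interface of that era):
>>
>>     import pickle
>>     from IPython.kernel import client
>>
>>     mec = client.MultiEngineClient()
>>
>>     # Build the large object once on the client and write it to shared
>>     # storage instead of pushing it through the controller.
>>     elements = build_element_list()               # placeholder builder
>>     with open('/tmp/elements.pkl', 'wb') as f:
>>         pickle.dump(elements, f)
>>
>>     # Each engine reads the same file from the shared filesystem, so the
>>     # large object never travels over the controller's TCP connections.
>>     mec.execute("import pickle; "
>>                 "elements = pickle.load(open('/tmp/elements.pkl', 'rb'))")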
>>
>> I would be interested in knowing how to call MPI from within IPython;
>> could you point me to a simple example?  Is there an Mpi4Py
>> manual?  (I noticed the Mpi4Py site appears to be a regular
>> target for vandalism - how annoying.)
>>
>> The algorithm builds a list of finite element objects (a few tens of
>> thousands, each containing shape functions etc.).  The elements
>> then update themselves for the required analysis frequency, a system
>> stiffness matrix and vector are built, boundary conditions are applied, and
>> the matrix and vector are solved.  I wanted to build the list on one processor,
>> push the list to all the others, and then send different frequencies
>> to each processor.  The solutions would be pulled back on completion,
>> and new frequencies sent out as required.  It's not fine-grained
>> parallelisation, but it uses the multi-processor machine I have access to.
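>>
>> If I were to do the frequency farming with Mpi4Py as you suggest, I imagine
>> it would look roughly like this (a sketch only; solve_at_frequency stands in
>> for the update/assemble/apply-b.c./solve step, and the sweep is made up):
>>
>>     from mpi4py import MPI
>>
>>     comm = MPI.COMM_WORLD
>>     rank = comm.Get_rank()
>>     size = comm.Get_size()
>>
>>     frequencies = list(range(100, 2100, 100))   # illustrative sweep
>>     my_freqs = frequencies[rank::size]          # this rank's share
>>
>>     my_results = [(f, solve_at_frequency(f)) for f in my_freqs]
>>
>>     # Rank 0 collects one list of results per rank and flattens it.
>>     gathered = comm.gather(my_results, root=0)
>>     if rank == 0:
>>         results = [item for chunk in gathered for item in chunk]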
>>
>> Optimally, I would have liked to use a more finely grained approach:
>> chopping the list into segments spread over all the processors, performing
>> the frequency update, building the system, applying the b/c, and solving
>> (keeping the system matrix spread around the processors), but I haven't got
>> as far as finding a matrix solver that can deal with the system matrix being
>> distributed this way.  That approach would use the shared memory of each node
>> more efficiently, allowing for larger jobs.  If you know of a
>> solver that can do this, that would be very good news!
>>
>> As it is, I'll be using the save-to-disk approach, keeping, say,
>> four system matrices and performing four solutions on a four-processor
>> machine.  Not as elegant as the more finely grained approach, but
>> time's running out for the PhD!
>>
>> Thanks for your time.
>>
>> Mark.
>>
>>
>>
>> Brian Granger wrote:
>>> Mark,
>>>
>>> You are comparing apples (IPython.kernel) and oranges (PyPar+MPI).
>>> PyPar uses MPI, which is a peer-to-peer message-passing architecture.
>>> IPython uses a completely different architecture and doesn't use MPI
>>> in its implementation, so it _should_ be slower than MPI.  I will
>>> explain a bit more to clarify things...
>>>
>>> * IPython's architecture is not peer-to-peer.  Instead, all
>>> processes connect to a central process (the IPython controller), which
>>> manages the computation.  This architecture is required if you want
>>> to work interactively and be able to disconnect from and reconnect to a
>>> running parallel job.  Here is what things look like in IPython:
>>>
>>>                Client
>>>                  |
>>>                  |
>>> Engine ----- Controller ----- Engine
>>>                  |
>>>                  |
>>>                Engine
>>>
>>> All of these connections are handled using regular TCP sockets.
>>>
>>> * We would like to optimize push/pull and other operations in IPython.
>>>  At the same time, if you are sending really large objects from the
>>> client to the engines, the best solution is for you to re-work your
>>> algorithm to avoid this data transfer.  The same is true of using
>>> PyPar+MPI.  Here are some tips:
>>>
>>> 1.  Build the large matrices on the engines in the first place.  The
>>> only things you should send to the engines are i) the info required to
>>> build the matrix and ii) the code required to build it (see the sketch
>>> after these tips).
>>>
>>> 2.  If the matrices are being read from data on disk, do that on the
>>> engines in parallel rather than in the client.
>>>
>>> 3.  If you need to send data efficiently between engines during a
>>> computation, use MPI.  IPython fully integrates with MPI (I can give
>>> you more info on how this works if you want).  Also, the best MPI
>>> interface for use with IPython (by far) is Mpi4Py (mpi4py.scipy.org).
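>>>
>>> As a rough sketch of how 1. and 3. combine with Mpi4Py (the names and
>>> numbers here are illustrative only): only the small build parameters
>>> travel between processes, and each engine constructs its own large
>>> matrix locally.
>>>
>>>     from mpi4py import MPI
>>>
>>>     comm = MPI.COMM_WORLD
>>>     rank = comm.Get_rank()
>>>
>>>     # Rank 0 prepares only a lightweight description of the problem...
>>>     params = {'n_elements': 40000, 'order': 2} if rank == 0 else None
>>>
>>>     # ...and broadcasts it, which is cheap because params is tiny.
>>>     params = comm.bcast(params, root=0)
>>>
>>>     # Every rank builds the heavyweight object itself (build_matrix is
>>>     # a placeholder) instead of receiving it over the network.
>>>     K = build_matrix(**params)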
>>>
>>> * Think of how long it takes to download a 500 MB file.  500 MB is a
>>> lot of data no matter how you handle it.  It should be slow.  Granted,
>>> IPython currently is not optimized to handle such large objects, but
>>> even if it were, this would still be slow.
>>>
>>> * Can you outline/describe the algorithm you are using?  I am more
>>> than willing to help you figure out a way of optimizing it using
>>> IPython.
>>>
>>> Cheers,
>>>
>>> Brian
>>>
>>> On Sat, Nov 1, 2008 at 3:05 AM, mark starnes <m.starnes05@imperial.ac.uk> wrote:
>>>> Hi everyone,
>>>>
>>>> A short update.  This multi-processor, shared-memory machine shows no performance change
>>>> with the code snippet above when I set 'processor affinity' to 1.  I got the tip from:
>>>>
>>>> http://www.open-mpi.org/faq/?category=tuning
>>>>
>>>>
>>>> and remain interested in the results anyone else gets / tips for improving push times.
>>>>
>>>> Best regards,
>>>>
>>>> Mark.

