[IPython-user] Time taken to push. Adjusting MPI parameter for processor affinity. No effect.
Tue Nov 11 13:26:12 CST 2008
> Rather, a speed up when large objects (O E6) are involved. Smaller objects
> are faster without the expense of pickling and storing.....
I am not following this email. Was there another part of it?
> mark starnes wrote:
>> Hi Brian,
>> I see. I changed from a push to a 'save to /tmp/filename, all engines
>> read from /tmp/filename' arrangement, and got a factor of 1000 speed up.
>> I would be interested in knowing how to call MPI from within IPython,
>> could you point me to a simple example? Is there an Mpi4Py
>> manual? (I noticed the Mpi4Py site appears to be a regular
>> target for vandalism - how annoying).
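A minimal sketch of the save-to-disk arrangement described above, using only the standard library. The helper names (save_shared, load_shared), the file name, and the stand-in data are illustrative assumptions, not part of IPython; the idea is that the client writes the file once and each engine runs the load step itself.

```python
import os
import pickle
import tempfile


def save_shared(obj, path):
    """Write obj to path once, from the client side."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_shared(path):
    """Read the object back; each engine would run this locally."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    # Stand-in for the large element list (hypothetical data).
    elements = [{"id": i, "nodes": (i, i + 1)} for i in range(10000)]
    path = os.path.join(tempfile.gettempdir(), "elements_demo.pkl")
    save_shared(elements, path)
    assert load_shared(path) == elements
    os.remove(path)
```

This only helps when every engine can see the same filesystem, as on a single shared-memory machine.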
>> The algorithm builds a list of finite element objects (a few tens of
>> thousands, each containing shape functions etc). The elements
>> then update themselves for the required analysis frequency, a system
>> stiffness matrix and vector are built, boundary conditions are applied, then
>> the matrix and vector are solved. I wanted to build the list on one processor,
>> push the list to all the others, followed by different frequencies
>> to each processor. The solutions would be pulled back on completion,
>> and new frequencies sent out, as required. It's not fine grained
>> parallelisation but it used the multi-processor machine I have access to.
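As a rough illustration of handing different frequencies to each processor, here is a small sketch that deals a frequency list out round-robin. The function name split_round_robin and the sample frequencies are assumptions made for the example; how each batch actually reaches an engine is left out.

```python
def split_round_robin(frequencies, n_workers):
    """Deal frequencies out to n_workers lists, one at a time."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return [list(frequencies[i::n_workers]) for i in range(n_workers)]


if __name__ == "__main__":
    freqs = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]  # hypothetical values
    for worker, batch in enumerate(split_round_robin(freqs, 4)):
        print(worker, batch)
```

Round-robin keeps the batches within one item of each other in length, which is a reasonable default when every frequency costs about the same to solve.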
>> Optimally, I would have liked to have used a more finely grained approach;
>> chopping the list into different segments over all the processors, performing
>> the frequency update, building, applying the boundary conditions, and solving (keeping the system
>> matrix spread around the processors) but haven't got as far as finding a
>> matrix solver that would deal with the system matrix being spread around
>> this way. This approach would result in the shared memory of each node
>> being used more efficiently, allowing for larger jobs. If you know of a
>> solver that can do this, that would be very good news!
>> As it is, I'll be using the save to disk approach and keeping, say,
>> four system matrices and performing four solutions on a four processor
>> machine. Not as elegant as the more finely grained approach but
>> time's running out for the PhD!
>> Thanks for your time.
>> Brian Granger wrote:
>>> You are comparing apples (IPython.kernel) and oranges (PyPar+MPI).
>>> PyPar uses MPI which is a peer to peer message passing architecture.
>>> IPython uses a completely different architecture and doesn't use MPI
>>> for its implementation. Thus, it _should_ be slower than MPI. I will
>>> explain a bit more to clarify things...
>>> * IPython's architecture is not peer to peer. Instead, all
>>> processes connect to a central process (the IPython controller) which
>>> manages the computation. This architecture is required when you want
>>> to do things interactively and be able to disconnect/reconnect to a
>>> running parallel job. Here is what things look like in IPython:
>>> Client
>>>    |
>>> Controller
>>>    |    \
>>>    |     \
>>> Engine   Engine
>>> All of these connections are handled using regular TCP sockets.
>>> * We would like to optimize push/pull and other operations in IPython.
>>> At the same time, if you are sending really large objects from the
>>> client to the engines, the best solution is for you to re-work your
>>> algorithm to avoid this data transfer. The same is true of using
>>> PyPar+MPI. Here are some tips:
>>> 1. Build the large matrices on the engines in the first place. The
>>> only thing you should send to the engines are i) the info required to
>>> build the matrix and ii) the code required to build the matrix.
>>> 2. If the matrices are being read from data on disk, do that on the
>>> engines in parallel rather than in the client.
>>> 3. If you need to send data efficiently between engines during a
>>> computation, use MPI. IPython fully integrates with MPI (I can give
>>> you more info on how this works if you want). Also, the best MPI
>>> implementation for IPython (by far) is Mpi4Py (mpi4py.scipy.org).
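To make tip 1 concrete, the sketch below compares the pickled size of a small build recipe with the pickled size of the matrix it produces. The toy build_matrix function and its parameters are assumptions for illustration only; a real stiffness matrix would be built quite differently, but the size gap is the point.

```python
import pickle


def build_matrix(n, scale):
    """Build an n x n nested-list matrix from a small recipe (toy example)."""
    return [[scale * (i + j) for j in range(n)] for i in range(n)]


def pickled_size(obj):
    """Number of bytes needed to send obj in pickled form."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


if __name__ == "__main__":
    recipe = {"n": 300, "scale": 0.5}  # what would be sent to an engine
    matrix = build_matrix(**recipe)    # what the engine would build locally
    print("recipe bytes:", pickled_size(recipe))
    print("matrix bytes:", pickled_size(matrix))
```

Sending the recipe and the build code keeps the transfer small regardless of how large the resulting matrix is.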
>>> * Think of how long it takes to download a 500 MB file. 500 MB is a
>>> lot of data no matter how you handle it. It should be slow. Granted,
>>> IPython currently is not optimized to handle such large objects, but
>>> even if it were, it would still be slow.
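A back-of-the-envelope version of the 500 MB point: transfer time is at least payload size divided by link bandwidth. The link speeds below are assumed values, with 1 MB taken as 10^6 bytes and 1 Mbit as 10^6 bits; pickling, protocol overhead, and the extra hop through the controller are ignored, so real times would be longer.

```python
def transfer_seconds(size_mb, link_mbit_per_s):
    """Lower bound on transfer time: payload size divided by link bandwidth."""
    if link_mbit_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_mb * 8 / link_mbit_per_s


if __name__ == "__main__":
    for link in (100, 1000):  # assumed link speeds in Mbit/s
        print(link, "Mbit/s:", transfer_seconds(500, link), "s")
```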
>>> * Can you outline/describe the algorithm you are using? I am more
>>> than willing to help you figure out a way of optimizing it using
>>> On Sat, Nov 1, 2008 at 3:05 AM, mark starnes <email@example.com> wrote:
>>>> Hi everyone,
>>>> A short update. This multi-processor, shared memory machine shows no performance change
>>>> with the code snippet above, when I set 'processor affinity' to 1. I got the tip from,
>>>> and remain interested in the results anyone else gets / tips for improving push times.
>>>> Best regards,
>>>> IPython-user mailing list