[IPython-user] Time taken to push. Adjusting MPI parameter for processor affinity. No effect.
Tue Nov 11 05:11:49 CST 2008
Rather, that speed-up only appears when large objects (of order 1E6) are involved. Smaller objects
are faster with a plain push, which avoids the expense of pickling and storing.
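Below is a minimal sketch of the save-to-disk arrangement. It assumes that the client and the engines share a filesystem, as they do on a single shared-memory machine. The function names `publish` and `load` and the path `/tmp/elements.pkl` are hypothetical, and the code is not the poster's actual script. The object is pickled once and written to a temporary file, which is then renamed into place so that no engine ever reads a half-written file.

```python
import os
import pickle
import tempfile


def publish(obj, path):
    """Client side (hypothetical helper): pickle obj once to a shared file.

    The data is written to a temporary file in the same directory and then
    renamed into place, so an engine never reads a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp, path)  # atomic rename on POSIX (same filesystem)
    except BaseException:
        os.unlink(tmp)  # remove the partial temporary file, then re-raise
        raise


def load(path):
    """Engine side (hypothetical helper): read the shared object back."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

On the client you would call `publish(elements, "/tmp/elements.pkl")` and then have every engine run `elements = load("/tmp/elements.pkl")`. `tempfile.mkstemp` creates the file readable only by its owner. That works when all engines run under the same account. The fixed cost of the disk round trip is consistent with the observation above that small objects are quicker with an ordinary push.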
mark starnes wrote:
> Hi Brian,
> I see. I changed from a push to a 'save to /tmp/filename, all engines
> read from /tmp/filename' arrangement, and got a factor of 1000 speed-up.
> I would be interested in knowing how to call MPI from within IPython.
> Could you point me to a simple example? Is there an Mpi4Py
> manual? (I noticed the Mpi4Py site appears to be a regular
> target for vandalism - how annoying).
> The algorithm builds a list of finite element objects (a few tens of
> thousands, each containing shape functions etc.). The elements
> then update themselves for the required analysis frequency, a system
> stiffness matrix and vector are built, boundary conditions are applied, and
> the resulting system is solved. I wanted to build the list on one processor,
> push the list to all the others, and then send a different frequency
> to each processor. The solutions would be pulled back on completion,
> and new frequencies sent out, as required. It's not fine-grained
> parallelisation, but it uses the multi-processor machine I have access to.
> Optimally, I would have liked to use a more finely grained approach:
> chopping the list into segments spread over all the processors, then performing
> the frequency update, building the system, applying the b/c, and solving (keeping the system
> matrix spread around the processors). However, I haven't yet found a
> matrix solver that can handle a system matrix distributed
> this way. This approach would use the shared memory of each node
> more efficiently, allowing for larger jobs. If you know of a
> solver that can do this, that would be very good news!
> As it is, I'll be using the save to disk approach and keeping, say,
> four system matrices and performing four solutions on a four processor
> machine. Not as elegant as the more finely grained approach but
> time's running out for the PhD!
> Thanks for your time.
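The message above asks for a simple example of calling MPI from Python. It also describes a plan: build the model once, give every processor a copy, farm out frequencies, and collect the solutions. Below is a minimal sketch of that plan using mpi4py's object-level `COMM_WORLD.bcast` and `gather` calls, with a serial fallback so it still runs without MPI. `build_model` and `solve_at` are hypothetical placeholders. The one-degree-of-freedom solve of (k - ω²m)u = f is only a toy stand-in for the real element assembly and sparse solve.

```python
# Frequency sweep: build once on rank 0, broadcast, solve in parallel, gather.
try:
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
except ImportError:  # serial fallback so the sketch runs without MPI
    comm = None
    rank, size = 0, 1


def build_model():
    # Hypothetical placeholder: one degree of freedom with stiffness k,
    # mass m and load f. Real code would build the element list here.
    return {"k": 4.0, "m": 1.0, "f": 2.0}


def solve_at(model, omega):
    # Hypothetical placeholder: solve (k - omega^2 m) u = f for one frequency.
    return model["f"] / (model["k"] - omega ** 2 * model["m"])


def sweep(omegas):
    # Only rank 0 builds the model; bcast pickles it and sends a copy to all ranks.
    model = build_model() if rank == 0 else None
    if comm is not None:
        model = comm.bcast(model, root=0)

    # Static round-robin split: rank r takes omegas[r], omegas[r + size], ...
    local = {w: solve_at(model, w) for w in omegas[rank::size]}

    # gather returns a list of every rank's dict on rank 0 and None elsewhere.
    parts = comm.gather(local, root=0) if comm is not None else [local]
    if rank != 0:
        return None
    results = {}
    for part in parts:
        results.update(part)
    return results


if __name__ == "__main__":
    out = sweep([0.0, 0.5, 1.0, 1.5])
    if rank == 0:
        for w in sorted(out):
            print(w, out[w])
```

Under MPI this would typically be launched with something like `mpiexec -n 4 python sweep.py`, where the script name is hypothetical. The static split is a simplification. Sending new frequencies out as workers finish, as described above, would need a master/worker loop with point-to-point messages instead.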
> Brian Granger wrote:
>> You are comparing apples (IPython.kernel) and oranges (PyPar+MPI).
>> PyPar uses MPI which is a peer to peer message passing architecture.
>> IPython uses a completely different architecture and doesn't use MPI
>> for its implementation. Thus, it _should_ be slower than MPI. I will
>> explain a bit more to clarify things...
>> * IPython's architecture is not peer to peer. Instead, all
>> processes connect to a central process (the IPython controller) which
>> manages the computation. This architecture is required when you want
>> to do things interactively and be able to disconnect/reconnect to a
>> running parallel job. Here is what things look like in IPython:
>> [Diagram: a star, with the client and every engine connected to the central IPython controller.]
>> All of these connections are handled using regular TCP sockets.
>> * We would like to optimize push/pull and other operations in IPython.
>> At the same time, if you are sending really large objects from the
>> client to the engines, the best solution is for you to re-work your
>> algorithm to avoid this data transfer. The same is true of using
>> PyPar+MPI. Here are some tips:
>> 1. Build the large matrices on the engines in the first place. The
>> only thing you should send to the engines are i) the info required to
>> build the matrix and ii) the code required to build the matrix.
>> 2. If the matrices are being read from data on disk, do that on the
>> engines in parallel rather than in the client.
>> 3. If you need to send data efficiently between engines during a
>> computation, use MPI. IPython fully integrates with MPI (I can give
>> you more info on how this works if you want). Also, the best MPI
>> implementation for IPython (by far) is Mpi4Py (mpi4py.scipy.org).
>> * Think of how long it takes to download a 500 MB file. 500 MB is a
>> lot of data no matter how you handle it. It should be slow. Granted,
>> IPython currently is not optimized to handle such large objects, but
>> even if it were, it would still be slow.
>> * Can you outline/describe the algorithm you are using? I am more
>> than willing to help you figure out a way of optimizing it using
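The first tip in the reply above is to build large matrices on the engines and send only the information and code needed to build them. Below is a minimal plain-Python sketch of why that helps. The `recipe` dict and `build_matrix` are hypothetical, and the seeded random fill stands in for real element assembly. Real code would more likely use NumPy arrays. Because the build is deterministic, every engine can reconstruct identical data from a few bytes of input.

```python
import pickle
import random


def build_matrix(recipe):
    """Build an n x n matrix deterministically from a small recipe.

    Intended to run on each engine, so only the recipe crosses the network.
    The seeded random fill is a hypothetical stand-in for real assembly.
    """
    n = recipe["n"]
    rng = random.Random(recipe["seed"])
    return [[rng.random() for _ in range(n)] for _ in range(n)]


recipe = {"n": 300, "seed": 42}  # what you would send to the engines
matrix = build_matrix(recipe)    # what the client would otherwise push

recipe_bytes = len(pickle.dumps(recipe))
matrix_bytes = len(pickle.dumps(matrix))
print(f"recipe: {recipe_bytes} bytes, matrix: {matrix_bytes} bytes")
```

The recipe pickles to a few dozen bytes, while the 300 x 300 matrix pickles to hundreds of kilobytes. The gap grows with the square of `n`, so sending the recipe avoids almost all of the transfer cost.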
>> On Sat, Nov 1, 2008 at 3:05 AM, mark starnes <email@example.com> wrote:
>>> Hi everyone,
>>> A short update. This multi-processor, shared-memory machine shows no performance change
>>> with the code snippet above when I set 'processor affinity' to 1. I got the tip from,
>>> and remain interested in the results anyone else gets / tips for improving push times.
>>> Best regards,
>>> IPython-user mailing list