[IPython-User] IPython.parallel over Infiniband?

Jon Olav Vik jonovik@gmail....
Fri Sep 7 06:21:54 CDT 2012

Short version: Can the IPython.parallel ipcontroller and ipengines use 
Infiniband for communication?


As mentioned in previous posts, I use IPython.parallel on a shared batch 
cluster, where I submit ipengines as relatively short "batch jobs" for use by a 
Client.load_balanced_view(retries=..., chunksize=..., ordered=False). This 
gives me load-balanced, fault-tolerant (in particular if an engine job times 
out) computing of otherwise trivially parallel tasks. This is by far the most 
maintainable framework I've found, and it scales well to at least 100
processors, or more than 600 if I use several clusters. The limiting factor
seems to be the number and latency of TCP connections.
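
For concreteness, the pattern described above looks roughly like the sketch
below. It is a minimal illustration, not my actual code: the helper name
run_load_balanced, the toy task square, and the default values for retries and
chunksize are all made up for this example. It assumes an ipcontroller and some
ipengines are already running under the default profile, and it falls back to
the ipyparallel package in case IPython.parallel is not importable.

```python
def square(x):
    # Toy stand-in for a real, trivially parallel task.
    return x * x


def run_load_balanced(func, inputs, retries=3, chunksize=1):
    """Map func over inputs on whatever engines are currently registered.

    Hypothetical helper for illustration; it needs a running ipcontroller
    and at least one ipengine to do anything useful.
    """
    # Import lazily so that merely defining this function needs no cluster.
    try:
        from IPython.parallel import Client
    except ImportError:
        # Assumption: the same API is available from the ipyparallel package.
        from ipyparallel import Client

    rc = Client()                      # connects using the default profile
    view = rc.load_balanced_view()     # scheduler picks an idle engine per task
    view.retries = retries             # resubmit a task if its engine dies

    # ordered=False yields results as they arrive rather than in input order.
    amr = view.map(func, inputs, block=False,
                   chunksize=chunksize, ordered=False)
    return list(amr)


# Example call (requires a running cluster):
#     results = run_load_balanced(square, range(100), retries=3, chunksize=10)
```

Engines can come and go as their batch jobs start and time out; with retries
set, a task that was on a lost engine should get resubmitted to another one.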

I recently got kicked off a batch cluster for failing to utilize their
precious Infiniband, and for *possibly* competing with the batch system's use
of TCP. (This was not investigated further, as that cluster was intended to
serve needs other than my rather-trivially-parallel computing, and so they
didn't really want me around anyway.)

Now, I know next to nothing about what Infiniband is, but googling suggested 
that TCP can be run over Infiniband.


I wonder if that could improve the latency of IPython.parallel tasks, while 
letting me be less of a nuisance to the batch cluster admins. Any hints on 
whether and how this can be achieved would be most appreciated.

(I have mostly heard about Infiniband in connection with MPI. However, MPI 
doesn't seem to fit my needs because 1) all MPI processes need to start and 
stop at the same time, whereas I wish to use as many processors as happen to be 
available, without specifying the number in advance, 2) the ipcluster cannot 
use MPI for coordination, and 3) I wish to distribute tasks and results using a 
load_balanced_view() and not explicitly over MPI.)

Best regards,
Jon Olav
