[IPython-User] IPython.parallel over Infiniband?
Fri Sep 7 15:12:26 CDT 2012
This is probably a question for zeromq-dev, about how one might use
infiniband with zeromq.
There are benchmarks for zeromq tcp-over-infiniband, so presumably it is
possible, though it may require some flags to be set when building libzmq
itself (I have no idea).
Once you know how to use zmq+tcp over ib, then there shouldn't be anything
IPython needs to be aware of.
It's also possible that it's as simple as specifying a particular IP
(again, I have no idea). If the infiniband interconnect refers to a
particular interface on the node, then it should simply be a matter of
passing `ipcontroller --ip=<220.127.116.11>`.
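To make the "specify a particular IP" idea concrete, here is a minimal sketch of how one might look up the address of the infiniband interface and build the matching `ipcontroller` command line. It assumes the cluster exposes IP-over-InfiniBand as a normal network interface; the interface name `ib0`, the sample addresses, and the helper names are all illustrative assumptions, and the sample text only imitates the one-line-per-address output of `ip -o -4 addr show`.

```python
def ipv4_of_interface(ip_addr_output, ifname):
    """Return the first IPv4 address listed for ifname, or None if absent.

    Expects text shaped like the output of `ip -o -4 addr show`, i.e.
    "<index>: <ifname> inet <addr>/<prefix> ..." on each line.
    """
    for line in ip_addr_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == ifname and fields[2] == "inet":
            return fields[3].split("/")[0]  # drop the /prefix length
    return None


def controller_command(ip):
    """Build the ipcontroller invocation that listens on the given IP."""
    return ["ipcontroller", "--ip=" + ip]


# Made-up sample output; on a real node this would come from running
# `ip -o -4 addr show`. The name "ib0" is an assumption about the cluster.
SAMPLE = """\
1: lo    inet 127.0.0.1/8 scope host lo
2: eth0    inet 192.168.1.20/24 brd 192.168.1.255 scope global eth0
3: ib0    inet 10.10.0.20/16 brd 10.10.255.255 scope global ib0
"""

ib_ip = ipv4_of_interface(SAMPLE, "ib0")
print(" ".join(controller_command(ib_ip)))
```

Whether the engines can actually reach that address over the interconnect depends on how the cluster is configured, so this is only a starting point for experimenting.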
On Fri, Sep 7, 2012 at 4:21 AM, Jon Olav Vik <email@example.com> wrote:
> Short version: Can the IPython.parallel ipcontroller and ipengines use
> Infiniband for communication?
> As mentioned in previous posts, I use IPython.parallel on a shared batch
> cluster, where I submit ipengines as relatively short "batch jobs" for use by a
> Client.load_balanced_view(retries=..., chunksize=..., ordered=False). This
> gives me load-balanced, fault-tolerant (in particular if an engine job times
> out) computing of otherwise trivially parallel tasks. This is by far the most
> maintainable framework I've found, and it scales well to at least 100
> processors, or > 600 if I use several clusters. The limiting factor seems to be
> the number and latency of TCP connections.
> I recently got kicked out from a batch cluster for failing to utilize their
> precious Infiniband, and for *possibly* competing with the batch system's use
> of TCP. (This was not further investigated, as that cluster was intended to
> fill other needs than my rather-trivially-parallel computing, and so they
> didn't really want me around anyway.)
> Now, I know next to nothing about what Infiniband is, but googling suggests
> that TCP can be run over Infiniband.
> I wonder if that could improve the latency of IPython.parallel tasks, while
> letting me be less of a nuisance to the batch cluster admins. Any hints on
> whether and how this can be achieved would be most appreciated.
> (I have mostly heard about Infiniband in connection with MPI. However, MPI
> doesn't seem to fit my needs because 1) all MPI processes need to start and
> stop at the same time, whereas I wish to use as many processors as happen to be
> available, without specifying the number in advance, 2) the ipcluster
> use MPI for coordination, and 3) I wish to distribute tasks and results using a
> load_balanced_view() and not explicitly over MPI.)
> Best regards,
> Jon Olav