[IPython-User] ipcontroller failover?

Darren Govoni darren@ontrenet....
Sun Feb 12 15:02:47 CST 2012


Correct me if I'm wrong, but do the ipengines 'connect' to, or otherwise
announce their presence to, the controller? If it were the other way
around, this would allow some degree of fault tolerance for the
controller, because it could be restarted by a watchdog and then
re-establish the connected state of the cluster. That is: a controller
comes online, a pub/sub message is sent to a known channel, and clients
or engines add the new ipcontroller to their internal lists as failover
endpoints.
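A minimal sketch of the client-side bookkeeping this proposal implies, using only the Python standard library. The class name `FailoverEndpoints`, the `on_announce` hook, and the endpoint strings are hypothetical; nothing here is an IPython API, and the pub/sub announcement itself (which would arrive over ZeroMQ) is left out.

```python
class FailoverEndpoints:
    """Hypothetical client-side list of known controller endpoints.

    Newly announced controllers are appended as failover candidates;
    when the active one is judged dead, the next live candidate is used.
    """

    def __init__(self, initial):
        # Plain dicts preserve insertion order (Python 3.7+) and
        # silently ignore duplicate keys, so each endpoint appears once.
        self._endpoints = {initial: None}
        self._dead = set()
        self.active = initial

    def on_announce(self, endpoint):
        """Handle a (hypothetical) 'controller online' pub/sub message."""
        self._endpoints.setdefault(endpoint, None)
        # A restarted controller may come back on an old address.
        self._dead.discard(endpoint)

    def fail_over(self):
        """Mark the active endpoint dead and switch to the next live one."""
        self._dead.add(self.active)
        for endpoint in self._endpoints:
            if endpoint not in self._dead:
                self.active = endpoint
                return endpoint
        raise RuntimeError("no live controller endpoints known")
```

Usage: after `fe = FailoverEndpoints("tcp://a:1")` and `fe.on_announce("tcp://b:2")`, a call to `fe.fail_over()` switches the client to `"tcp://b:2"`. Deciding *when* to call `fail_over()` is the hard part, which the replies below discuss.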

On Sun, 2012-02-12 at 12:06 -0800, MinRK wrote:
> 
> 
> On Sun, Feb 12, 2012 at 11:48, Darren Govoni <darren@ontrenet.com> wrote:
>         On Sun, 2012-02-12 at 11:12 -0800, MinRK wrote:
>         >
>         >
>         > On Sun, Feb 12, 2012 at 10:42, Darren Govoni <darren@ontrenet.com> wrote:
>         >         Thanks Min,
>         >
>         >         Is it possible to open a ticket for this capability for a
>         >         (near) future release? It complements the already amazing
>         >         load balancing capability.
>         >
>         >
>         > You are welcome to open an Issue.  I don't know if it will make it
>         > into one of the next few releases, but it is on my todo list.  The
>         > best way to get this sort of thing going is to start with a Pull
>         > Request.
>         
>         
>         OK, I will open an issue. Thanks. In the meantime, is it possible
>         for clients to 'know' when a controller is no longer available?
>         For example, it would be nice if I could insert a callback handler
>         for this sort of internal exception so I can provide some graceful
>         recovery options.
> 
> 
> It would be sensible to add a heartbeat mechanism on the
> controller->client PUB channel for this information.  Until then, your
> main means of detecting a controller crash will be simple timeouts.
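A sketch of how a client could consume such a heartbeat and fire the recovery callback requested above. It assumes, hypothetically, that the controller publishes a beat every `interval` seconds; `HeartbeatMonitor`, `beat`, `check`, and `on_lost` are illustrative names, not IPython APIs. The clock is injectable so the logic can be tested without waiting.

```python
import time


class HeartbeatMonitor:
    """Hypothetical client-side detector for a silent controller.

    Call beat() whenever a heartbeat arrives and check() periodically.
    If more than `interval * max_missed` seconds pass without a beat,
    `on_lost` is called once with the length of the silence.
    """

    def __init__(self, interval, max_missed, on_lost, clock=time.monotonic):
        self.interval = interval
        self.max_missed = max_missed
        self.on_lost = on_lost
        self._clock = clock
        self._last_beat = clock()
        self._lost = False

    def beat(self):
        """Record a heartbeat received from the controller."""
        self._last_beat = self._clock()
        self._lost = False

    def check(self):
        """Return True while the controller looks alive."""
        silent_for = self._clock() - self._last_beat
        if silent_for > self.interval * self.max_missed:
            if not self._lost:
                # Fire the callback only on the transition to "lost".
                self._lost = True
                self.on_lost(silent_for)
            return False
        return True
```

Allowing several missed beats before declaring the controller lost trades detection speed for fewer false alarms on a congested network.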
> 
> 
> ZeroMQ makes disconnect detection a challenge: there are no disconnect
> events, because a disconnected channel is still valid (the peer is
> allowed to simply come back up).
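Because a request to a dead controller simply hangs, a timeout is the fallback. This stdlib sketch bounds any blocking call with `concurrent.futures`; `call_with_timeout` and `ControllerUnavailable` are hypothetical names. With pyzmq one would more likely poll the socket with a timeout instead of using a thread, but the trade-off is the same: a timeout cannot tell a dead controller from a slow one.

```python
import concurrent.futures


class ControllerUnavailable(Exception):
    """Hypothetical error: no reply arrived within the timeout."""


_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn, timeout, *args, **kwargs):
    """Run a possibly hanging blocking call, giving up after `timeout` s.

    Note that the abandoned call keeps running in its worker thread;
    only the caller stops waiting for it.
    """
    future = _pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        raise ControllerUnavailable(
            f"no reply within {timeout}s; controller may be down"
        ) from None
```

A caller can catch `ControllerUnavailable` and run whatever graceful-recovery logic it needs, such as switching to another known controller endpoint.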
>  
>         
>         >
>         >
>         >         Perhaps a related but separate notion would be the ability
>         >         to have clustered controllers for HA.
>         >
>         >
>         > I do have a model in mind for this sort of thing, though not
>         > multiple *controllers*, but rather multiple Schedulers.  Our design
>         > with 0MQ would make this pretty simple (starting another scheduler
>         > and making an extra call to socket.connect() on the Client and
>         > Engine is all that's needed), and this should allow scaling to tens
>         > of thousands of engines.
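Why an extra `connect()` is enough: a ZeroMQ DEALER (or PUSH) socket connected to several peers distributes outgoing messages round-robin among them. The stdlib stand-in below only mimics that fan-out with in-memory outboxes; `RoundRobinSender` and the scheduler addresses are hypothetical, and this is not how IPython's scheduler is actually wired.

```python
import itertools


class RoundRobinSender:
    """Stand-in for a socket connected to several schedulers.

    Each connect() adds an endpoint; send() rotates across all of
    them. Delivery is faked by appending to a per-endpoint list.
    """

    def __init__(self):
        self.outboxes = {}
        self._cycle = None

    def connect(self, endpoint):
        self.outboxes.setdefault(endpoint, [])
        # Rebuild the rotation so the newly connected scheduler joins it.
        self._cycle = itertools.cycle(list(self.outboxes))

    def send(self, message):
        if self._cycle is None:
            raise RuntimeError("not connected to any scheduler")
        endpoint = next(self._cycle)
        self.outboxes[endpoint].append(message)
        return endpoint
```

With two schedulers connected, four messages alternate between them, so each scheduler handles half the load; adding a third scheduler is one more `connect()` call.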
>         
>         
>         Yes! That's what I'm after. In this cloud-scale age of computing,
>         that would be ideal.
>         
>         
>         Thanks Min.
>         
>         >
>         >
>         >         On Sun, 2012-02-12 at 08:32 -0800, Min RK wrote:
>         >         > No, there is no failover mechanism.  When the controller
>         >         > goes down, further requests will simply hang.  We have
>         >         > almost all the information we need to bring up a new
>         >         > controller in its place (restart it), in which case the
>         >         > Client wouldn't even need to know that it went down, and
>         >         > would continue to just work, thanks to some zeromq magic.
>         >         >
>         >         > -MinRK
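The restart-in-place idea needs something to notice the crash and relaunch the process, i.e. the watchdog from the first message. A minimal stdlib sketch of such a supervisor loop; `supervise` is a hypothetical helper, and the command it runs (for example `["ipcontroller"]`) is only a placeholder. Clients reconnecting transparently assumes the restarted controller comes back on the same addresses, which, per MinRK's reply, IPython did not yet fully support at the time. In practice an existing supervisor such as systemd or supervisord would usually fill this role.

```python
import subprocess
import time


def supervise(cmd, max_restarts=3, backoff=1.0, sleep=time.sleep):
    """Run `cmd`, restarting it whenever it exits with a nonzero status.

    Stops on a clean exit (status 0) or after `max_restarts` restarts,
    and returns the list of exit statuses observed.
    """
    codes = []
    for attempt in range(max_restarts + 1):
        code = subprocess.call(cmd)
        codes.append(code)
        if code == 0:
            break
        if attempt < max_restarts:
            # Linear backoff so a crash loop doesn't spin the CPU.
            sleep(backoff * (attempt + 1))
    return codes
```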
>         >         >
>         >         > On Feb 12, 2012, at 5:02, Darren Govoni <darren@ontrenet.com> wrote:
>         >         >
>         >         > > Hi,
>         >         > > Does IPython support any kind of clustering or failover
>         >         > > for ipcontrollers? I'm wondering how situations are
>         >         > > handled where a controller goes down while a client
>         >         > > needs to perform something.
>         >         > >
>         >         > > Thanks for any tips.
>         >         > > Darren
>         >         > >
>         >         > > _______________________________________________
>         >         > > IPython-User mailing list
>         >         > > IPython-User@scipy.org
>         >         > > http://mail.scipy.org/mailman/listinfo/ipython-user



