[IPython-User] IPython on EC2 with hundreds of cores

Jon Olav Vik jonovik@gmail....
Tue Sep 25 02:31:04 CDT 2012


Robert Nishihara <robertnishihara <at> gmail.com> writes:

> Does anyone have experience running IPython on Amazon's EC2 service 
> with hundreds of cores? I am seeing a problem that only occurs when 
> I try to start a cluster with a large number of engines (over 500).

Not on EC2, but I have had similar experiences on a shared batch cluster 
running PBS (portable batch system).

> However, if I try the same thing with 17 or so nodes, 
> the cluster still starts fine, but the command
> 
>     rc = parallel.Client(packer='pickle')
> 
> times out. I think the engine registration may be timing out.

Yes. See the diffs below for how I have modified my configuration. It 
seems to me that the default timeouts (5 seconds) are suitable for a 
single computer, but need to be increased to a minute or more when 
hundreds of engines connect over a network that may be under high load. 
I have also slowed down heartbeating from every 3 to every 30 seconds, 
hoping that this would reduce traffic. (I haven't got around to writing 
a proper benchmark to optimize the various settings, sorry.)
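
On the client side, one way to cope with slow registration is to poll 
until enough engines have shown up before submitting work. What follows 
is only a rough sketch, not something from my setup: the helper name 
wait_for_engines and its parameters are made up for illustration, and 
the only IPython feature it relies on is that rc.ids lists the engines 
registered so far.

```python
import time


def wait_for_engines(get_ids, expected, timeout=600.0, poll=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until at least `expected` engines have registered.

    get_ids is a zero-argument callable returning the current engine
    ids, e.g. ``lambda: rc.ids`` for an IPython.parallel Client.
    (Hypothetical helper; not part of IPython.)
    Returns the number of engines seen, or raises TimeoutError if the
    deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        n = len(get_ids())
        if n >= expected:
            return n
        if clock() >= deadline:
            raise TimeoutError(
                "only %d of %d engines registered" % (n, expected))
        sleep(poll)


if __name__ == "__main__":
    # Simulated registration: 200 more engines appear on every poll.
    counts = iter([0, 200, 400, 600])
    print(wait_for_engines(lambda: range(next(counts)), expected=500,
                           sleep=lambda seconds: None))
```

With a real cluster the call would look something like 
wait_for_engines(lambda: rc.ids, expected=500). If creating the Client 
itself times out, it may also help to pass a longer timeout there, e.g. 
parallel.Client(packer='pickle', timeout=120); I have not measured what 
value is actually needed.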

Hope this helps,
Jon Olav


for i in *.py; do echo; echo $i; diff --context=3 $i.bak $i; done

ipcluster_config.py
*** ipcluster_config.py.bak     2012-09-12 14:50:52.000000000 +0200
--- ipcluster_config.py 2012-09-12 15:39:44.000000000 +0200
***************
*** 141,146 ****
--- 141,147 ----
  # Daemonize the ipcluster program. This implies --log-to-file. Not available on
  # Windows.
  # c.IPClusterStart.daemonize = False
+ c.IPClusterStart.daemonize = True

  # The Logging format template
  # c.IPClusterStart.log_format = '[%(name)s] %(message)s'

ipcontroller_config.py
*** ipcontroller_config.py.bak  2012-09-12 15:36:44.000000000 +0200
--- ipcontroller_config.py      2012-09-13 09:20:42.510347000 +0200
***************
*** 91,96 ****
--- 91,97 ----

  # whether to cleanup old logfiles before starting
  # c.IPControllerApp.clean_logs = False
+ c.IPControllerApp.clean_logs = True

  # The Logging format template
  # c.IPControllerApp.log_format = '[%(name)s] %(message)s'
***************
*** 287,292 ****
--- 288,294 ----
  # submitting many heterogenous tasks all at once.  Any positive value greater
  # than one is a compromise between the two.
  # c.TaskScheduler.hwm = 1
+ c.TaskScheduler.hwm = 3

  #------------------------------------------------------------------------------
  # HeartMonitor configuration
***************
*** 297,302 ****
--- 299,305 ----

  # The frequency at which the Hub pings the engines for heartbeats (in ms)
  # c.HeartMonitor.period = 3000
+ c.HeartMonitor.period = 30000

  #------------------------------------------------------------------------------
  # SQLiteDB configuration

ipengine_config.py
*** ipengine_config.py.bak      2012-09-12 14:50:52.000000000 +0200
--- ipengine_config.py  2012-09-12 15:39:57.000000000 +0200
***************
*** 37,42 ****
--- 37,43 ----

  # whether to cleanup old logfiles before starting
  # c.IPEngineApp.clean_logs = False
+ c.IPEngineApp.clean_logs = True

  # String id to add to runtime files, to prevent name collisions when using
  # multiple clusters with a single profile simultaneously.
***************
*** 83,88 ****
--- 84,90 ----
  # started at the same time and it may take a moment for the controller to write
  # the connector files.
  # c.IPEngineApp.wait_for_url_file = 5
+ c.IPEngineApp.wait_for_url_file = 60

  #------------------------------------------------------------------------------
  # ProfileDir configuration
***************
*** 215,220 ****
--- 217,223 ----
  # The time (in seconds) to wait for the Controller to respond to registration
  # requests before giving up.
  # c.EngineFactory.timeout = 5
+ c.EngineFactory.timeout = 120

  # The SSH server to use for tunneling connections to the Controller.
  # c.EngineFactory.sshserver = u''

iplogger_config.py
diff: iplogger_config.py.bak: No such file or directory

ipython_config.py
diff: ipython_config.py.bak: No such file or directory

ipython_notebook_config.py
diff: ipython_notebook_config.py.bak: No such file or directory
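
To put a rough number on the HeartMonitor.period change in the diff 
above: the snippet below estimates how many heartbeat replies per second 
reach the Hub. It is a back-of-the-envelope sketch that assumes every 
engine answers each ping exactly once per period; the function name is 
made up, and message sizes and other traffic are ignored.

```python
def heartbeat_replies_per_second(n_engines, period_ms):
    """Approximate heartbeat replies per second arriving at the Hub.

    Assumes each engine answers every ping exactly once per period.
    (Hypothetical helper for illustration; not part of IPython.)
    """
    return n_engines / (period_ms / 1000.0)


# Compare the default period with the one set in ipcontroller_config.py.
for period_ms in (3000, 30000):
    rate = heartbeat_replies_per_second(500, period_ms)
    print("period=%5d ms -> about %.0f replies/s" % (period_ms, rate))
```

For 500 engines that is roughly 167 replies per second at the default 
3000 ms versus roughly 17 at 30000 ms. The trade-off is presumably that 
a dead engine takes longer to be noticed.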


