[IPython-User] IPython on EC2 with hundreds of cores
Jon Olav Vik
jonovik@gmail....
Tue Sep 25 02:31:04 CDT 2012
Robert Nishihara <robertnishihara <at> gmail.com> writes:
> Does anyone have experience running IPython on Amazon's EC2 service
> with hundreds of cores? I am seeing a problem that only occurs when
> I try to start a cluster with a large number of engines (over 500).
Not on EC2, but I have had similar experiences on a shared batch cluster
running PBS (Portable Batch System).
> However, if I try the same thing with 17 or so nodes,
> the cluster still starts fine, but the command
>
> rc = parallel.Client(packer='pickle')
>
> times out. I think the engine registration may be timing out.
Yes. See below for how I have modified my configuration. It seems to
me that the default timeouts suit a single computer, but need to be
increased to a minute or more when hundreds of engines connect over a
network that may be under high load. I have also raised the heartbeat
period from 3 to 30 seconds, hoping that this would reduce traffic.
(I haven't got around to writing a proper benchmark to optimize the
various settings, sorry.)
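To put a rough number on the heartbeat change, here is a back-of-the-envelope sketch. It assumes every engine sends exactly one reply per ping period, which is a simplification rather than a measured property of the Hub; the function name and the figure of 500 engines are only for illustration.

```python
def heartbeat_replies_per_second(n_engines, period_ms):
    """Replies per second reaching the Hub, ASSUMING each engine
    answers exactly once per heartbeat period."""
    return n_engines / (period_ms / 1000.0)


# Compare the default period (3000 ms) with the raised one (30000 ms)
# for a hypothetical cluster of 500 engines.
for period_ms in (3000, 30000):
    rate = heartbeat_replies_per_second(500, period_ms)
    print("period=%d ms -> %.1f replies/s" % (period_ms, rate))
```

Under that assumption the tenfold longer period cuts heartbeat replies tenfold. Whether that is what matters for the registration timeouts is untested.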
Hope this helps,
Jon Olav
for i in *.py; do echo; echo $i; diff --context=3 $i.bak $i; done
ipcluster_config.py
*** ipcluster_config.py.bak 2012-09-12 14:50:52.000000000 +0200
--- ipcluster_config.py 2012-09-12 15:39:44.000000000 +0200
***************
*** 141,146 ****
--- 141,147 ----
# Daemonize the ipcluster program. This implies --log-to-file. Not available on
# Windows.
# c.IPClusterStart.daemonize = False
+ c.IPClusterStart.daemonize = True
# The Logging format template
# c.IPClusterStart.log_format = '[%(name)s] %(message)s'
ipcontroller_config.py
*** ipcontroller_config.py.bak 2012-09-12 15:36:44.000000000 +0200
--- ipcontroller_config.py 2012-09-13 09:20:42.510347000 +0200
***************
*** 91,96 ****
--- 91,97 ----
# whether to cleanup old logfiles before starting
# c.IPControllerApp.clean_logs = False
+ c.IPControllerApp.clean_logs = True
# The Logging format template
# c.IPControllerApp.log_format = '[%(name)s] %(message)s'
***************
*** 287,292 ****
--- 288,294 ----
# submitting many heterogenous tasks all at once. Any positive value greater
# than one is a compromise between the two.
# c.TaskScheduler.hwm = 1
+ c.TaskScheduler.hwm = 3
#------------------------------------------------------------------------------
# HeartMonitor configuration
***************
*** 297,302 ****
--- 299,305 ----
# The frequency at which the Hub pings the engines for heartbeats (in ms)
# c.HeartMonitor.period = 3000
+ c.HeartMonitor.period = 30000
#------------------------------------------------------------------------------
# SQLiteDB configuration
ipengine_config.py
*** ipengine_config.py.bak 2012-09-12 14:50:52.000000000 +0200
--- ipengine_config.py 2012-09-12 15:39:57.000000000 +0200
***************
*** 37,42 ****
--- 37,43 ----
# whether to cleanup old logfiles before starting
# c.IPEngineApp.clean_logs = False
+ c.IPEngineApp.clean_logs = True
# String id to add to runtime files, to prevent name collisions when using
# multiple clusters with a single profile simultaneously.
***************
*** 83,88 ****
--- 84,90 ----
# started at the same time and it may take a moment for the controller to write
# the connector files.
# c.IPEngineApp.wait_for_url_file = 5
+ c.IPEngineApp.wait_for_url_file = 60
#------------------------------------------------------------------------------
# ProfileDir configuration
***************
*** 215,220 ****
--- 217,223 ----
# The time (in seconds) to wait for the Controller to respond to registration
# requests before giving up.
# c.EngineFactory.timeout = 5
+ c.EngineFactory.timeout = 120
# The SSH server to use for tunneling connections to the Controller.
# c.EngineFactory.sshserver = u''
iplogger_config.py
diff: iplogger_config.py.bak: No such file or directory
ipython_config.py
diff: ipython_config.py.bak: No such file or directory
ipython_notebook_config.py
diff: ipython_notebook_config.py.bak: No such file or directory
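
Even with longer timeouts, engines may keep registering for a while after the client connects. The following is a minimal sketch of a polling helper that blocks until an expected number of engines is visible. The helper name wait_for_engines and its parameters are hypothetical and not part of IPython; it only assumes a callable that returns the current list of engine ids.

```python
import time


def wait_for_engines(get_ids, expected, timeout=600.0, poll=5.0):
    """Poll get_ids() until it reports at least `expected` engines.

    get_ids is any zero-argument callable returning the current engine
    ids (hypothetical hook; supply your own). Returns the number of
    engines seen, or raises RuntimeError once `timeout` seconds pass.
    """
    deadline = time.time() + timeout
    while True:
        n = len(get_ids())
        if n >= expected:
            return n
        if time.time() >= deadline:
            raise RuntimeError(
                "only %d of %d engines registered after %.0f s"
                % (n, expected, timeout))
        time.sleep(poll)
```

With a connected client this might be called as wait_for_engines(lambda: rc.ids, 500), assuming rc.ids reflects newly registered engines each time it is read.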