[IPython-User] Question about schedulers

Darren Govoni darren@ontrenet....
Wed Jun 6 18:52:57 CDT 2012


Jon,
   Thanks for those details. Very informative.

So it says multiple tasks can be assigned to an engine at a time, but
how many execute at the same time? Just one right? Or is there a setting
for that too?

thanks!
Darren

On Wed, 2012-06-06 at 21:38 +0000, Jon Olav Vik wrote:
> Darren Govoni <darren <at> ontrenet.com> writes:
> 
> > Assuming all engines are equal, will the first 10 objects be
> > distributed to 1 engine each and the second 10 objects will wait for an
> > engine to be free then go there? Or will all 20 messages be spread to
> > the engines at the same time?
> 
> I think two relevant options are:
> 
> 
> The `chunksize` argument to IPython.parallel.ParallelFunction determines how 
> many list items are passed in each "task".
> 
> from IPython.parallel import Client
> c = Client()
> lv = c.load_balanced_view()
> 
> @lv.parallel(block=True)
> def chunk1(x):
>     return str(x)
> 
> @lv.parallel(chunksize=2, block=True)
> def chunk2(x):
>     return str(x)
> 
> L = range(5)
> print chunk1(L)
> print chunk2(L)
> ## -- End pasted text --
> ['[0]', '[1]', '[2]', '[3]', '[4]']
> ['[0, 1]', '[2, 3]', '[4]']
> 
> 
> The `hwm` (high water mark) configurable determines the maximum number of tasks 
> that can be outstanding on an engine. On my system, it is set in the file 
> ipcontroller_config.py, inside the directory profile_default inside the 
> directory returned by IPython.utils.path.get_ipython_dir().
> 
> Quoting
> http://ipython.org/ipython-doc/dev/parallel/parallel_task.html#greedy-assignment
> 
> """
> Tasks are assigned greedily as they are submitted. If their dependencies are 
> met, they will be assigned to an engine right away, and multiple tasks can be 
> assigned to an engine at a given time. This limit is set with the 
> TaskScheduler.hwm (high water mark) configurable:
> # the most common choices are:
> c.TaskSheduler.hwm = 0 # (minimal latency, default in IPython ≤ 0.12)
> # or
> c.TaskScheduler.hwm = 1 # (most-informed balancing, default in > 0.12)
> 
> In IPython ≤ 0.12,the default is 0, or no-limit. That is, there is no limit to 
> the number of tasks that can be outstanding on a given engine. This greatly 
> benefits the latency of execution, because network traffic can be hidden behind 
> computation. However, this means that workload is assigned without knowledge of 
> how long each task might take, and can result in poor load-balancing, 
> particularly for submitting a collection of heterogeneous tasks all at once. 
> You can limit this effect by setting hwm to a positive integer, 1 being maximum 
> load-balancing (a task will never be waiting if there is an idle engine), and 
> any larger number being a compromise between load-balance and latency-hiding.
> 
> In practice, some users have been confused by having this optimization on by 
> default, and the default value has been changed to 1. This can be slower, but 
> has more obvious behavior and won’t result in assigning too many tasks to some 
> engines in heterogeneous cases.
> """
> 
> _______________________________________________
> IPython-User mailing list
> IPython-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-user




More information about the IPython-User mailing list