[IPython-User] Question about schedulers
Wed Jun 6 18:52:57 CDT 2012
Thanks for those details. Very informative.
So it says multiple tasks can be assigned to an engine at a time, but
how many execute at the same time? Just one right? Or is there a setting
for that too?
On Wed, 2012-06-06 at 21:38 +0000, Jon Olav Vik wrote:
> Darren Govoni <darren <at> ontrenet.com> writes:
> > Assuming all engines are equal, will the first 10 objects be
> > distributed to 1 engine each and the second 10 objects will wait for an
> > engine to be free then go there? Or will all 20 messages be spread to
> > the engines at the same time?
> I think two relevant options are:
> The `chunksize` argument to IPython.parallel.ParallelFunction determines how
> many list items are passed in each "task".
> from IPython.parallel import Client
> c = Client()
> lv = c.load_balanced_view()
> def chunk1(x):
> return str(x)
> @lv.parallel(chunksize=2, block=True)
> def chunk2(x):
> return str(x)
> L = range(5)
> print chunk1(L)
> print chunk2(L)
> ## -- End pasted text --
> ['', '', '', '', '']
> ['[0, 1]', '[2, 3]', '']
> The `hwm` (high water mark) configurable determines the maximum number of tasks
> that can be outstanding on an engine. On my system, it is set in the file
> ipcontroller_config.py, inside the directory profile_default inside the
> directory returned by IPython.utils.path.get_ipython_dir().
> Tasks are assigned greedily as they are submitted. If their dependencies are
> met, they will be assigned to an engine right away, and multiple tasks can be
> assigned to an engine at a given time. This limit is set with the
> TaskScheduler.hwm (high water mark) configurable:
> # the most common choices are:
> c.TaskSheduler.hwm = 0 # (minimal latency, default in IPython ≤ 0.12)
> # or
> c.TaskScheduler.hwm = 1 # (most-informed balancing, default in > 0.12)
> In IPython ≤ 0.12,the default is 0, or no-limit. That is, there is no limit to
> the number of tasks that can be outstanding on a given engine. This greatly
> benefits the latency of execution, because network traffic can be hidden behind
> computation. However, this means that workload is assigned without knowledge of
> how long each task might take, and can result in poor load-balancing,
> particularly for submitting a collection of heterogeneous tasks all at once.
> You can limit this effect by setting hwm to a positive integer, 1 being maximum
> load-balancing (a task will never be waiting if there is an idle engine), and
> any larger number being a compromise between load-balance and latency-hiding.
> In practice, some users have been confused by having this optimization on by
> default, and the default value has been changed to 1. This can be slower, but
> has more obvious behavior and won’t result in assigning too many tasks to some
> engines in heterogeneous cases.
> IPython-User mailing list
More information about the IPython-User