[IPython-User] Question about schedulers
Jon Olav Vik
jonovik@gmail....
Wed Jun 6 16:38:41 CDT 2012
Darren Govoni <darren <at> ontrenet.com> writes:
> Assuming all engines are equal, will the first 10 objects be
> distributed to 1 engine each and the second 10 objects will wait for an
> engine to be free then go there? Or will all 20 messages be spread to
> the engines at the same time?
I think two relevant options are:
The `chunksize` argument to IPython.parallel.ParallelFunction determines how
many list items are passed in each "task".
from IPython.parallel import Client
c = Client()
lv = c.load_balanced_view()
@lv.parallel(block=True)
def chunk1(x):
return str(x)
@lv.parallel(chunksize=2, block=True)
def chunk2(x):
return str(x)
L = range(5)
print chunk1(L)
print chunk2(L)
## -- End pasted text --
['[0]', '[1]', '[2]', '[3]', '[4]']
['[0, 1]', '[2, 3]', '[4]']
The `hwm` (high water mark) configurable determines the maximum number of tasks
that can be outstanding on an engine. On my system, it is set in the file
ipcontroller_config.py, inside the directory profile_default inside the
directory returned by IPython.utils.path.get_ipython_dir().
Quoting
http://ipython.org/ipython-doc/dev/parallel/parallel_task.html#greedy-assignment
"""
Tasks are assigned greedily as they are submitted. If their dependencies are
met, they will be assigned to an engine right away, and multiple tasks can be
assigned to an engine at a given time. This limit is set with the
TaskScheduler.hwm (high water mark) configurable:
# the most common choices are:
c.TaskSheduler.hwm = 0 # (minimal latency, default in IPython ≤ 0.12)
# or
c.TaskScheduler.hwm = 1 # (most-informed balancing, default in > 0.12)
In IPython ≤ 0.12,the default is 0, or no-limit. That is, there is no limit to
the number of tasks that can be outstanding on a given engine. This greatly
benefits the latency of execution, because network traffic can be hidden behind
computation. However, this means that workload is assigned without knowledge of
how long each task might take, and can result in poor load-balancing,
particularly for submitting a collection of heterogeneous tasks all at once.
You can limit this effect by setting hwm to a positive integer, 1 being maximum
load-balancing (a task will never be waiting if there is an idle engine), and
any larger number being a compromise between load-balance and latency-hiding.
In practice, some users have been confused by having this optimization on by
default, and the default value has been changed to 1. This can be slower, but
has more obvious behavior and won’t result in assigning too many tasks to some
engines in heterogeneous cases.
"""
More information about the IPython-User
mailing list