[IPython-User] Question about schedulers

Jon Olav Vik jonovik@gmail....
Wed Jun 6 16:38:41 CDT 2012


Darren Govoni <darren <at> ontrenet.com> writes:

> Assuming all engines are equal, will the first 10 objects be
> distributed to 1 engine each and the second 10 objects will wait for an
> engine to be free then go there? Or will all 20 messages be spread to
> the engines at the same time?

I think two relevant options are:


The `chunksize` argument to IPython.parallel.ParallelFunction determines how 
many list items are passed in each "task".

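from IPython.parallel import Client

c = Client()                 # connect to a running cluster (e.g. one started with `ipcluster start`)
lv = c.load_balanced_view()  # tasks are assigned by the TaskScheduler

@lv.parallel(block=True)               # default: one list item per task
def chunk1(x):
    return str(x)                      # x is the chunk (a list) sent to the engine

@lv.parallel(chunksize=2, block=True)  # two list items per task
def chunk2(x):
    return str(x)

L = list(range(5))
print(chunk1(L))
print(chunk2(L))

Output:
['[0]', '[1]', '[2]', '[3]', '[4]']
['[0, 1]', '[2, 3]', '[4]']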
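To make the chunk boundaries concrete without a running cluster, here is a plain-Python sketch of the partitioning. The helper `partition` is hypothetical and is not part of IPython.parallel; it only reproduces the chunk boundaries seen in the output above, not IPython's internal code.

```python
def partition(seq, chunksize):
    """Split seq into consecutive chunks of at most `chunksize` items.

    Hypothetical helper for illustration only; it mimics the chunk
    boundaries printed above, not IPython's actual implementation.
    """
    if chunksize < 1:
        raise ValueError("chunksize must be >= 1")
    items = list(seq)
    return [items[i:i + chunksize] for i in range(0, len(items), chunksize)]


# Each chunk corresponds to one task; the decorated function sees one chunk.
print([str(chunk) for chunk in partition(range(5), 1)])  # ['[0]', '[1]', '[2]', '[3]', '[4]']
print([str(chunk) for chunk in partition(range(5), 2)])  # ['[0, 1]', '[2, 3]', '[4]']
```

Fewer, larger chunks mean fewer messages per item, at the cost of coarser load balancing.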


The `hwm` (high water mark) configurable determines the maximum number of tasks
that can be outstanding (assigned but not yet finished) on an engine. On my
system, it is set in the file ipcontroller_config.py, in the profile_default
subdirectory of the directory returned by IPython.utils.path.get_ipython_dir().
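For reference, a minimal sketch of the relevant lines in that file, assuming the default profile. `get_config()` is how IPython configuration files of this era obtain the config object `c`; the value 1 is just one of the choices discussed in the documentation quoted below. The file is read when the controller starts, so the change applies to controllers started afterwards.

```python
# <ipython_dir>/profile_default/ipcontroller_config.py
c = get_config()

# At most one outstanding task per engine: a new task is only sent to an
# engine once it has no outstanding work (best balancing, more latency).
c.TaskScheduler.hwm = 1
```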

Quoting
http://ipython.org/ipython-doc/dev/parallel/parallel_task.html#greedy-assignment

"""
Tasks are assigned greedily as they are submitted. If their dependencies are 
met, they will be assigned to an engine right away, and multiple tasks can be 
assigned to an engine at a given time. This limit is set with the 
TaskScheduler.hwm (high water mark) configurable:
# the most common choices are:
c.TaskScheduler.hwm = 0 # (minimal latency, default in IPython ≤ 0.12)
# or
c.TaskScheduler.hwm = 1 # (most-informed balancing, default in > 0.12)

In IPython ≤ 0.12, the default is 0, or no-limit. That is, there is no limit to 
the number of tasks that can be outstanding on a given engine. This greatly 
benefits the latency of execution, because network traffic can be hidden behind 
computation. However, this means that workload is assigned without knowledge of 
how long each task might take, and can result in poor load-balancing, 
particularly for submitting a collection of heterogeneous tasks all at once. 
You can limit this effect by setting hwm to a positive integer, 1 being maximum 
load-balancing (a task will never be waiting if there is an idle engine), and 
any larger number being a compromise between load-balance and latency-hiding.

In practice, some users have been confused by having this optimization on by 
default, and the default value has been changed to 1. This can be slower, but 
has more obvious behavior and won’t result in assigning too many tasks to some 
engines in heterogeneous cases.
"""


