[IPython-User] Question about schedulers
Wed Jun 6 19:06:55 CDT 2012
On Wed, Jun 6, 2012 at 4:52 PM, Darren Govoni <email@example.com> wrote:
> Thanks for those details. Very informative.
> So it says multiple tasks can be assigned to an engine at a time, but
> how many execute at the same time? Just one right? Or is there a setting
> for that too?
Correct, the engines themselves are not multithreaded, so it only runs one
at a time. This is not configurable. The normal mode is starting one
engine per core on each machine.
Assigning multiple tasks to the engines helps hide the network latency
behind computation, because the next task will be waiting in-memory on the
Engine when it finishes the previous one, rather than having to fetch it
from the scheduler.
> On Wed, 2012-06-06 at 21:38 +0000, Jon Olav Vik wrote:
> > Darren Govoni <darren <at> ontrenet.com> writes:
> > > Assuming all engines are equal, will the first 10 objects be
> > > distributed to 1 engine each and the second 10 objects will wait for an
> > > engine to be free then go there? Or will all 20 messages be spread to
> > > the engines at the same time?
> > I think two relevant options are:
> > The `chunksize` argument to IPython.parallel.ParallelFunction determines
> > many list items are passed in each "task".
> > from IPython.parallel import Client
> > c = Client()
> > lv = c.load_balanced_view()
> > @lv.parallel(block=True)
> > def chunk1(x):
> > return str(x)
> > @lv.parallel(chunksize=2, block=True)
> > def chunk2(x):
> > return str(x)
> > L = range(5)
> > print chunk1(L)
> > print chunk2(L)
> > ## -- End pasted text --
> > ['', '', '', '', '']
> > ['[0, 1]', '[2, 3]', '']
> > The `hwm` (high water mark) configurable determines the maximum number
> of tasks
> > that can be outstanding on an engine. On my system, it is set in the file
> > ipcontroller_config.py, inside the directory profile_default inside the
> > directory returned by IPython.utils.path.get_ipython_dir().
> > Quoting
> > """
> > Tasks are assigned greedily as they are submitted. If their dependencies
> > met, they will be assigned to an engine right away, and multiple tasks
> can be
> > assigned to an engine at a given time. This limit is set with the
> > TaskScheduler.hwm (high water mark) configurable:
> > # the most common choices are:
> > c.TaskSheduler.hwm = 0 # (minimal latency, default in IPython ≤ 0.12)
> > # or
> > c.TaskScheduler.hwm = 1 # (most-informed balancing, default in > 0.12)
> > In IPython ≤ 0.12,the default is 0, or no-limit. That is, there is no
> limit to
> > the number of tasks that can be outstanding on a given engine. This
> > benefits the latency of execution, because network traffic can be hidden
> > computation. However, this means that workload is assigned without
> knowledge of
> > how long each task might take, and can result in poor load-balancing,
> > particularly for submitting a collection of heterogeneous tasks all at
> > You can limit this effect by setting hwm to a positive integer, 1 being
> > load-balancing (a task will never be waiting if there is an idle
> engine), and
> > any larger number being a compromise between load-balance and
> > In practice, some users have been confused by having this optimization
> on by
> > default, and the default value has been changed to 1. This can be
> slower, but
> > has more obvious behavior and won’t result in assigning too many tasks
> to some
> > engines in heterogeneous cases.
> > """
> > _______________________________________________
> > IPython-User mailing list
> > IPython-User@scipy.org
> > http://mail.scipy.org/mailman/listinfo/ipython-user
> IPython-User mailing list
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the IPython-User