<br><br><div class="gmail_quote">On Wed, Jun 6, 2012 at 4:52 PM, Darren Govoni <span dir="ltr"><<a href="mailto:darren@ontrenet.com" target="_blank">darren@ontrenet.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Jon,<br>
Thanks for those details. Very informative.<br>
<br>
So it says multiple tasks can be assigned to an engine at a time, but<br>
how many execute at the same time? Just one, right? Or is there a setting<br>
for that too?<br></blockquote><div><br></div><div>Correct, the engines themselves are not multithreaded, so each engine runs only one task at a time. This is not configurable. The usual setup is to start one engine per core on each machine.</div>
<div><br></div><div>Assigning multiple tasks to the engines helps hide the network latency behind computation, because the next task will be waiting in-memory on the Engine when it finishes the previous one, rather than having to fetch it from the scheduler.</div>
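A rough way to see the latency-hiding effect is a toy cost model for a single engine. The function name `engine_time` and the parameters `compute_ms` and `fetch_ms` are hypothetical, not part of IPython. The model assumes every task needs the same compute time and that, when tasks are queued ahead, they all arrive while the first one is still running, so only the first fetch delay is exposed.

```python
def engine_time(n_tasks, compute_ms, fetch_ms, queued_ahead):
    """Toy model (hypothetical) of wall-clock time for one engine.

    queued_ahead=False: each task is fetched only after the previous one
    finishes, so every task pays the fetch delay.
    queued_ahead=True: later tasks are already waiting in memory when the
    previous one finishes, so only the first fetch is exposed.
    """
    if queued_ahead:
        return fetch_ms + n_tasks * compute_ms
    return n_tasks * (fetch_ms + compute_ms)


# 5 tasks of 10 ms each, with a 2 ms fetch delay per task
print(engine_time(5, 10, 2, queued_ahead=False))  # 60
print(engine_time(5, 10, 2, queued_ahead=True))   # 52
```

In this model the saving grows with the number of tasks and with the ratio of fetch delay to compute time.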
<div><br></div><div>-MinRK</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
thanks!<br>
<span class="HOEnZb"><font color="#888888">Darren<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
On Wed, 2012-06-06 at 21:38 +0000, Jon Olav Vik wrote:<br>
> Darren Govoni <darren <at> <a href="http://ontrenet.com" target="_blank">ontrenet.com</a>> writes:<br>
><br>
> > Assuming all engines are equal, will the first 10 objects be<br>
> > distributed to 1 engine each and the second 10 objects will wait for an<br>
> > engine to be free then go there? Or will all 20 messages be spread to<br>
> > the engines at the same time?<br>
><br>
> I think two relevant options are:<br>
><br>
><br>
> The `chunksize` argument to IPython.parallel.ParallelFunction determines how<br>
> many list items are passed in each "task".<br>
><br>
> from IPython.parallel import Client<br>
> c = Client()<br>
> lv = c.load_balanced_view()<br>
><br>
> # no chunksize given: each task gets a one-item sublist (see output below)<br>
> @lv.parallel(block=True)<br>
> def chunk1(x):<br>
> &nbsp;&nbsp;&nbsp;&nbsp;return str(x)<br>
><br>
> # chunksize=2: each task gets a sublist of up to two items<br>
> @lv.parallel(chunksize=2, block=True)<br>
> def chunk2(x):<br>
> &nbsp;&nbsp;&nbsp;&nbsp;return str(x)<br>
><br>
> L = range(5)<br>
> print(chunk1(L))<br>
> print(chunk2(L))<br>
> ['[0]', '[1]', '[2]', '[3]', '[4]']<br>
> ['[0, 1]', '[2, 3]', '[4]']<br>
><br>
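The outputs above are consistent with the list being cut into consecutive slices of at most `chunksize` items, each slice sent as one task. The pure-Python helper below (`partition`, a hypothetical name, not part of IPython) reproduces those two outputs locally without a cluster. It assumes simple contiguous slicing, which may not match IPython's internal partitioning in every case.

```python
def partition(seq, chunksize=1):
    """Split seq into consecutive slices of at most chunksize items (hypothetical helper)."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    return [seq[i:i + chunksize] for i in range(0, len(seq), chunksize)]


L = list(range(5))
# mimic what each remote call of str(x) would return, one string per task
print([str(chunk) for chunk in partition(L)])               # ['[0]', '[1]', '[2]', '[3]', '[4]']
print([str(chunk) for chunk in partition(L, chunksize=2)])  # ['[0, 1]', '[2, 3]', '[4]']
```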
><br>
> The `hwm` (high water mark) configurable sets the maximum number of tasks<br>
> that can be outstanding on an engine at once. On my system, it is set in the file<br>
> ipcontroller_config.py, in the profile_default subdirectory of the<br>
> directory returned by IPython.utils.path.get_ipython_dir().<br>
><br>
> Quoting<br>
> <a href="http://ipython.org/ipython-doc/dev/parallel/parallel_task.html#greedy-assignment" target="_blank">http://ipython.org/ipython-doc/dev/parallel/parallel_task.html#greedy-assignment</a><br>
><br>
> """<br>
> Tasks are assigned greedily as they are submitted. If their dependencies are<br>
> met, they will be assigned to an engine right away, and multiple tasks can be<br>
> assigned to an engine at a given time. This limit is set with the<br>
> TaskScheduler.hwm (high water mark) configurable:<br>
> # the most common choices are:<br>
> c.TaskScheduler.hwm = 0 # (minimal latency, default in IPython ≤ 0.12)<br>
> # or<br>
> c.TaskScheduler.hwm = 1 # (most-informed balancing, default in IPython > 0.12)<br>
><br>
> In IPython ≤ 0.12, the default is 0 (no limit). That is, there is no limit to<br>
> the number of tasks that can be outstanding on a given engine. This greatly<br>
> benefits the latency of execution, because network traffic can be hidden behind<br>
> computation. However, this means that workload is assigned without knowledge of<br>
> how long each task might take, and can result in poor load-balancing,<br>
> particularly for submitting a collection of heterogeneous tasks all at once.<br>
> You can limit this effect by setting hwm to a positive integer, 1 being maximum<br>
> load-balancing (a task will never be waiting if there is an idle engine), and<br>
> any larger number being a compromise between load-balance and latency-hiding.<br>
><br>
> In practice, some users have been confused by having this optimization on by<br>
> default, and the default value has been changed to 1. This can be slower, but<br>
> has more obvious behavior and won’t result in assigning too many tasks to some<br>
> engines in heterogeneous cases.<br>
> """<br>
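To make the load-balancing trade-off concrete, here is a small discrete-event simulation of a greedy scheduler. It is a simplified model, not IPython's scheduler code: `simulate` is a hypothetical name, network latency is ignored, all tasks are submitted at once, and each task goes to the engine with the fewest outstanding tasks (ties go to the lowest engine index), subject to `hwm` (0 meaning no limit).

```python
from collections import deque


def simulate(durations, n_engines, hwm):
    """Return the makespan of a toy greedy scheduler (hypothetical model).

    durations: run time of each task, in submission order
    hwm: max outstanding tasks per engine; 0 means no limit
    """
    pending = deque(durations)
    queues = [deque() for _ in range(n_engines)]  # queues[i][0] is running
    finish = [0] * n_engines  # finish time of the task running on engine i

    def assign(now):
        # hand out pending tasks while some engine is below the high water mark
        while pending:
            open_engines = [i for i in range(n_engines)
                            if hwm == 0 or len(queues[i]) < hwm]
            if not open_engines:
                return
            i = min(open_engines, key=lambda k: len(queues[k]))
            queues[i].append(pending.popleft())
            if len(queues[i]) == 1:  # engine was idle: start the task now
                finish[i] = now + queues[i][0]

    now = 0
    assign(now)
    while any(queues):
        # advance to the next engine that finishes its running task
        i = min((k for k in range(n_engines) if queues[k]),
                key=lambda k: finish[k])
        now = finish[i]
        queues[i].popleft()
        if queues[i]:  # a task was already waiting on this engine
            finish[i] = now + queues[i][0]
        assign(now)
    return now


tasks = [10, 1, 1, 1]  # one long task followed by three short ones
print(simulate(tasks, n_engines=2, hwm=0))  # 11: a short task is stuck behind the long one
print(simulate(tasks, n_engines=2, hwm=1))  # 10: short tasks go to whichever engine is idle
```

With hwm=0 every task is assigned up front, so one short task ends up queued behind the long one; with hwm=1 the scheduler holds tasks until an engine is idle. In a real cluster that hold-back costs the latency hiding that this model ignores.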
><br>
> _______________________________________________<br>
> IPython-User mailing list<br>
> <a href="mailto:IPython-User@scipy.org">IPython-User@scipy.org</a><br>
> <a href="http://mail.scipy.org/mailman/listinfo/ipython-user" target="_blank">http://mail.scipy.org/mailman/listinfo/ipython-user</a><br>
<br>
<br>
</div></div></blockquote></div><br>