[IPython-User] Question about schedulers

Darren Govoni darren@ontrenet....
Wed Jun 6 20:38:05 CDT 2012


Gotcha. Makes sense.

Incidentally, I discovered that I can execute the ipengine code directly
in my Python IDE and set breakpoints in my user code modules; when I
execute functions from remote clients/views, it hits the breakpoints
and lets me debug my code visually (in the running engine). Pretty
sweet. Thought I'd share.

On Wed, 2012-06-06 at 17:06 -0700, MinRK wrote:
> 
> 
> On Wed, Jun 6, 2012 at 4:52 PM, Darren Govoni <darren@ontrenet.com>
> wrote:
>         Jon,
>           Thanks for those details. Very informative.
>         
>         So it says multiple tasks can be assigned to an engine at a
>         time, but
>         how many execute at the same time? Just one right? Or is there
>         a setting
>         for that too?
> 
> 
> Correct, the engines themselves are not multithreaded, so each engine
> runs only one task at a time.  This is not configurable.  The normal
> mode is to start one engine per core on each machine.
> 
> 
> Assigning multiple tasks to the engines helps hide the network latency
> behind computation, because the next task will be waiting in-memory on
> the Engine when it finishes the previous one, rather than having to
> fetch it from the scheduler.
> 
> 
> -MinRK
>  
>         
>         thanks!
>         Darren
>         
>         On Wed, 2012-06-06 at 21:38 +0000, Jon Olav Vik wrote:
>         > Darren Govoni <darren <at> ontrenet.com> writes:
>         >
>         > > Assuming all engines are equal, will the first 10 objects
>         be
>         > > distributed to 1 engine each and the second 10 objects
>         will wait for an
>         > > engine to be free then go there? Or will all 20 messages
>         be spread to
>         > > the engines at the same time?
>         >
>         > I think two relevant options are:
>         >
>         >
>         > The `chunksize` argument to
>         IPython.parallel.ParallelFunction determines how
>         > many list items are passed in each "task".
>         >
>         > from IPython.parallel import Client
>         > c = Client()
>         > lv = c.load_balanced_view()
>         >
>         > @lv.parallel(block=True)
>         > def chunk1(x):
>         >     return str(x)
>         >
>         > @lv.parallel(chunksize=2, block=True)
>         > def chunk2(x):
>         >     return str(x)
>         >
>         > L = range(5)
>         > print chunk1(L)
>         > print chunk2(L)
>         > ['[0]', '[1]', '[2]', '[3]', '[4]']
>         > ['[0, 1]', '[2, 3]', '[4]']
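[Editorial note: the partitioning behavior shown above can be mimicked with a small, self-contained sketch. `partition` is a hypothetical helper written for illustration; the real splitting happens inside IPython.parallel's ParallelFunction.]

```python
# Sketch of how `chunksize` might partition the input into per-task
# chunks, reproducing the output quoted above.
def partition(seq, chunksize):
    """Split seq into consecutive chunks of at most `chunksize` items."""
    return [seq[i:i + chunksize] for i in range(0, len(seq), chunksize)]

L = list(range(5))
print([str(c) for c in partition(L, 1)])  # ['[0]', '[1]', '[2]', '[3]', '[4]']
print([str(c) for c in partition(L, 2)])  # ['[0, 1]', '[2, 3]', '[4]']
```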
>         >
>         >
>         > The `hwm` (high water mark) configurable determines the
>         maximum number of tasks
>         > that can be outstanding on an engine. On my system, it is
>         set in the file
>         > ipcontroller_config.py, inside the directory profile_default
>         inside the
>         > directory returned by IPython.utils.path.get_ipython_dir().
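[Editorial note: a minimal sketch of what that setting might look like inside ipcontroller_config.py, assuming the standard traitlets-style config file; only the `hwm` line is specific to this discussion.]

```python
# ipcontroller_config.py -- config fragment, loaded by the controller
c = get_config()

# At most one outstanding task per engine (maximum load-balancing);
# 0 would mean no limit (maximum latency-hiding).
c.TaskScheduler.hwm = 1
```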
>         >
>         > Quoting
>         >
>         http://ipython.org/ipython-doc/dev/parallel/parallel_task.html#greedy-assignment
>         >
>         > """
>         > Tasks are assigned greedily as they are submitted. If their
>         dependencies are
>         > met, they will be assigned to an engine right away, and
>         multiple tasks can be
>         > assigned to an engine at a given time. This limit is set
>         with the
>         > TaskScheduler.hwm (high water mark) configurable:
>         > # the most common choices are:
>         > c.TaskScheduler.hwm = 0 # (minimal latency, default in
>         IPython ≤ 0.12)
>         > # or
>         > c.TaskScheduler.hwm = 1 # (most-informed balancing, default
>         in > 0.12)
>         >
>         > In IPython ≤ 0.12, the default is 0, or no limit. That is,
>         there is no limit to
>         > the number of tasks that can be outstanding on a given
>         engine. This greatly
>         > benefits the latency of execution, because network traffic
>         can be hidden behind
>         > computation. However, this means that workload is assigned
>         without knowledge of
>         > how long each task might take, and can result in poor
>         load-balancing,
>         > particularly for submitting a collection of heterogeneous
>         tasks all at once.
>         > You can limit this effect by setting hwm to a positive
>         integer, 1 being maximum
>         > load-balancing (a task will never be waiting if there is an
>         idle engine), and
>         > any larger number being a compromise between load-balance
>         and latency-hiding.
>         >
>         > In practice, some users have been confused by having this
>         optimization on by
>         > default, and the default value has been changed to 1. This
>         can be slower, but
>         > has more obvious behavior and won’t result in assigning too
>         many tasks to some
>         > engines in heterogeneous cases.
>         > """
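[Editorial note: the quoted scheduling behavior, and the original question about 20 tasks on 10 engines, can be illustrated with a toy simulation. `assign` is a hypothetical helper; this is not IPython's actual TaskScheduler code.]

```python
# Toy model of greedy task assignment under an `hwm` limit.
from collections import deque

def assign(n_tasks, n_engines, hwm):
    """Greedily hand tasks to engines, allowing at most `hwm`
    outstanding tasks per engine; hwm=0 means no limit."""
    queues = [deque() for _ in range(n_engines)]
    waiting = []  # tasks held back at the scheduler
    for task in range(n_tasks):
        # engines still under their high water mark
        eligible = [q for q in queues if hwm == 0 or len(q) < hwm]
        if eligible:
            min(eligible, key=len).append(task)  # least-loaded engine
        else:
            waiting.append(task)
    return queues, waiting

# 20 tasks, 10 engines: with hwm=1, each engine gets exactly one task
# and the other 10 wait at the scheduler until an engine frees up.
queues, waiting = assign(20, 10, hwm=1)
print([len(q) for q in queues], len(waiting))

# With hwm=0 (no limit), all 20 tasks are pushed to engines right away,
# hiding network latency but committing work before runtimes are known.
queues, waiting = assign(20, 10, hwm=0)
print([len(q) for q in queues], len(waiting))
```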
>         >
>         > _______________________________________________
>         > IPython-User mailing list
>         > IPython-User@scipy.org
>         > http://mail.scipy.org/mailman/listinfo/ipython-user
>         
>         
>         
> 



