[SciPy-user] Multithreading cookbook entry

Bruce Southey bsouthey@gmail....
Thu Feb 21 12:33:18 CST 2008


Hi,

Removing the exp(y) call gives about the same time, which you can take
either way. But really you need to understand how Python handles threads.

From http://docs.python.org/api/threads.html:
"Therefore, the rule exists that only the thread that has acquired the
global interpreter lock may operate on Python objects or call Python/C
API functions. In order to support multi-threaded Python programs, the
interpreter regularly releases and reacquires the lock -- by default,
every 100 bytecode instructions (this can be changed with
sys.setcheckinterval()). "
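
To see what this setting is on a given interpreter, something like the
following rough sketch could be used. It assumes nothing beyond the
standard sys module. Note that Python 3.2 and later replaced the
bytecode-count check interval with a time-based switch interval
(sys.getswitchinterval()), so the helper below (gil_switch_setting, a
name made up for this example) tries that first and falls back to the
older call.

```python
import sys

def gil_switch_setting():
    """Return (description, value) for the interpreter's thread-switch setting."""
    if hasattr(sys, "getswitchinterval"):
        # Python 3.2+: the lock is handed over after a time interval (seconds).
        return ("switch interval in seconds", sys.getswitchinterval())
    # Older interpreters: the lock is released every N bytecode instructions.
    return ("check interval in bytecodes", sys.getcheckinterval())

description, value = gil_switch_setting()
print("%s = %s" % (description, value))
```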

If the operation is fast enough, it finishes before the interpreter
gets a chance to release and reacquire the lock, so there is no
advantage in threading in that case. As each call does more work, this
release/reacquire behaviour becomes more important to the overall
performance.

This feature is also part of the reason why you can not get a linear
speedup for this using Python.

It is better to set the number of threads in handythread.py explicitly.
The ratio below is the for-loop time divided by the handythread time:

N threads      Ratio (for loop / handythread.py)
    1              0.995360257543
    2              1.81112657674
    3              2.51939329739
    4              2.95551097958
    5              3.04222213598
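
One way to read these numbers is as a per-thread efficiency, i.e. the
ratio divided by the number of threads. A small sketch of that
arithmetic, using only the figures in the table (the function name
efficiency is just for illustration):

```python
# Ratios copied from the table above, keyed by number of threads.
ratios = {
    1: 0.995360257543,
    2: 1.81112657674,
    3: 2.51939329739,
    4: 2.95551097958,
    5: 3.04222213598,
}

def efficiency(ratios):
    """Return the ratio per thread (1.0 would be a perfectly linear speedup)."""
    return dict((n, r / n) for n, r in ratios.items())

eff = efficiency(ratios)
for n in sorted(eff):
    print("%d threads: ratio %.2f, efficiency %.0f%%" % (n, ratios[n], 100 * eff[n]))
```

The efficiency drops steadily as threads are added, which is another way
of saying the speedup is not linear.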

I do not get 100% of cpu time of each processor even for the for-loop
part. So until that happens, threads are not going to be as good as
they could be. Also, I can not comment on the OS but I do know some
are better than others for threading performance.

Regards
Bruce




On Thu, Feb 21, 2008 at 11:37 AM, Anand Patil
<anand.prabhakar.patil@gmail.com> wrote:
> Bruce,
>
>
>  >  from numpy import ones, exp
>  >  from handythread import foreach
>  >  import time
>  >
>  >  if __name__=='__main__':
>  >     def f(x):
>  >
>  >         y = ones(10000000)
>  >         exp(y)
>  >     t1=time.time()
>  >     foreach(f,range(100))
>  >     t2=time.time()
>  >     for ndx in range(100):
>  >
>  >         y = ones(10000000)
>  >         exp(y)
>  >     t3=time.time()
>  >     print 'Simple loop / handythread =', (t3-t2)/(t2-t1)
>  >
>  >  With this code, the 'for loop' takes about 2.7 times as long as the
>  >  handythread loop for a quad-core system.
>
>  That's very interesting. I set the 'threads' option to 2, since I have
>  a dual-core system, and the handythread example is still only about
>  1.5x faster than the for-loop example, even though I can see that both
>  my cores are being fully utilized. That could be because my machine
>  devotes a good fraction of one of its cores to just being a Mac, but
>  it doesn't look like that's what is making the difference.
>
>
>  The strange thing is that for me the 'for-loop' version above takes
>  67s, whereas a version with f modified as follows:
>
>
>  def f(x):
>      y = ones(10000000)
>      # exp(y)
>
>  takes 13s whether I use handythread or a for-loop. I think that means
>  'ones' can only be executed by one thread at a time. Based on that, if
>  my machine had three free cores I would expect about a 2.16X speedup
>  tops, but you're seeing a 2.7X speedup.
>
>  That means our machines are doing something differently (yours is
>  better). Do you see any speedup from handythread with the modified
>  version of f?
>
>
>
>  Anand
>
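
For reference, the 2.16X estimate in the quoted message can be
reproduced with a few lines of arithmetic. This is only a sketch of that
reasoning: it assumes the 13s spent in ones() is strictly serial and the
remaining time divides evenly across the workers. The function name
estimated_speedup is made up for this example.

```python
def estimated_speedup(total_s, serial_s, workers):
    """Speedup if only the non-serial part of the run is split across workers."""
    parallel_s = total_s - serial_s
    return total_s / (serial_s + parallel_s / float(workers))

# Timings from the quoted message: 67s in total, 13s of it in ones().
print("%.2f" % estimated_speedup(67.0, 13.0, 3))
```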

