[SciPy-User] Speeding things up - how to use more than one computer core

Troels Emtekær Linnet tlinnet@gmail....
Sun Apr 7 10:15:42 CDT 2013


Thank you for all your answers. :-)

I think I am fit to understand and try some things now.

Best
Troels


2013/4/7 Daπid <davidmenhur@gmail.com>

> This benchmark is poor because it does not take into account many things
> that will happen in your real case. A quick glance at your code tells me
> (correct me if I am wrong) that you are doing some partial fitting (I think
> this is your parallelization target), and then a global fit of some sort. I
> don't know about these particular functions you are using, but you must be
> aware that several NumPy functions have a lot of optimizations under the
> hood, like automatic parallelization. Also, a very important issue here,
> especially with so many cores, is feeding data to the CPU: probably, a fair
> share of your computing time is spent with the CPU waiting for data to
> come in.
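>
> For instance, on a toy function a single vectorized NumPy call usually
> beats any parallel scheme, because per-item overhead dominates the actual
> work (a minimal sketch, reusing the getsqrt computation from your script):
>
> import numpy as np
>
> a = np.arange(10000, dtype=np.float64)
> res = np.sqrt(a ** 2)  # one vectorized call: no scheduling, no pickling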
>
> The performance of a Python program is quite unpredictable, as there are
> so many things going on. I think the best thing you can do is to profile
> your code, see where the bottlenecks are, and try the different parallel
> methods *on that block* to find which one works best. Consider also how
> difficult it is to program and debug: I had a hard time struggling with
> multiprocessing on a very simple program until I got it working.
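>
> A quick way to do that profiling (a minimal sketch; replace main() with
> your own entry point):
>
> import cProfile
> import pstats
>
> cProfile.run('main()', 'prof.out')  # run the script under the profiler
> stats = pstats.Stats('prof.out')
> stats.sort_stats('cumulative').print_stats(10)  # ten costliest calls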
>
> Regarding the difference between processes and threads: both execute in
> parallel, but a thread is bound by the Python GIL: only one line of Python
> will be executed at a time, although this does not apply to C code in
> NumPy or to system calls (waiting for data to be written to a file). On
> the other hand, sharing data between threads is much cheaper than between
> processes. Multiprocessing will truly execute in parallel, using one core
> for each process, but it creates a bigger overhead. I would say you want
> multiprocessing, but depending on how time is spent in your code, and on
> whether NumPy releases the GIL, you may actually get a better result with
> multithreading. Again, if you want to be sure, test it; but if your first
> try is good enough for you, you may as well leave it as it is.
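>
> If you want to compare the two on your own code, the (undocumented)
> multiprocessing.pool.ThreadPool offers the same map() interface as Pool,
> so you can swap them (a minimal sketch, not your real workload):
>
> from multiprocessing import Pool
> from multiprocessing.pool import ThreadPool  # thread-based, same API
>
> def work(x):
>     return x ** 2  # NumPy code here may release the GIL
>
> if __name__ == '__main__':
>     procs = Pool(4)          # true parallelism, larger start-up overhead
>     threads = ThreadPool(4)  # cheap data sharing, but bound by the GIL
>     print procs.map(work, range(10))
>     print threads.map(work, range(10))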
>
> BTW, if you want to read more about memory and parallelization, take a
> look at Francesc Alted's fantastic talk on the Advanced Scientific Python
> Course: https://python.g-node.org/python-summerschool-2012/starving_cpu ,
> and apply if you can.
>
>
> David.
>
>
>
> On 7 April 2013 14:11, Troels Emtekær Linnet <tlinnet@gmail.com> wrote:
>
>> Thanks for pointing that out.
>> I did not understand the tuple syntax for calling the function.
>>
>> But now I get these results:
>> Why is joblib so slow?
>> And should I go for threading or processes?
>>
>> -------------------------------
>> Method was normal
>> Done :0:00:00.040000
>> [9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0,
>> 9999.0] <type 'numpy.float64'>
>>
>> Method was multi Pool
>> Done :0:00:00.422000
>> [9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0,
>> 9999.0] <type 'numpy.float64'>
>>
>> Method was joblib delayed
>> Done :0:00:02.569000
>> [9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0,
>> 9999.0] <type 'numpy.float64'>
>>
>> Method was handythread
>> Done :0:00:00.582000
>> [9990.0, 9991.0, 9992.0, 9993.0, 9994.0, 9995.0, 9996.0, 9997.0, 9998.0,
>> 9999.0] <type 'numpy.float64'>
>>
>> ------------------------------------------------------------------
>>
>> import numpy as np
>> import multiprocessing
>> from multiprocessing import Pool
>>
>> from datetime import datetime
>> from joblib import Parallel, delayed
>> # http://www.scipy.org/Cookbook/Multithreading?action=AttachFile&do=view&target=test_handythread.py
>> from handythread import foreach
>>
>> def getsqrt(n):
>>     # toy workload: cheap enough that parallelization overhead dominates
>>     return np.sqrt(n**2)
>>
>> def main():
>>     jobs = multiprocessing.cpu_count()-1  # leave one core free
>>     a = range(10000)
>>     for method in ['normal','multi Pool','joblib delayed','handythread']:
>>
>>         startTime = datetime.now()
>>         sprint=True
>>         if method=='normal':
>>             res = []
>>             for i in a:
>>                 b = getsqrt(i)
>>                 res.append(b)
>>         elif method=='multi Pool':
>>             pool = Pool(processes=jobs)
>>             res = pool.map(getsqrt, a)
>>             pool.close()  # release the worker processes
>>             pool.join()
>>         elif method=='joblib delayed':
>>             res = Parallel(n_jobs=jobs)(delayed(getsqrt)(i) for i in a)
>>         elif method=='handythread':
>>             res = foreach(getsqrt,a,threads=jobs,return_=True)
>>
>>         else:
>>             sprint=False
>>         if sprint:
>>             print "Method was %s"%method
>>             print "Done :%s"%(datetime.now()-startTime)
>>             print res[-10:], type(res[-1])
>>     return res
>>
>> if __name__ == "__main__":
>>     res = main()
>>
>> Troels
>>
>> On Sun, Apr 07, 2013 at 12:17:59AM +0200, Troels Emtekær Linnet wrote:
>> > Method was joblib delayed
>> > Done :0:00:00
>>
>> Hum, this is fishy, isn't it?
>>
>> >         elif method=='joblib delayed':
>> >             Parallel(n_jobs=-2)  # '-1' uses all cores, '-2' all but one
>> >             func,res = delayed(getsqrt), a
>>
>> I have a hard time reading your code, but it seems to me that you haven't
>> computed anything here, just instantiated the Parallel object.
>>
>> You need to do:
>>
>>     res = Parallel(n_jobs=-2)(delayed(getsqrt)(i) for i in a)
>>
>> I would expect joblib to be on the same order of magnitude speed-wise as
>> multiprocessing (hell, it's just a wrapper on multiprocessing). It's just
>> going to be more robust code than manually instantiating a Pool (it deals
>> better with errors, and can optionally dispatch computations on demand).
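>>
>> For instance (a sketch; verbose and pre_dispatch are optional keyword
>> arguments of Parallel):
>>
>>     res = Parallel(n_jobs=-2, verbose=5, pre_dispatch='2*n_jobs')(
>>         delayed(getsqrt)(i) for i in a)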
>>
>> Gaël