[Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement

Raul Cota raul@virtualmaterials....
Mon Dec 3 09:35:58 CST 2012


On 02/12/2012 8:31 PM, Travis Oliphant wrote:
> Raul,
>
> This is *fantastic work*.     While many optimizations were done 6 years ago as people started to convert their code, that kind of report has trailed off in the last few years.   I have not seen this kind of speed-comparison for some time --- but I think it's definitely beneficial.

I'll clean it up a bit, turn it into a macro, and add comments.


> NumPy still has quite a bit that can be optimized.   I think your example is really great.    Perhaps it's worth making a C-API macro out of the short-cut to the attribute string so it can be used by others.    It would be interesting to see where your other slow-downs are.     I would be interested to see if the slow-math of float64 is hurting you.    It would be possible, for example, to do a simple subclass of the ndarray that overloads a[<integer>] to be the same as array.item(<integer>).  The latter syntax returns python objects (i.e. floats) instead of array scalars.
>
> Also, it would not be too difficult to add fast-math paths for int64, float32, and float64 scalars (so they don't go through ufuncs but do scalar-math like the float and int objects in Python).

Thanks. I'll dig a bit more into the code.
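
Just so I am sure I understand the subclass idea, is it roughly something
like this? (untested sketch, the class name is only for illustration)

     import numpy as np

     class ItemArray(np.ndarray):
         # Return native Python scalars for single-integer indexing by
         # routing it through item(); everything else falls back to the
         # normal ndarray behaviour.
         def __getitem__(self, index):
             if isinstance(index, int):
                 return self.item(index)
             return np.ndarray.__getitem__(self, index)

     a = np.array([1., 2., 3.]).view(ItemArray)
     # type(a[0]) is now the plain Python float, not numpy.float64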


>
> A related thing we've been working on lately which might help you is Numba which might help speed up functions that have code like:  "a[0] < 4" :  http://numba.pydata.org.
>
> Numba will translate the expression a[0] < 4 to a machine-code address-lookup and math operation which is *much* faster when a is a NumPy array.    Presently this requires you to wrap your function call in a decorator:
>
> from numba import autojit
>
> @autojit
> def function_to_speed_up(...):
> 	pass
>
> In the near future (2-4 weeks), numba will grow the experimental ability to basically replace all your function calls with @autojit versions in a Python function.    I would love to see something like this work:
>
> python -m numba filename.py
>
> To get an effective autojit on all the filename.py functions (and optionally on all python modules it imports).    The autojit works out of the box today --- you can get Numba from PyPI (or inside of the completely free Anaconda CE) to try it out.

This looks very interesting. Will check it out.
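
If I read the example right, for one of my hot spots it would look
something like this (just my reading of the snippet above, untested;
'vals' and 'count_below' are made-up names):

     from numba import autojit

     @autojit
     def count_below(vals, limit):
         # plain indexing and comparison in a tight loop, which is the
         # kind of code numba is meant to compile to machine code
         n = 0
         for i in range(vals.shape[0]):
             if vals[i] < limit:
                 n += 1
         return n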


> Best,
>
> -Travis
>
>
>
>
> On Dec 2, 2012, at 7:28 PM, Raul Cota wrote:
>
>> Hello,
>>
>> First, a quick summary of my problem; at the end I include the basic
>> changes I am suggesting to the source (they may benefit others).
>>
>> I am ages behind the times and I am still using Numeric in Python 2.2.3.
>> The main reason why it has taken so long to upgrade is because NumPy
>> kills performance on several of my tests.
>>
>> I am sorry if this topic has been discussed before. I tried parsing the
>> mailing list and also google and all I found were comments related to
>> the fact that such is life when you use NumPy for small arrays.
>>
>> In my case I have several thousand lines of code whose data
>> structures rely heavily on Numeric arrays, but it is unpredictable
>> whether the problem at hand will produce large or small arrays.
>> Furthermore, once the vectorized operations complete, the values may
>> get assigned into scalars that are then used in simple math or loops.
>> I am fairly sure the core of my problem is that 'float64' objects
>> start propagating all over the program's data structures (outside of
>> arrays), and they are considerably slower than the native Python float
>> for just about everything.
>>
>> The conclusion is that it is not practical for me to do a massive
>> restructuring of code just to speed up simple things like "a[0] < 4"
>> (where "a" is an array), which is about 10 times slower than "b < 4"
>> (where "b" is a float).
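>>
>> A quick way to reproduce the kind of comparison I mean (rough sketch
>> only; exact numbers of course depend on the machine):
>>
>>      import timeit
>>      setup = "from numpy import array; a = array([1., 2., 3.]); b = 1.0"
>>      print timeit.timeit("a[0] < 4", setup=setup, number=1000000)
>>      print timeit.timeit("b < 4", setup=setup, number=1000000)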
>>
>>
>> I finally decided to track down the problem and I started by getting
>> Python 2.6 from source and profiling it in one of my cases. By far the
>> biggest bottleneck came out to be PyString_FromFormatV which is a
>> function to assemble a string for a Python error caused by a failure to
>> find an attribute when "multiarray" calls PyObject_GetAttrString. This
>> function seems to get called way too often from NumPy. The real
>> bottleneck of trying to find the attribute when it does not exist is not
>> that it fails to find it, but that it builds a string to set a Python
>> error. In other words, something as simple as "a[0] < 3.5" internally
>> results in a call to set a Python error.
>>
>> I downloaded NumPy code (for Python 2.6) and tracked down all the calls
>> like this,
>>
>>   ret = PyObject_GetAttrString(obj, "__array_priority__");
>>
>> and changed them to
>>      if (PyList_CheckExact(obj) || (Py_None == obj) ||
>>          PyTuple_CheckExact(obj) ||
>>          PyFloat_CheckExact(obj) ||
>>          PyInt_CheckExact(obj) ||
>>          PyString_CheckExact(obj) ||
>>          PyUnicode_CheckExact(obj)) {
>>          /* Avoid the expensive call when I am sure the attribute
>>             does not exist */
>>          ret = NULL;
>>      }
>>      else {
>>          ret = PyObject_GetAttrString(obj, "__array_priority__");
>>      }
>>
>>
>>
>> (I think I found about 7 spots.)
>>
>>
>> I also noticed (not as bad in my case) that calls to PyObject_GetBuffer
>> also resulted in Python errors being set, which unnecessarily slowed
>> down the code.
>>
>>
>> With this change, something like this,
>>      for i in xrange(1000000):
>>          if a[1] < 35.0:
>>              pass
>>
>> went down from 0.8 seconds to 0.38 seconds.
>>
>> A bogus test like this,
>>      for i in xrange(1000000):
>>          a = array([1., 2., 3.])
>>
>> went down from 8.5 seconds to 2.5 seconds.
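>>
>> (A minimal way to time loops like the two above, in case anyone wants
>> to reproduce the comparison:)
>>
>>      from time import clock
>>      from numpy import array
>>      start = clock()
>>      for i in xrange(1000000):
>>          a = array([1., 2., 3.])
>>      print clock() - start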
>>
>>
>>
>> Altogether, these simple changes got me halfway to the speed I used to
>> get with Numeric, and I could not see any slowdown in any of my cases
>> that benefit from heavy array manipulation. I am out of ideas on how
>> to improve further, though.
>>
>> A few questions:
>> - Is there any interest in me providing the exact details of the code
>> I changed?
>>
>> - I managed to compile NumPy through setup.py, but I am not sure how to
>> force it to generate .pdb files with my Visual Studio compiler. I need
>> the .pdb files so that I can run my profiler on NumPy. Does anybody
>> have any experience with this? (Visual Studio)
>>
>> - The core of my problems, I think, boils down to things like this:
>> s = a[0]
>> which assigns a float64 into s as opposed to a native float.
>> Is there any way to hack the code so that it extracts a native float
>> instead? (Probably crazy talk, but I thought I'd ask :) ).
>> I'd prefer not to use s = a.item(0) because I would have to change too
>> much code and it is not even that much faster. For example,
>>      for i in xrange(1000000):
>>          if a.item(1) < 35.0:
>>              pass
>> takes 0.23 seconds (as opposed to 0.38 seconds with my suggested changes).
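>>
>> (To be explicit about what I mean by float64 vs. native float, a tiny
>> illustration:)
>>
>>      from numpy import array
>>      a = array([1., 2., 3.])
>>      print type(a[0])        # numpy.float64 scalar
>>      print type(a.item(0))   # plain Python float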
>>
>>
>> I apologize again if this topic has already been discussed.
>>
>>
>> Regards,
>>
>> Raul
>>
>>


