[Numpy-discussion] Speed bottlenecks on simple tasks - suggested improvement

Raul Cota raul@virtualmaterials....
Mon Dec 3 09:33:23 CST 2012


Thanks Christoph.

It seemed to work. Will do profile runs today/tomorrow and see what comes
out.


Raul



On 02/12/2012 7:33 PM, Christoph Gohlke wrote:
> On 12/2/2012 5:28 PM, Raul Cota wrote:
>> Hello,
>>
>> First a quick summary of my problem and at the end I include the basic
>> changes I am suggesting to the source (they may benefit others)
>>
>> I am ages behind the times and am still using Numeric with Python 2.2.3.
>> The main reason why it has taken so long to upgrade is because NumPy
>> kills performance on several of my tests.
>>
>> I am sorry if this topic has been discussed before. I searched the
>> mailing list and Google, and all I found were comments to the effect
>> that such is life when you use NumPy for small arrays.
>>
>> In my case I have several thousand lines of code whose data
>> structures rely heavily on Numeric arrays, and it is unpredictable
>> whether a given problem will produce large or small arrays.
>> Furthermore, once the vectorized operations complete, the values are
>> often assigned to scalars for simple math or loops. I am fairly sure
>> the core of my problem is that 'float64' objects start propagating all
>> over the program's data structures (outside of arrays), and they are
>> considerably slower than the native Python float for just about
>> everything.
>>
>> In conclusion, it is not practical for me to do a massive
>> restructuring of the code to speed up simple things like "a[0] < 4"
>> (where "a" is an array), which is about 10 times slower than "b < 4"
>> (where "b" is a float).
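>>
>> To give a concrete picture, here is a minimal timing sketch of the
>> comparison I mean (illustrative only; exact numbers depend on the
>> machine and NumPy version):
>>
>>        import timeit
>>
>>        # "a[0]" returns a float64 scalar object, which goes through
>>        # NumPy's comparison machinery instead of the plain float path.
>>        setup = "from numpy import array; a = array([1., 2., 3.]); b = 1.0"
>>        print timeit.timeit("a[0] < 4", setup=setup)  # float64 path
>>        print timeit.timeit("b < 4", setup=setup)     # native float path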
>>
>>
>> I finally decided to track down the problem. I started by building
>> Python 2.6 from source and profiling it on one of my cases. By far the
>> biggest bottleneck turned out to be PyString_FromFormatV, the function
>> that assembles the message string for the Python error raised when
>> "multiarray" calls PyObject_GetAttrString on an object that lacks the
>> attribute. This function gets called far too often from NumPy. The
>> real cost of looking up an attribute that does not exist is not the
>> failed lookup itself, but building the message string to set the
>> Python error. In other words, something as simple as "a[0] < 3.5"
>> internally results in a call to set a Python error.
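>>
>> The same effect can be seen from pure Python; a small sketch (the
>> attribute name is the one NumPy probes, and getattr() with a default
>> swallows the AttributeError, but the message string is still built):
>>
>>        import timeit
>>
>>        setup = "x = 1.0"
>>        # Missing attribute: an AttributeError (and its message string)
>>        # is created and discarded on every call.
>>        print timeit.timeit("getattr(x, '__array_priority__', None)", setup=setup)
>>        # Existing attribute: no error object is built.
>>        print timeit.timeit("getattr(x, 'real', None)", setup=setup)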
>>
>> I downloaded the NumPy source (for Python 2.6) and tracked down all
>> the calls like this,
>>
>>     ret = PyObject_GetAttrString(obj, "__array_priority__");
>>
>> and changed them to
>>        if (PyList_CheckExact(obj) || (Py_None == obj) ||
>>            PyTuple_CheckExact(obj) ||
>>            PyFloat_CheckExact(obj) ||
>>            PyInt_CheckExact(obj) ||
>>            PyString_CheckExact(obj) ||
>>            PyUnicode_CheckExact(obj)) {
>>            /* Avoid the expensive call when the attribute
>>               cannot possibly exist */
>>            ret = NULL;
>>        }
>>        else {
>>            ret = PyObject_GetAttrString(obj, "__array_priority__");
>>        }
>>
>>
>>
>> (I think I found about seven such spots.)
>>
>>
>> I also noticed (though not as costly in my case) that calls to
>> PyObject_GetBuffer also result in Python errors being set, which
>> likewise slows the code down unnecessarily.
>>
>>
>> With this change, something like this,
>>        for i in xrange(1000000):
>>            if a[1] < 35.0:
>>                pass
>>
>> went down from 0.8 seconds to 0.38 seconds.
>>
>> A bogus test like this,
>>        for i in xrange(1000000):
>>            a = array([1., 2., 3.])
>>
>> went down from 8.5 seconds to 2.5 seconds.
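>>
>> For anyone who wants to reproduce these measurements, here is one way
>> to time the two loops as a self-contained script (absolute numbers
>> will differ by machine; only the before/after ratio matters):
>>
>>        import time
>>        from numpy import array
>>
>>        a = array([1., 2., 3.])
>>
>>        # Test 1: comparing a float64 element against a Python float
>>        t0 = time.clock()
>>        for i in xrange(1000000):
>>            if a[1] < 35.0:
>>                pass
>>        print "comparison loop:", time.clock() - t0, "seconds"
>>
>>        # Test 2: constructing a small array repeatedly
>>        t0 = time.clock()
>>        for i in xrange(1000000):
>>            a = array([1., 2., 3.])
>>        print "construction loop:", time.clock() - t0, "seconds"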
>>
>>
>>
>> Altogether, these simple changes got me halfway to the speed I used
>> to get with Numeric, and I could not see any slowdown in any of my
>> cases that benefit from heavy array manipulation. I am out of ideas on
>> how to improve further, though.
>>
>> A few questions:
>> - Is there any interest in my providing the exact details of the code
>> I changed?
>>
>> - I managed to compile NumPy through setup.py, but I am not sure how
>> to force it to generate .pdb files with my Visual Studio compiler. I
>> need the .pdb files so that I can run my profiler on NumPy. Does
>> anybody have experience with this? (Visual Studio)
>
> Change the compiler and linker flags in
> Python\Lib\distutils\msvc9compiler.py to:
>
> self.compile_options = ['/nologo', '/Ox', '/MD', '/W3', '/DNDEBUG', '/Zi']
> self.ldflags_shared = ['/DLL', '/nologo', '/INCREMENTAL:YES', '/DEBUG']
>
> Then rebuild numpy.
>
> Christoph
>
>
>
>> - The core of my problem, I think, boils down to things like this:
>> s = a[0]
>> which assigns a float64 to s rather than a native float.
>> Is there any way to hack the code so that it extracts a native float
>> instead? (Probably crazy talk, but I thought I'd ask :) .)
>> I'd prefer not to use s = a.item(0) because I would have to change too
>> much code, and it is not even that much faster. For example,
>>        for i in xrange(1000000):
>>            if a.item(1) < 35.0:
>>                pass
>> runs in 0.23 seconds (as opposed to 0.38 seconds with my suggested
>> changes).
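>>
>> For completeness, a sketch of the ways I know to get native floats
>> out of an array without patching NumPy (none of them avoid the
>> per-call overhead entirely, and tolist() trades memory for speed):
>>
>>        from numpy import array
>>
>>        a = array([1., 2., 3.])
>>
>>        s1 = a[0]          # numpy.float64 scalar (the slow case)
>>        s2 = a.item(0)     # native Python float
>>        s3 = float(a[0])   # explicit conversion, also a native float
>>        s4 = a.tolist()    # whole array converted to native floats at once
>>
>>        # tolist() pays the conversion cost once, which can win when
>>        # the same values feed many scalar operations afterwards.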
>>
>>
>> I apologize again if this topic has already been discussed.
>>
>>
>> Regards,
>>
>> Raul
>>
>>


