[Numpy-discussion] Speeding up Numeric

Tim Hochberg tim.hochberg at cox.net
Sat Jan 22 11:17:10 CST 2005


Fernando Perez wrote:

> konrad.hinsen at laposte.net wrote:
>
>> On 21.01.2005, at 22:40, Paul F. Dubois wrote:
>>
>>
>>> As I mentioned in a recent post, the original Numeric philosophy was 
>>> damn the torpedoes, full steam ahead; performance first, safety 
>>> second. There was a deliberate decision not to handle NaN, Inf, or 
>>> anything like it, and if you overflowed you should die.
>>
>>
>>
>> At one point there was also the idea of having a "safe" version of 
>> the code (with the added checks as a compile-time option) and an 
>> installer that compiled both under different module names, so that 
>> one could choose at run time which version to use. I really liked 
>> that idea, but it never got implemented (some versions did ship a 
>> "safe" ufunc module, but it was no different from the standard one).
>
>
> I really like this approach.  The Blitz++ library offers something 
> similar: if you build your code with -DBZ_DEBUG, it activates a ton of 
> safety checks which are normally off.  The performance plummets, but 
> it can save you days of debugging, since most pointer/memory errors 
> are flagged instantly where they occur, instead of causing the usual 
> inscrutable segfaults.
>
> F2PY also has the debug_capi flag which provides similar services, and 
> I've found it to be tremendously useful on a few occasions.
>
> It would be great to be able to simply use:
>
> #import Numeric
> import Numeric_safe as Numeric
>
> to have a safe, debug-enabled version active.  The availability of 
> such a version would also free the developers from having to cater 
> too much to safety considerations in the default version.  The 
> default could be advertised as 'fast car, no brakes, learn to jump 
> out before going off a cliff', with the 'family minivan' of 
> Numeric_safe there for those who need it.
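
For concreteness, the import swap Fernando shows could even be made 
automatic with a guarded import. A minimal sketch, assuming a 
hypothetical Numeric_safe build installed alongside the standard one:

    # Prefer the checked build when it is available; fall back to the
    # standard one otherwise. (Numeric_safe is a hypothetical name.)
    try:
        import Numeric_safe as Numeric
    except ImportError:
        import Numeric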


Before embarking on such a project, I'd urge that some careful profiling 
be done. My gut feeling is that, for most functions, no significant 
speedup would result from omitting the range checks that prevent 
segfaults. In the cases where removing such checks would help in C 
(item access, very small arrays, and so on), the execution time is 
dwarfed by Python's own overhead anyway. Without care, one runs the 
risk of ending up with a minivan with no brakes; something no one needs.
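
A quick way to get a feel for the numbers is to time one of the cases 
where a check could in principle be dropped. A rough sketch using the 
timeit module (assumes Numeric is installed; exact figures will vary 
by machine and build):

    import timeit

    # Compare single-item access (where a bounds check could be
    # skipped) against a whole-array operation; the fixed Python
    # dispatch cost dominates the former either way.
    setup = "import Numeric; a = Numeric.arange(100)"
    for stmt in ("a[5]", "a + a"):
        t = timeit.Timer(stmt, setup).timeit(number=100000)
        print "%-8s %8.3f usec per call" % (stmt, t * 10.0)

If the per-call time for a[5] is already in the microsecond range, 
shaving a single comparison out of the C loop isn't going to show up.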

'take' is a likely exception, since it involves a range check at every 
element. But if only a few functions get changed, two versions of the 
whole library would be a bad idea; two versions of the functions in 
question would be better. In my experience, speed simply isn't critical 
for most of my numeric code; for the 5% or so where it is, I could use 
the unsafe functions and be more careful. That would be easier if the 
few differing functions were part of the main library.
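
To make the "two functions, one library" idea concrete, here is the 
shape of the pairing I have in mind, in illustrative pure Python 
(neither function is real Numeric API):

    def take_checked(a, indices):
        # Default version: validate every index before touching data.
        n = len(a)
        for i in indices:
            if not 0 <= i < n:
                raise IndexError("index %d out of range" % i)
        return [a[i] for i in indices]

    def take_unsafe(a, indices):
        # Caller promises in-range indices; no per-element check.
        # In C, this is where the actual speedup would live.
        return [a[i] for i in indices]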

I don't have a good feel for the NaN/Inf checking. If it's possible to 
hoist the checking outside all of the loops, then the above arguments 
probably apply here as well. If not, this might be a candidate for an 
'unsafe' library. That seems more reasonable to me, as I'm much more 
tolerant of NaNs than of segfaults.
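
What I mean by hoisting, sketched in pure Python (the names are 
illustrative, not Numeric's): run the hot loop with no per-element 
tests, then make one pass afterwards to see whether anything went wrong.

    def multiply_hoisted(a, b):
        # Hot loop: no checks inside.
        result = [x * y for x, y in zip(a, b)]
        # One pass afterwards catches any NaN that appeared (NaN is
        # the only value that compares unequal to itself).
        for r in result:
            if r != r:
                raise ValueError("NaN produced in multiply")
        return result

If the checking can be folded down to something like that single 
trailing pass, the per-element cost disappears and a checked default 
stays cheap.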

That's my two cents anyway.

-tim




