[Numpy-discussion] Speeding up Numeric
tim.hochberg at cox.net
Sat Jan 22 11:17:10 CST 2005
Fernando Perez wrote:
> konrad.hinsen at laposte.net wrote:
>> On 21.01.2005, at 22:40, Paul F. Dubois wrote:
>>> As I mentioned in a recent post, the original Numeric philosophy was
>>> damn the torpedoes, full steam ahead; performance first, safety
>>> second. There was a deliberate decision not to handle NaN, inf, or
>>> anything like it, and if you overflowed you should die.
>> There was also at some time the idea of having a "safe" version of
>> the code (added checks as a compile-time option) and an installer
>> that compiled both with different module names such that one could
>> ultimately choose at run time which one to use. I really liked that
>> idea, but it never got implemented (there was a "safe" version of
>> ufunc in some versions but it was no different from the standard one).
> I really like this approach. The Blitz++ library offers something
> similar: if you build your code with -DBZ_DEBUG, it activates a ton of
> safety checks which are normally off. The performance plummets, but
> it can save you days of debugging, since most pointer/memory errors
> are flagged instantly where they occur, instead of causing the usual
> inscrutable segfaults.
> F2PY also has the debug_capi flag which provides similar services, and
> I've found it to be tremendously useful on a few occasions.
> It would be great to be able to simply use:
> #import Numeric
> import Numeric_safe as Numeric
> to have a safe, debug-enabled version active. The availability of
> such a version would also free the developers from having to cater too
> much to safety considerations in the default version. The default
> could be advertised as 'fast car, no brakes, learn to jump out before
> going off a cliff', with the _debug 'family minivan' being there if
> safety were needed.
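Python itself has a rough analogue of such a compile-time safety switch: a block guarded by `if __debug__:` is compiled in normally and dropped entirely when the interpreter runs with `-O`. The sketch below uses a hypothetical `elementwise_add` to show one source yielding a "safe" and a "fast" variant. It only illustrates the switch, not how Numeric or Blitz++ implement their checks, and unlike the `Numeric_safe` import proposed above, `-O` applies to the whole process rather than to a single module.

```python
def elementwise_add(a, b):
    """Return [x + y for each pair] for two equal-length sequences.

    Hypothetical example: the `if __debug__:` block is compiled in by
    default and removed entirely when Python runs with -O, so one source
    file yields both a "safe" and a "fast" variant.
    """
    if __debug__:
        # Safety check: mismatched lengths are treated as a caller error.
        if len(a) != len(b):
            raise ValueError(f"length mismatch: {len(a)} != {len(b)}")
    # Without the check, zip() silently stops at the shorter input.
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    print(elementwise_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
    try:
        # Raises normally; under `python -O` it quietly returns [11].
        print(elementwise_add([1, 2, 3], [10]))
    except ValueError as exc:
        print("caught:", exc)
```

The "fast" build here is not just faster but silently wrong on bad input, which is exactly the "no brakes" trade-off described in the quoted text.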
Before embarking on such a project, I'd urge that some careful profiling
be done. My gut feeling is that, for most functions, no significant
speedup would result from omitting the range checks that prevent
segfaults. In the cases where removing such checks would help in C
(item access, very small arrays, etc.), the savings will be dwarfed by
Python's call overhead. Without care, one runs the risk of ending up
with a minivan with no brakes: something no one needs.
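One way to act on this advice is to separate a function's fixed per-call cost from its per-element cost before deciding whether dropping checks could matter. The sketch below uses only the standard library's `timeit`; `split_cost` is a hypothetical helper, and the built-in `sum` merely stands in for a loop that runs in C. Removing a per-element check can save at most about n times that check's cost, so for item access and very small arrays the per-call overhead term dominates.

```python
import timeit


def split_cost(func, small=1, large=100_000, repeat=5):
    """Estimate (per_call_overhead, per_element_cost) in seconds for func(seq).

    Times func on a tiny and a large input and fits a straight line
    t(n) = overhead + n * per_element through the two measurements.
    """
    def best_time(n):
        arg = [1.0] * n
        number = max(1, 200_000 // n)  # keep each measurement reasonably short
        t = min(timeit.repeat(lambda: func(arg), number=number, repeat=repeat))
        return t / number

    t_small, t_large = best_time(small), best_time(large)
    # Clamp at zero: timing noise can make tiny differences come out negative.
    per_element = max(0.0, (t_large - t_small) / (large - small))
    overhead = max(0.0, t_small - small * per_element)
    return overhead, per_element


if __name__ == "__main__":
    overhead, per_elem = split_cost(sum)  # sum() iterates in C
    print(f"per-call overhead ~{overhead * 1e9:.0f} ns, "
          f"per element ~{per_elem * 1e9:.2f} ns")
```

The absolute numbers depend on the machine and Python version; what matters is the ratio between the two terms for the array sizes a program actually uses.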
'take' is a likely exception, since it range-checks every element. But
if only a few functions get changed, having two versions of the library
is a bad idea; two versions of the functions in question would be
better. This is particularly true since, in my experience, speed is
simply not critical for most of my numeric code; for the 5% or so where
it is critical, I could use the unsafe functions and be more careful.
This would be easier if the few differing functions were part of the
main library.
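The per-function alternative might look like the sketch below: a checked `take` as the default and an unchecked twin for the few hot spots, both living in the same module. The names are hypothetical and the functions are simplified to one-dimensional lists. Because pure Python indexing is always bounds-checked for indices past the end, the unchecked version's failure mode here is silent wrap-around of negative indices rather than a segfault.

```python
def take(a, indices):
    """Checked gather (hypothetical name): validate every index up front."""
    n = len(a)
    for i in indices:
        if not 0 <= i < n:
            raise IndexError(f"take: index {i} out of range [0, {n})")
    return [a[i] for i in indices]


def take_unchecked(a, indices):
    """Unchecked gather (hypothetical name) for the few speed-critical spots.

    The caller promises 0 <= i < len(a). Python still refuses indices
    >= len(a), but a negative index silently wraps to the end of the list;
    in C, an unchecked bad index could instead read arbitrary memory.
    """
    return [a[i] for i in indices]


if __name__ == "__main__":
    data = [10, 20, 30]
    print(take(data, [2, 0]))          # [30, 10]
    print(take_unchecked(data, [-1]))  # [30] (silently wrong)
    try:
        take(data, [-1])
    except IndexError as exc:
        print("caught:", exc)
```

Keeping both in one library means the safe name is what everyone reaches for by default, and the unsafe one is visibly marked at each call site where someone chose speed.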
I don't have a good feel for the NaN/Inf checking. If it's possible to
hoist the checking outside all of the loops, then the above arguments
probably apply here as well. If not, this might be a candidate for an
'unsafe' library. That seems more reasonable to me, as I'm much more
tolerant of NaNs than of segfaults.
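The difference between checking inside the loop and hoisting the check out of it can be sketched as follows. Both function names are hypothetical, and the example relies on Python float multiplication overflowing to `inf` instead of raising. The hoisted version still scans the result once, but it keeps the test out of the arithmetic loop; in C, a comparable effect is often obtained by clearing and then testing the floating-point status flags once around the whole loop (`feclearexcept`/`fetestexcept` from `<fenv.h>`).

```python
import math


def multiply_checked_inline(a, b):
    """Elementwise product with a NaN/Inf test inside the loop."""
    out = []
    for x, y in zip(a, b):
        z = x * y
        if not math.isfinite(z):  # tested on every iteration
            raise FloatingPointError(f"non-finite product {z!r}")
        out.append(z)
    return out


def multiply_checked_hoisted(a, b):
    """Elementwise product; the NaN/Inf test is hoisted out of the loop."""
    out = [x * y for x, y in zip(a, b)]  # arithmetic loop with no tests
    # A single scan after the loop; an overflow shows up here as inf.
    if not all(math.isfinite(z) for z in out):
        raise FloatingPointError("non-finite value in result")
    return out


if __name__ == "__main__":
    print(multiply_checked_hoisted([1.0, 2.0], [3.0, 4.0]))  # [3.0, 8.0]
    try:
        multiply_checked_hoisted([1e308], [10.0])  # overflows to inf
    except FloatingPointError as exc:
        print("caught:", exc)
```

The inline version stops at the first bad value, while the hoisted one finishes all the arithmetic before reporting; that trade is what lets the inner loop stay free of tests. In pure Python the extra scan is not necessarily faster, so the benefit is mainly for compiled loops.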
That's my two cents anyway.