[Numpy-discussion] Speeding up Numeric

Todd Miller jmiller at stsci.edu
Mon Jan 24 08:04:15 CST 2005

> 3) Use oprofile (http://oprofile.sourceforge.net/), which runs on
>    Linux on an x86 processor. This is the approach that I've used here.
>    oprofile is a combination of a kernel module for Linux, a daemon
>    for collecting sample data, and several tools to analyse the
>    samples. It periodically polls the processor performance counters,
>    and records which code is running. It's a system-level profiler: it
>    profiles _everything_ that's running on the system. One obstacle is
>    that it does require root access.

This looked fantastic, so I tried it over the weekend. On Fedora Core 3,
I couldn't get any information about numarray runtime (in the shared
libraries), only about Python itself. Ditto with Numeric, although from
your post you apparently got great results, including information on the
Numeric .so's.
I'm curious: has anyone else tried this for numarray (or Numeric) on
Fedora Core 3?  Does anyone have a working profile script?

> Numeric is faster 

(with the check_array() feature deletion)

> than numarray from CVS, but there seems to be a regression.

(in numarray performance)

Don't take this the wrong way, but how confident are you that the speed
differences are real? (With my own benchmarking numbers, there is
always too much fuzz to split hairs like this.)
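
One way to judge whether a difference is bigger than the fuzz is to repeat
each timing several times and check whether the ranges overlap. A minimal
sketch using only the standard-library timeit module; spread(),
distinguishable(), and the two workloads are hypothetical stand-ins, not
Numeric or numarray code:

```python
import timeit

def spread(func, repeat=5, number=200):
    """Time func `repeat` times; return (best, worst) seconds per call."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [t / number for t in times]
    return min(per_call), max(per_call)

def distinguishable(range_a, range_b):
    """True only if the two (best, worst) ranges do not overlap."""
    return range_a[1] < range_b[0] or range_b[1] < range_a[0]

# Hypothetical workloads standing in for the two libraries under test.
def workload_a():
    return [x + x for x in range(100)]

def workload_b():
    return [x + x for x in range(100)]

a = spread(workload_a)
b = spread(workload_b)
print("a: best %.3g s, worst %.3g s" % a)
print("b: best %.3g s, worst %.3g s" % b)
print("difference larger than the fuzz:", distinguishable(a, b))
```

Non-overlapping ranges are only a rough check, not a statistical test, but
they do keep run-to-run noise from being reported as a speedup.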

> Without check_array, Numeric is almost as fast as
> numarray 1.1.1.
>
> Remarks
> -------
> - I'd rather have my speed than checks for NaN's. Have that in a
>   separate function (I'm willing to write one), or do numarray-style
>   processor flag checks (tougher).
> - General plea: *please*, *please*, when releasing a library for which
>   speed is a selling point, profile it first!
> - doing the same profiling on numarray finds 15% of the time actually
>   adding, 65% somewhere in python, and 15% in libc.

Part of that time in Python is there because the numarray number protocol
is still implemented in Python.

> - I'm still fiddling. Using the three-argument form of Numeric.add (so

add(a,b) and add(a,b,c) are what I've focused on for profiling numarray
until the number protocol is moved to C.  I've held off doing that
because the numarray number protocol is complicated by subclassing
issues I'm not sure are fully resolved.
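
To illustrate the distinction, here is a toy sketch in pure Python. It is
not numarray's implementation: the Array class and this add() are
hypothetical, and plain lists stand in for array buffers. It only shows why
"a + b" costs an extra Python-level dispatch through __add__, while the
three-argument form goes straight to the function and reuses the caller's
output buffer:

```python
def add(a, b, out=None):
    """Elementwise add; fills `out` in place when it is supplied."""
    if out is None:
        out = [0] * len(a)          # two-argument form allocates a result
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out

class Array:
    """Hypothetical toy array whose number protocol lives in Python."""

    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        # Python-level dispatch happens here before any real work starts.
        return Array(add(self.data, other.data))

a = Array([1, 2, 3])
b = Array([10, 20, 30])

c = a + b                      # goes through Array.__add__, then add()
out = [0, 0, 0]
add(a.data, b.data, out)       # direct call, result written into `out`
print(c.data, out)
```

Subclasses that override __add__ or need the result wrapped in their own
type are what make the operator path harder to move than the plain
function call.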

