[Numpy-discussion] Speeding up Numeric

Todd Miller jmiller at stsci.edu
Mon Jan 24 13:13:13 CST 2005

On Mon, 2005-01-24 at 20:47 +0200, Rory Yorke wrote:
> Todd Miller <jmiller at stsci.edu> writes:
> > This looked fantastic so I tried it over the weekend.  On Fedora Core 3,
> > I couldn't get any information about numarray runtime (in the shared
> > libraries),  only Python.  Ditto with Numeric,  although from your post
> > you apparently got great results including information on Numeric .so's.
> > I'm curious: has anyone else tried this for numarray (or Numeric) on
> > Fedora Core 3?  Does anyone have a working profile script?
> I think you need to have --separate=lib when invoking opcontrol. (See
> later for an example.)

Thanks!  That, plus a more liberal "opreport -t" threshold, got it working.
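For the record, a sketch of the full session, assuming oprofile's 0.8-era opcontrol interface (the workload script name is illustrative, and the commands need root):

```shell
opcontrol --init                       # load the oprofile kernel module
opcontrol --no-vmlinux --separate=lib  # attribute shared-lib samples per app
opcontrol --start
python bench.py                        # the workload to profile (hypothetical)
opcontrol --dump                       # flush samples to disk
opcontrol --shutdown
opreport -t 2 -l $(which python)       # per-symbol report, 2% threshold
```

Without --separate=lib, samples that land in the .so's get lumped together instead of being attributed to the extension modules, which is why only the Python symbols showed up before.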

> - I think opstack is part of oprofile 0.8 (or maybe 0.8.1) -- it
>   wasn't in the 0.7.1 package available for Ubuntu. Also, to actually
>   get callgraphs (from opstack), you need a patched kernel; see here:
>      http://oprofile.sf.net/patches/

Ugh.  Well, that won't happen today for me either.

> - I think you probably *shouldn't* compile with -pg if you use
>   oprofile, but you should use -g.
> To profile shared libraries, I also tried the following:
> - sprof. Some sort of dark art glibc tool. I couldn't get this to work
>   with dlopen()'ed libraries (in which class I believe Python C
>   extensions fall).
> - qprof (http://www.hpl.hp.com/research/linux/qprof/). Almost worked,
>   but I couldn't get it to identify symbols in shared libraries. Their
>   page has a list of other profilers.

I tried gprof too but couldn't get much out of it.  As David noted,
gprof is also a pain to use with distutils.

> I also tried the Python 2.4 profile module; it does support
> C-extension functions as advertised, but it seemed to miss object
> instantiation calls (_numarray._numarray's instantiation, in this
> case).
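For what it's worth, the later cProfile module (the C-based successor to profile, added in 2.5) does record calls into C functions such as built-ins; a minimal sketch using the modern API:

```python
# Minimal sketch: cProfile records calls into C-level functions
# (built-ins and extension functions), not just pure-Python frames.
import cProfile
import io
import pstats


def work():
    # sorted() is a C built-in, so it exercises the C-call tracking.
    return sorted(range(1000), reverse=True)


pr = cProfile.Profile()
pr.enable()
work()
pr.disable()

out = io.StringIO()
pstats.Stats(pr, stream=out).print_stats()
report = out.getvalue()
assert "sorted" in report  # the built-in appears in the per-call report
```

Whether it catches every instantiation path (such as a C type's tp_new being invoked directly) is a separate question, which matches the miss reported above.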

I think the thing to focus on is building an object cache for "almost-
new" small NumArrays;  that could potentially short-circuit the memory
allocation/deallocation costs, the NumArray object hierarchical
allocation/deallocation costs, etc.
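The cache idea above can be sketched as a simple free list keyed by shape and dtype.  This is a hypothetical illustration using modern NumPy as a stand-in for numarray; the names (acquire/release) and the bucket cap are made up, not numarray API:

```python
# Hypothetical free-list cache for small arrays: recycle recently
# released buffers instead of paying full allocation/deallocation
# costs on every operation.  NumPy stands in for 2005-era numarray.
import numpy as np

_CACHE = {}          # (shape, dtype) -> list of spare arrays
_MAX_PER_BUCKET = 8  # arbitrary cap so the cache stays bounded


def acquire(shape, dtype=np.float64):
    """Return a cached array if one is available, else allocate anew."""
    bucket = _CACHE.get((shape, np.dtype(dtype)))
    if bucket:
        return bucket.pop()
    return np.empty(shape, dtype)


def release(arr):
    """Hand an array back to the cache for later reuse."""
    bucket = _CACHE.setdefault((arr.shape, arr.dtype), [])
    if len(bucket) < _MAX_PER_BUCKET:
        bucket.append(arr)


a = acquire((16,))
release(a)
b = acquire((16,))
assert b is a  # the second acquire reuses the released buffer
```

A real implementation inside the C layer would also have to reset the object's bookkeeping fields (the "almost-new" part), which is where the hierarchical allocation savings would come from.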

> rory at foo:~/hack/numarray/profile $ opreport -t 2 -l $(which python2.4)
> CPU: Athlon, speed 1836.45 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
> samples  %        image name               symbol name
> 47122    11.2430  _ufuncFloat64.so         add_ddxd_vvxv
> 26731     6.3778  python2.4                PyEval_EvalFrame
> 24122     5.7553  libc-2.3.2.so            memset
> 21228     5.0648  python2.4                lookdict_string
> 10583     2.5250  python2.4                PyObject_GenericGetAttr
> 9131      2.1786  libc-2.3.2.so            mcount
> 9026      2.1535  python2.4                PyDict_GetItem
> 8968      2.1397  python2.4                PyType_IsSubtype
> (The idea wasn't really to discuss the results, but anyway: The
> prominence of memset is a little odd -- are destination arrays zeroed
> before being assigned the sum result?)

Yes,  the API routines which allocate the output array zero it.  I've
tried to remove this in the past, but at least one of the add-on packages
(linear_algebra or fft, I think) wasn't stable without the zeroing.
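The zeroing is redundant whenever the operation fully overwrites the output, which is presumably why memset showed up so prominently.  In modern NumPy terms (an illustrative stand-in, not Numeric's C API), an uninitialized output buffer is safe as long as every element gets assigned:

```python
import numpy as np

a = np.arange(5, dtype=np.float64)
b = np.ones(5, dtype=np.float64)

# np.empty allocates without zeroing (no memset cost); the add ufunc
# then writes every element, so the initial garbage never matters.
out = np.empty(5, dtype=np.float64)
np.add(a, b, out=out)
assert (out == a + b).all()
```

The instability mentioned above suggests some add-on routine was reading elements it never wrote, which the zeroing papered over.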

> Have fun,

Better already.  Thanks again!
