[Numpy-discussion] Speeding up Numeric
jmiller at stsci.edu
Mon Jan 24 13:13:13 CST 2005
On Mon, 2005-01-24 at 20:47 +0200, Rory Yorke wrote:
> Todd Miller <jmiller at stsci.edu> writes:
> > This looked fantastic so I tried it over the weekend. On Fedora Core 3,
> > I couldn't get any information about numarray runtime (in the shared
> > libraries), only Python. Ditto with Numeric, although from your post
> > you apparently got great results including information on Numeric .so's.
> > I'm curious: has anyone else tried this for numarray (or Numeric) on
> > Fedora Core 3? Does anyone have a working profile script?
> I think you need to have --separate=lib when invoking opcontrol. (See
> later for an example.)
Thanks! That and using a more liberal "opreport -t" setting got it.
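For anyone reproducing this, the full session might look roughly like the following (a sketch, not from the original thread; the benchmark script name `bench.py` is invented, and the threshold value is illustrative):

```shell
# Sketch of an oprofile session with per-library sample separation.
opcontrol --init
opcontrol --separate=lib --no-vmlinux   # attribute samples to shared libraries
opcontrol --start
python2.4 bench.py                      # run the numarray workload (hypothetical script)
opcontrol --stop
opcontrol --dump
# Lower the -t threshold so symbols in the .so files still show up
opreport -t 0.5 -l $(which python2.4)
opcontrol --shutdown
```
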
> - I think opstack is part of oprofile 0.8 (or maybe 0.8.1) -- it
> wasn't in the 0.7.1 package available for Ubuntu. Also, to actually
> get callgraphs (from opstack), you need a patched kernel; see here:
Ugh. Well, that won't happen today for me either.
> - I think you probably *shouldn't* compile with -pg if you use
> oprofile, but you should use -g.
> To profile shared libraries, I also tried the following:
> - sprof. Some sort of dark art glibc tool. I couldn't get this to work
> with dlopen()'ed libraries (in which class I believe Python C
> extensions fall).
> - qprof (http://www.hpl.hp.com/research/linux/qprof/). Almost worked,
> but I couldn't get it to identify symbols in shared libraries. Their
> page has a list of other profilers.
I tried gprof too but couldn't get much out of it. As David noted,
gprof is also a pain to use with distutils.
> I also tried the Python 2.4 profile module; it does support
> C-extension functions as advertised, but it seemed to miss object
> instantiation calls (_numarray._numarray's instantiation, in this
> case).
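As an aside, the claim that the Python profiler sees C-level functions can be checked with a short script (using the modern cProfile module rather than 2.4's profile; this sketch is not part of the original thread):

```python
import cProfile
import io
import pstats

# Profile a call into a C-implemented builtin to confirm that
# C-level functions appear in the profiler's report.
profiler = cProfile.Profile()
profiler.enable()
result = sorted([3, 1, 2])      # sorted() is implemented in C
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).print_stats()
report = stream.getvalue()      # contains a line for the builtin sorted()
```
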
I think the thing to focus on is building an object cache for "almost-
new" small NumArrays; that could short-circuit both the memory
allocation/deallocation costs and the cost of constructing and tearing
down the NumArray object hierarchy.
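The cache idea might be sketched as a free list keyed by buffer size (a pure-Python illustration with invented names; the real win would of course have to come at the C level):

```python
# Hypothetical sketch of an "almost-new" object cache: released
# buffers are kept on a per-size free list and handed back on the
# next request, skipping allocation entirely.
class BufferCache:
    def __init__(self, max_items=32, max_bytes=4096):
        self._pool = {}             # size -> list of spare buffers
        self._max_items = max_items
        self._max_bytes = max_bytes

    def acquire(self, nbytes):
        spares = self._pool.get(nbytes)
        if spares:
            return spares.pop()     # fast path: reuse a cached buffer
        return bytearray(nbytes)    # slow path: fresh allocation

    def release(self, buf):
        if len(buf) > self._max_bytes:
            return                  # large buffers are not worth caching
        spares = self._pool.setdefault(len(buf), [])
        if len(spares) < self._max_items:
            spares.append(buf)

cache = BufferCache()
buf = cache.acquire(16)
cache.release(buf)
reused = cache.acquire(16)          # same object comes back from the pool
```
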
> rory at foo:~/hack/numarray/profile $ opreport -t 2 -l $(which python2.4)
> CPU: Athlon, speed 1836.45 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
> samples % image name symbol name
> 47122 11.2430 _ufuncFloat64.so add_ddxd_vvxv
> 26731 6.3778 python2.4 PyEval_EvalFrame
> 24122 5.7553 libc-2.3.2.so memset
> 21228 5.0648 python2.4 lookdict_string
> 10583 2.5250 python2.4 PyObject_GenericGetAttr
> 9131 2.1786 libc-2.3.2.so mcount
> 9026 2.1535 python2.4 PyDict_GetItem
> 8968 2.1397 python2.4 PyType_IsSubtype
> (The idea wasn't really to discuss the results, but anyway: The
> prominence of memset is a little odd -- are destination arrays zeroed
> before being assigned the sum result?)
Yes, the API routines that allocate the output array zero it. I've
tried to remove this in the past, but at least one of the add-on packages
(linear_algebra or fft, I think) wasn't stable without the zeroing.
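The reason the zeroing is redundant in principle for element-wise operations: every slot of the output is overwritten anyway. A pure-Python sketch (names invented; the actual routines are C API functions):

```python
# Minimal illustration: an output buffer that the operation fills
# completely does not need to be pre-zeroed.
def add(a, b, out=None):
    if out is None:
        out = [0] * len(a)       # zeroed allocation, like the API routines
    for i in range(len(a)):
        out[i] = a[i] + b[i]     # every element is overwritten regardless
    return out

fresh = add([1, 2], [3, 4])      # allocates (and zeroes) its own output
buf = [9, 9]
add([1, 1], [2, 2], out=buf)     # reuses a caller buffer, no zeroing needed
```
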
> Have fun,
Better already. Thanks again!