[Numpy-discussion] Numeric to numarray experiences
Raik Grünberg
graik at web.de
Tue Oct 5 10:44:13 CDT 2004
Hi there,
I've just translated a package for molecular modelling, which makes extensive
use of Numeric, from Numeric to numarray. The outcome is somewhat negative -
for now we are basically going to postpone the transition - the reasons might
be interesting for the list and the numarray developers out there (who are
doing a brave job!).
Speed:
A typical task in our package is the least-square fitting of a large array of
coordinate frames (N1 x N2 x 3) onto a set of reference or average
coordinates (using a sub-set of coordinates for the matching). The example I
looked at (500 x 876 x 3 items) took 1.3 s with Numeric and 4.7 s with
numarray. The main culprits for the slow-down were:
* compress() - factor 10
* average() - factor 7 (average() is missing from numarray and I hence had to
write a little function myself)
* LinearAlgebra.singular_value_decomposition() - factor 10
but a lot of extra time is also spent in ufunc.py and various numarraycore.py
routines.
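For readers who want to see where compress(), average() and the SVD enter such
a fit, here is a minimal sketch of the kind of task described above. It is not
the package's actual code: it is written against modern NumPy (an assumption,
since neither Numeric nor numarray is assumed to be installed), it uses the
standard SVD-based (Kabsch) superposition, and the names fit_frames, average
and mask are made up for illustration.

```python
import numpy as np


def average(a, axis=0):
    """Hand-written mean along one axis (hypothetical stand-in for a
    missing library average())."""
    a = np.asarray(a, dtype=float)
    return a.sum(axis=axis) / a.shape[axis]


def fit_frames(frames, ref, mask):
    """Least-squares superposition of each frame (N2 x 3) onto ref (N2 x 3).

    Only the atoms selected by the boolean mask are used to compute the
    rotation and translation; the transform is then applied to all atoms.
    """
    frames = np.asarray(frames, dtype=float)
    ref = np.asarray(ref, dtype=float)

    # Reference sub-set, centred on its own centroid.
    ref_sub = np.compress(mask, ref, axis=0)
    ref_center = average(ref_sub)
    ref_sub = ref_sub - ref_center

    fitted = np.empty_like(frames)
    for i, frame in enumerate(frames):
        sub = np.compress(mask, frame, axis=0)
        center = average(sub)
        sub = sub - center

        # SVD of the 3 x 3 covariance matrix between frame and reference.
        u, s, vt = np.linalg.svd(sub.T @ ref_sub)

        # Flip the last singular direction if needed to avoid a reflection.
        if np.linalg.det(u @ vt) < 0:
            u[:, -1] *= -1.0
        rot = u @ vt

        # Rotate about the sub-set centroid, then move onto the reference.
        fitted[i] = (frame - center) @ rot + ref_center
    return fitted
```

With frames of shape (500, 876, 3) this calls compress(), average() and the
SVD once per frame, which is why slow-downs in exactly these three routines
add up so quickly.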
Memory efficiency:
I hoped numarray would solve some of the Out-of-memory problems that I get
with Numeric but it turns out that it is rather less memory efficient for my
kind of applications. Slicing an array that takes up 800MB on disc just about
runs through with Numeric (and heavy swapping) but gives an Out-of-memory
with numarray.
Suggestions:
OK, it's easy to make clever comments without contributing any real work...
- compress(), take(), etc, really need some optimization
- a C-coded average() routine would be helpful
- faster LinearAlgebra routines are necessary
Our sysadmin noted that unlike Numeric, numarray is not using any external
math libraries (like LAPACK) that have been speed-optimized for decades and
are available in CPU-optimized variants (e.g. ATLAS). It's probably difficult
to match this efficiency with any new code ...
Greetings
Raik
PS:
I didn't find any useful HowTo for the translation from Numeric to numarray.
The practical issues were the different nonzero() return value, the more
restrictive boolean comparison, that take doesn't support 'O' arrays any
longer, and the missing average().
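
The nonzero() point can be handled with a small wrapper. The sketch below
assumes that the difference is an index array in one package versus a tuple
with one index array per dimension in the other; it is shown with modern NumPy
(which returns a tuple) rather than with Numeric or numarray themselves, and
the name nonzero_indices is hypothetical.

```python
import numpy as np


def nonzero_indices(a):
    """Return the indices of the non-zero items of a 1-D array as one
    flat index array, whichever convention the library follows."""
    result = np.nonzero(a)
    # Tuple convention: one index array per dimension, so take the first.
    if isinstance(result, tuple):
        return result[0]
    return result
```

Code that passes the result straight on to take() then keeps working
regardless of which return convention is in use.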
--
-----------------------------------------------------
Raik Grünberg | Bioinformatique Structurale
| Institut Pasteur
| Paris, France
-----------------------------------------------------