[Numpy-discussion] Numeric to numarray experiences

Raik Grünberg graik at web.de
Tue Oct 5 10:44:13 CDT 2004


Hi there,

I've just translated a package for molecular modelling, which makes extensive 
use of Numeric, from Numeric to numarray. The outcome is somewhat negative: 
for now we are basically going to postpone the transition. The reasons might 
be interesting for the list and for the numarray developers out there (who are 
doing a brave job!).

Speed:
A typical task in our package is the least-squares fitting of a large array of 
coordinate frames (N1 x N2 x 3) onto a set of reference or average 
coordinates, using a subset of the coordinates for the matching. The example I 
looked at (500 x 876 x 3 items) took 1.3 s with Numeric and 4.7 s with 
numarray. The main culprits for the slow-down were:
* compress() - factor 10
* average() - factor 7 (average() is missing from numarray, so I had to 
write a little function myself)
* LinearAlgebra.singular_value_decomposition() - factor 10
but a lot of extra time is also spent in uufunc.py and various numarraycore.py 
routines.
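To make these hot spots concrete, here is a minimal sketch of this kind of fit, written against modern NumPy (the successor to both Numeric and numarray) rather than taken from our package. The function name fit_frames and the per-frame loop are assumptions for illustration; the rotation is computed with the Kabsch algorithm, so compress(), average() and an SVD are each called once per frame.

```python
import numpy as np

def fit_frames(frames, ref, mask):
    """Hypothetical helper: superimpose each frame onto ref by least squares.

    frames: (N1, N2, 3) coordinates; ref: (N2, 3) reference coordinates;
    mask: (N2,) boolean array selecting the atoms used for the fit.
    Uses the Kabsch algorithm; returns an array shaped like `frames`.
    """
    ref_sub = np.compress(mask, ref, axis=0)       # atoms used for matching
    ref_center = ref_sub.mean(axis=0)
    ref_sub = ref_sub - ref_center
    fitted = np.empty_like(frames, dtype=float)
    for i, frame in enumerate(frames):
        sub = np.compress(mask, frame, axis=0)     # compress() hot spot
        center = np.average(sub, axis=0)           # average() hot spot
        # 3x3 covariance between the centred fitting subsets
        h = (sub - center).T @ ref_sub
        u, s, vt = np.linalg.svd(h)                # SVD hot spot
        # flip the last axis if the optimum would be a reflection
        d = np.sign(np.linalg.det(u @ vt))
        rot = u @ np.diag([1.0, 1.0, d]) @ vt
        # apply the rotation to all atoms, not just the fitted subset
        fitted[i] = (frame - center) @ rot + ref_center
    return fitted
```

In a loop like this, each routine is called N1 times on small arrays, so per-call overhead matters as much as raw arithmetic speed.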

Memory efficiency:
I hoped numarray would solve some of the out-of-memory problems that I get 
with Numeric, but it turns out to be rather less memory-efficient for my 
kind of application. Slicing an array that takes up 800 MB on disk just about 
runs through with Numeric (with heavy swapping) but fails with an 
out-of-memory error in numarray.
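When memory is the limit, it helps to know which operations copy data. A quick check in modern NumPy (again a stand-in for the 2004 packages; the array below just reuses the shape from the timing example, filled with zeros) is np.shares_memory(): basic slicing returns a view, while compress(), take() and boolean indexing always allocate a new array.

```python
import numpy as np

# Same shape as the timing example: 500 frames x 876 atoms x 3 coordinates.
traj = np.zeros((500, 876, 3), dtype=np.float64)

# Basic slicing returns a view onto traj: no coordinate data is copied.
every_other = traj[::2]

# compress() (like take() and boolean indexing) always returns a new array.
atom_mask = np.arange(876) < 100
subset = np.compress(atom_mask, traj, axis=1)

print(np.shares_memory(traj, every_other))  # True: view
print(np.shares_memory(traj, subset))       # False: copy
```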

Suggestions:
OK, it's easy to make clever comments without contributing any real work...
- compress(), take(), etc. really need some optimization
- a C-coded average() routine would be helpful
- faster LinearAlgebra routines are necessary
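For reference, a hand-written average() of the kind mentioned above might look like the following sketch. The name simple_average and its signature are hypothetical (modern NumPy ships numpy.average); the point is that each step (the product, the sum, the division) is dispatched separately from Python and builds its own temporary array, which is the overhead a C-coded routine would avoid.

```python
import numpy as np

def simple_average(a, axis=0, weights=None):
    """Hypothetical Python-level average(): (weighted) mean along `axis`."""
    a = np.asarray(a, dtype=float)
    if weights is None:
        return a.sum(axis=axis) / a.shape[axis]
    w = np.asarray(weights, dtype=float)
    # Reshape the 1-D weights so they broadcast along `axis` only.
    shape = [1] * a.ndim
    shape[axis] = w.size
    w = w.reshape(shape)
    # a * w allocates a full-size temporary before the reduction.
    return (a * w).sum(axis=axis) / w.sum()
```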

Our sysadmin noted that, unlike Numeric, numarray does not use any external 
math libraries (like LAPACK) that have been speed-optimized for decades and 
are available in CPU-optimized variants (e.g. ATLAS). It's probably difficult 
to match this efficiency with any new code...

Greetings
Raik

PS:
I didn't find any useful how-to for the translation from Numeric to numarray. 
The practical issues were the different return value of nonzero(), the more 
restrictive boolean comparison, take() no longer supporting 'O' (object) 
arrays, and the missing average().
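Two of these translation issues carry over to modern NumPy, which follows numarray here: nonzero() returns a tuple of index arrays (one per dimension), and an element-wise comparison cannot be used directly as a truth value. A minimal illustration, with small example arrays chosen just for the demo:

```python
import numpy as np

a = np.array([0, 3, 0, 5])

# nonzero() returns a tuple with one index array per dimension ...
idx = np.nonzero(a)
# ... so code expecting a flat index array needs idx[0],
# or np.flatnonzero(), which returns the flat indices directly.
print(np.flatnonzero(a))  # [1 3]

b = np.array([0, 3, 0, 6])
# a == b is an element-wise boolean array; using it in an `if`
# is ambiguous and raises ValueError, so state all() or any().
try:
    if a == b:
        pass
except ValueError:
    print("ambiguous truth value")
print((a == b).all(), (a == b).any())  # False True
```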

-- 
-----------------------------------------------------
Raik Grünberg		| Bioinformatique Structurale
				| Institut Pasteur
				| Paris, France
-----------------------------------------------------



