[Numpy-discussion] Re: Re-implementation of Python Numerical arrays (Numeric) available for download

Krishnaswami, Neel neelk at cswcasa.com
Tue Nov 27 05:52:05 CST 2001


Perry Greenfield [mailto:perry at stsci.edu] wrote:
> > 
> > I know large datasets were one of your driving factors, but I really
> > don't want to make performance on smaller datasets secondary.
> 
> That's why we are asking, and it seems so far that there are enough
> of those that do care about small arrays to spend the effort to
> significantly improve the performance.

Well, here's my application. I do data mining work, and one of the
things I want to use Numpy for is implementing robust regression
algorithms such as least trimmed squares (LTS). For a k-variable
regression, the best-of-breed algorithm draws hundreds of thousands
of random k-element samples and computes the hyperplane that fits
each one.
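To make that workload concrete, here is a minimal sketch of a random-sampling ("elemental subset") search for least trimmed squares, written in plain Python so it runs without Numeric or NumPy. It is not the specific best-of-breed algorithm referred to above, which the post doesn't name. The function names `solve`, `trimmed_ss` and `lts_elemental`, the sample count, and the trimming parameter `h` are illustrative assumptions. Each iteration solves one small k x k system: that is the small-matrix call repeated hundreds of thousands of times. In NumPy the inner step would be a `numpy.linalg.solve` call.

```python
import random


def solve(a, b):
    """Solve the k x k system a @ x = b by Gaussian elimination with
    partial pivoting. Returns None if the system is (near-)singular."""
    k = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented copy [a | b]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None  # degenerate sample: no unique hyperplane
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        s = m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))
        x[r] = s / m[r][r]
    return x


def trimmed_ss(X, y, beta, h):
    """LTS objective: sum of the h smallest squared residuals."""
    r2 = sorted(
        (yi - sum(xij * bj for xij, bj in zip(xi, beta))) ** 2
        for xi, yi in zip(X, y)
    )
    return sum(r2[:h])


def lts_elemental(X, y, h, n_samples=1000, seed=0):
    """Hypothetical helper: fit a hyperplane exactly through n_samples
    random k-point subsets and keep the one with the lowest LTS objective."""
    rng = random.Random(seed)
    n, k = len(X), len(X[0])
    best_beta, best_obj = None, float("inf")
    for _ in range(n_samples):
        idx = rng.sample(range(n), k)  # one random k-element sample
        beta = solve([X[i] for i in idx], [y[i] for i in idx])
        if beta is None:
            continue
        obj = trimmed_ss(X, y, beta, h)
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta, best_obj


if __name__ == "__main__":
    # 20 points on y = 2 + 3x (column of ones = intercept), plus 5 outliers.
    X = [[1.0, float(i)] for i in range(20)]
    X += [[1.0, x] for x in (3.5, 7.5, 11.5, 15.5, 19.5)]
    y = [2.0 + 3.0 * i for i in range(20)] + [100.0, -50.0, 80.0, -90.0, 60.0]
    n, k = len(X), len(X[0])
    h = (n + k + 1) // 2  # a common choice for the number of residuals kept
    beta, obj = lts_elemental(X, y, h)
    print(beta, obj)
```

Any sample that happens to pick two clean points reproduces the line exactly, so the search should return coefficients close to `[2.0, 3.0]` despite the outliers. The cost is dominated by the many tiny `solve` calls, which is why per-call overhead matters so much here.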

Small-matrix performance is thus something this program lives or dies
by, and right now 'dies' seems to be the right word: it runs about
10x slower than the Gauss program that does the same thing. :(

When I profiled it, Numpy appeared to spend almost all of its time
in _castCopyAndTranspose. Switching to the Intel MKL LAPACK had no
effect on performance, but rewriting _castCopyAndTranspose as a C
function gave a 20% speedup.

If Numpy2 is even slower on small matrices, I'd have to give up using
it, and that would be a shame: it's a *much* nicer environment than Gauss.

--
Neel Krishnaswami
neelk at cswcasa.com
