[SciPy-user] Running mean: numpy or pyrex?

A. M. Archibald peridot.faceted at gmail.com
Sun Oct 1 09:35:58 CDT 2006

On 30/09/06, David Finlayson <david.p.finlayson at gmail.com> wrote:

> I was fairly impressed with the filter performance using psyco until I met
> some truly huge files. Now I am wondering if I would get any speed benefit
> from using numpy (for the 2D array access) or would I be better off trying
> pyrex for the running mean loops? One of the issues I am facing is the
> possibility that very large files may need to be read from the disk as
> needed instead of holding the whole thing in ram. I also need to distribute
> this code to my team, so minimizing the pain of installing numeric and other
> packages would be an issue. Should I expect significant speed benefits from
> a numpy version of this type of code?

Numpy is designed for exactly this kind of computation. It will almost
certainly be simpler to express your calculation in numpy, and it may
be faster (as the internal loops are written in C). Numpy is also
capable of operating on on-disk arrays by mmap()ing them (so that
pages are loaded and unloaded as needed). Once in numpy, there are
also tools such as weave which should allow you to accelerate
calculations further (by running more code in C).
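As a rough illustration (a minimal sketch, not your actual filter): the function name running_mean, the file name grid.dat, the 4x5 shape, the window of 3 and the block size are all made up for the example. It computes a running mean along each row with cumulative sums and reads the data through numpy.memmap, a few rows at a time, so the whole file need not sit in RAM at once.

```python
import os
import tempfile

import numpy as np


def running_mean(a, window):
    """Mean of every full run of `window` consecutive samples along the last axis."""
    a = np.asarray(a, dtype=np.float64)
    # Prefix sums with a leading zero, so each window sum is a difference of two entries.
    zeros = np.zeros(a.shape[:-1] + (1,), dtype=np.float64)
    csum = np.concatenate((zeros, np.cumsum(a, axis=-1)), axis=-1)
    return (csum[..., window:] - csum[..., :-window]) / window


# Hypothetical data file: write a small array to disk so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "grid.dat")
np.arange(20, dtype=np.float64).reshape(4, 5).tofile(path)

# Map the file instead of reading it; pages are loaded only when touched.
mapped = np.memmap(path, dtype=np.float64, mode="r", shape=(4, 5))

# Process a block of rows at a time, since cumsum allocates a full copy of its input.
block = 2  # rows per chunk (assumed; tune to available memory)
smoothed = np.vstack(
    [running_mean(mapped[r:r + block], 3) for r in range(0, mapped.shape[0], block)]
)
print(smoothed.shape)
```

With a window of w, each row of n samples yields n - w + 1 means; edge handling (padding, partial windows) is left out and would depend on what your filter needs.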

On the other hand, depending on what you're doing, psyco may accelerate
your program, and it is certainly easy to add (two lines: import
psyco; psyco.full() ), though it has never actually accelerated any
computation that I have tried it on.

If you don't care about linear algebra speed, I think numpy is
relatively easy to install. (For me it was just a question of clicking
on a checkbox, as it is packaged for my distribution of Linux.) If
you're doing major calculations, you may need to ensure that your
favourite fast linear algebra package gets detected.
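One way to check what was detected, assuming numpy is already installed, is to ask it to print its build configuration. The output format varies between versions, so this is only a quick sanity check:

```python
import numpy as np

# Prints the build-time configuration, including any BLAS/LAPACK
# libraries that numpy was built against.
np.show_config()
```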

If you do end up trying it, please report on your experiences.
Performance is not always what we expect. (For example, I tried
converting a number of examples from the Great Computer Language
Shootout to use Numeric only to discover that they ran more slowly.)
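A small timing harness makes that kind of comparison easy to run on your own data. The following is only a sketch: both function names, the array size and the window length are invented for the example, and which version wins will depend on your machine and problem size.

```python
import timeit

import numpy as np


def running_mean_loop(values, window):
    """Plain-Python running mean over full windows, using a sliding total."""
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        out.append(total / window)
    return out


def running_mean_numpy(values, window):
    """Same result computed from cumulative sums in numpy."""
    csum = np.cumsum(np.concatenate(([0.0], values)))
    return (csum[window:] - csum[:-window]) / window


values = np.random.default_rng(0).random(100_000)
as_list = values.tolist()

# Check that the two versions agree before comparing their speed.
loop_result = running_mean_loop(as_list, 25)
numpy_result = running_mean_numpy(values, 25)

t_loop = timeit.timeit(lambda: running_mean_loop(as_list, 25), number=3)
t_numpy = timeit.timeit(lambda: running_mean_numpy(values, 25), number=3)
print(f"loop: {t_loop:.4f}s  numpy: {t_numpy:.4f}s")
```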

A. M. Archibald
