[SciPy-dev] Fixing correlate: handling api breakage ?

David Cournapeau david@ar.media.kyoto-u.ac...
Thu May 28 06:19:35 CDT 2009

josef.pktd@gmail.com wrote:
> I read somewhere in the description for one of the correlate that the
> code is faster if the first array is longer than the second.

Yes, but that's an implementation detail - it should not affect the
result. The speed difference may come from zero padding handling. But if
copies are acceptable, correlate(a, b) and correlate(b, a) can be
obtained from 'inverted index', i.e., ignoring boundaries effects
correlate(a, b) == correlate(b, a)[::-1, ::-1] (for rank 2 arrays). This
may be much faster.

FWIW, matlab is much slower at doing xcorr2(a, b) than xcorr2(b, a) if b
has significantly smaller number of items.

> I'm new to this, but following the discussion on the cython list, I
> thought that to have fast c code you always need to copy to a
> contiguous array.

It depends on the computation. The idea is that generally, it is really
slow to do "memory jumps" on common CPU, especially when the data cannot
fit the cache anymore.  Doing operations one item after the other in
memory is much faster than doing it 'randomly' (for a real explanation,
you may consult http://people.redhat.com/drepper/cpumemory.pdf). For
things like item/item computation (cos, sin, or things which are
processed axis per axis), copy + contiguity is often the way to go,
because this way, the items you need for computation are near in term of
memory location.

Now, for correlation or convolution, it is more complex, because you
need to do the operation for a neighborhood. The whole problem is that
the neighborhood is defined in terms of the image (spacial locality, the
multi-dimensional indexes are close to each other), not in terms of
memory. For rank one array, it is the same for contiguous arrays. For 2d
arrays, it is already a bit difficult, but for rank 3 or rank 4, it is
much more complex: for any point in a 3d image, for example, the nearest
voxels can be pretty far in terms of memory. To match spatial locality
to memory locality, you would need a more complex memory model than what
numpy can offer out of the box, I think.

In my rewriting of the correlate function, I implemented things in term
of a 'neighborhood' iterator (it looks like this is nothing new - some
package like itk already had this for years). The resulting code is
simpler, because the complexity is hidden in the iterator, and speed is
approximately the same. We are quite slower than matlab, though (a
factor of 3-4 compared to imfilter).

The code is there:


> Are there performance differences  between calling it once with a huge
> array versus calling it very often (eg. for sequential "online"
> computation) for medium sized arrays.
> What is huge in your talk box applications?

In my own case, it not huge or even big. For many audio tasks, you need
to compute things on a frame-per-frame basis (for example, splitting a
big signal into overlapping segments of say 512 samples, and do 1d
autocorrelation of this). So the usual trick is to split the big signal
into a matrix where each row or column is one frame, and call a function
which processes in one axis. None of the correlation function in
numpy/scipy has this AFAIK. Some people sent me code for talkbox so that
it can handle correlation with very big arrays, but it is one or two
dimensions only.



More information about the Scipy-dev mailing list