[SciPy-user] What is missing in scipy to be a top notch environment for signal processing (lpc and co) ?
David Cournapeau
david at ar.media.kyoto-u.ac.jp
Sun Nov 19 03:43:43 CST 2006
Hi there,
I was wondering how many people here are using numpy/scipy for
signal processing, and what their impressions are compared to, e.g.,
matlab. The point is not to do a matlab vs. scipy comparison, but rather
to spot weak points in scipy and change the situation; nor do I want to
criticize anything. The whole point is really to improve scipy.
I've just finished my second big conversion of matlab -> python code
(~5000 lines of matlab code), and I think there are some "holes" in
scipy which would be really useful to fill. I believe they are general
enough that I am not the only one missing them. Here are some functions
I missed:
1: linear prediction coefficients computation (the matlab lpc function).
2: a more flexible autocorrelation method (a la xcorr in matlab).
3: a good resampling function in the time domain.
4: functions capable of running the same algorithm on strides
of an array.
In more detail:
1 requires a method to invert a Toeplitz matrix (Levinson-Durbin)
and a method for autocorrelation.
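To make the idea concrete, here is a minimal sketch of what such an lpc function could look like in pure numpy, combining a biased autocorrelation estimate with the Levinson-Durbin recursion (the function name, the signature, and the biased estimator are my assumptions, not an existing scipy API):

```python
import numpy as np

def lpc(x, order):
    """Sketch of LPC via the autocorrelation method (hypothetical API).

    Returns (a, err): a = [1, a1, ..., a_order] is the prediction
    polynomial solving the Yule-Walker (Toeplitz) system via the
    Levinson-Durbin recursion, err is the final prediction error.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Biased autocorrelation estimate for lags 0..order.
    r = np.array([x[:n - k] @ x[k:] for k in range(order + 1)]) / n

    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient from the current residual.
        acc = r[i] + a[1:i] @ r[1:i][::-1]
        k = -acc / err
        # Update previous coefficients, append the new one.
        a[1:i] = a[1:i] + k * a[1:i][::-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

The recursion is O(order^2) instead of the O(order^3) of a general linear solve, which is the whole point of exploiting the Toeplitz structure.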
2 right now, I believe there is only a time-domain autocorrelation
implementation, which is expensive in numpy for any size exceeding a few
tens/hundreds of samples. Also, it is not possible to select the
lags required. In LPC coding for speech, we often need only a few lags of
signals around a few hundred samples; just computing what is needed
would already give at least a tenfold speed increase for this kind
of problem. For problems where the size of the signal and the number of
coefficients are of the same scale, an FFT-based autocorrelation would
also be beneficial.
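As a sketch of the FFT-based variant restricted to the lags actually needed (function name and API are my invention), the key detail is zero-padding to at least 2*len(x)-1 so the FFT's circular correlation matches the linear one:

```python
import numpy as np

def autocorr_fft(x, maxlag):
    """One-sided autocorrelation r[0..maxlag] computed via FFT.

    Zero-pads to the next power of two >= 2*len(x)-1 so that the
    circular correlation implied by the FFT equals the linear
    autocorrelation; only the requested lags are returned.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    nfft = 1 << (2 * n - 1).bit_length()   # next power of two
    X = np.fft.rfft(x, nfft)
    r = np.fft.irfft(X * np.conj(X), nfft) # power spectrum -> autocorr
    return r[:maxlag + 1]
```

For small maxlag relative to the signal length, a direct time-domain loop over the requested lags is O(n * maxlag) and can beat the FFT; the two approaches are complementary, which is exactly why lag selection matters.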
3 basically, we want an equivalent of upfirdn, which uses a
polyphase implementation according to the matlab doc, plus good filter
design methods (which already exist in scipy AFAIK).
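To show the semantics upfirdn implements (not the polyphase optimization itself), here is a naive sketch: zero-stuff by p, FIR-filter, decimate by q. A real polyphase implementation would skip the multiplications by the inserted zeros; the function name and exact output length convention here are my assumptions and do not match matlab's upfirdn edge handling exactly:

```python
import numpy as np

def upfirdn_naive(h, x, p, q):
    """Upsample x by p, filter with FIR h, downsample by q.

    Direct (non-polyphase) reference implementation showing what
    a rate converter computes; a polyphase version would avoid
    the wasted multiplies against the stuffed zeros.
    """
    x = np.asarray(x, dtype=float)
    up = np.zeros(len(x) * p)
    up[::p] = x                # zero-stuffing (upsampling)
    y = np.convolve(up, h)     # anti-imaging / anti-aliasing FIR
    return y[::q]              # decimation
```

In practice h would come from scipy's existing filter design routines (e.g. a lowpass cutting at pi/max(p, q)), which is why only the polyphase rate-conversion core is missing.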
4 I am not sure about this one. A concrete example: if I want to
compute the autocorrelation of some frames of a signal, the obvious
thing is to use a loop, which is expensive. If I had a matrix in which
each column is a frame, and an autocorrelation function capable of
running an algorithm on each column, this would be much faster.
Incidentally, matlab offers a function buffer, which builds a matrix in
which each column is a frame, with options for overlap and border
cases. The doc says
"Y = BUFFER(X,N) partitions signal vector X into nonoverlapping data
segments (frames) of length N. Each data frame occupies one column in
the output matrix, resulting in a matrix with N rows." I don't know how,
or if it is at all possible, to generalize that to numpy arrays.
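A simplified buffer equivalent is expressible with plain numpy fancy indexing; this sketch handles frame length and overlap with zero-padding of the tail, but ignores matlab's initial-delay and border options (the name and signature are my assumptions):

```python
import numpy as np

def buffer(x, n, overlap=0):
    """Split 1-D signal x into length-n frames, hop = n - overlap.

    Each frame is one column of the output, as with matlab's buffer;
    trailing samples that do not fill a frame are zero-padded.
    Simplified: no initial-delay or border-condition options.
    """
    x = np.asarray(x, dtype=float)
    hop = n - overlap
    nframes = int(np.ceil(max(len(x) - overlap, 0) / hop))
    padded = np.zeros(overlap + nframes * hop)
    padded[:len(x)] = x
    # (n, nframes) index matrix: frame j starts at j * hop.
    idx = np.arange(n)[:, None] + hop * np.arange(nframes)[None, :]
    return padded[idx]
```

Once the signal is in this form, any function that vectorizes along axis 0 runs on all frames at once, which is exactly the loop-avoidance asked for above.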
Now, because I don't want to look like a whiner, I have some code
that partially or totally solves some of these points:
- I have C code to compute Levinson-Durbin, checked against matlab.
It expects the autocorrelation as a contiguous array; as it is
one-dimensional, adapting it to multiple strides would be trivial.
- I have C code to compute only one side, and a few lags, of the
autocorrelation in the time domain. This also expects a contiguous
array, and would be a bit trickier to adapt.
- I have code based on fftw to compute the autocorrelation using an
FFT (it can be adapted for cross-correlation too). While I think this
would be useful in scipy, I understand it cannot use fftw there. Which
FFT should I use in scipy C code?
- I would be interested in solving 3, and possibly 4, but I would
need some advice from others, as I am not sure how to solve them API-wise.
I don't know if this is of any interest to others, but I believe
some of these functionalities to be a basic requirement for scipy to be
used by people in signal processing.
Cheers,
David