[SciPy-user] What is missing in scipy to be a top notch environment for signal processing (lpc and co) ?

David Cournapeau david at ar.media.kyoto-u.ac.jp
Sun Nov 19 03:43:43 CST 2006


Hi there,

    I was wondering how many people here are using numpy/scipy for 
signal processing, and what their impressions are compared to e.g. matlab. 
The point is not to do a matlab vs scipy comparison, but rather to spot 
weak points in scipy and to change the situation; nor do I want to 
criticize anything. The whole point is really to improve scipy.

    I've just finished my second big conversion of matlab code to python 
(~5000 lines of matlab code), and I think there are some "holes" in 
scipy which would be really useful to fill in. I believe they are 
general enough that I am not the only one missing them. Here are some 
functions I missed:

    1: linear prediction coefficient computation (the matlab lpc function).
    2: a more flexible autocorrelation method (a la xcorr in matlab).
    3: a good resampling function in the time domain.
    4: functions capable of running the same algorithm on each stride 
of an array.

More detailed:

    1 requires a method to solve a Toeplitz system (Levinson-Durbin) 
and a method for autocorrelation.
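The Levinson-Durbin recursion is short enough to sketch in pure numpy. This hypothetical lpc() is only an illustration of the autocorrelation method (the function name, signature, and return convention are my assumptions, not an existing scipy API):

```python
import numpy as np

def lpc(signal, order):
    """Sketch of an lpc(): estimate linear prediction coefficients
    of `signal` via the Levinson-Durbin recursion.

    Returns (a, err) where a[0] == 1 and err is the final
    prediction error power."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    # Biased autocorrelation for lags 0..order.
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])

    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient from the current prediction error.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Order update: a_new[j] = a[j] + k * a[i - j], j = 1..i-1.
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err
```

The O(p^2) recursion avoids explicitly solving (or inverting) the Toeplitz normal-equation matrix.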
    2 right now, I believe there is only a time-domain autocorrelation 
implementation, which is expensive in numpy for any size exceeding a few 
tens/hundreds of samples. Also, it is not possible to select the 
lags required. In LPC coding for speech, we often need only a few lags of 
signals of around a few hundred samples; just computing what is needed 
would already give at least a tenfold speed increase for this kind 
of problem. For problems where the size of the signal and the number of 
coefficients are on the same scale, an FFT-based autocorrelation would also 
be beneficial.
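A minimal sketch of what an FFT-based, lag-limited autocorrelation could look like in numpy (the name autocorr_fft and the one-sided 0..maxlag convention are my assumptions):

```python
import numpy as np

def autocorr_fft(x, maxlag):
    """Sketch: autocorrelation of a 1-d signal via FFT, returning
    only the one-sided lags 0..maxlag (a la a restricted xcorr)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Zero-pad to at least 2n-1 so circular convolution does not
    # wrap around and corrupt the linear autocorrelation.
    nfft = 1
    while nfft < 2 * n - 1:
        nfft *= 2
    X = np.fft.rfft(x, nfft)
    # Power spectrum back-transformed gives the autocorrelation.
    r = np.fft.irfft(X * np.conj(X), nfft)
    return r[:maxlag + 1]
```

For short maxlag the direct time-domain sum wins; the FFT route pays off when maxlag is on the order of the signal length.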
    3 basically, we want an equivalent of upfirdn, which uses a 
polyphase implementation according to the matlab doc, plus good filter 
design methods (which already exist in scipy AFAIK).
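For illustration, here is a naive (non-polyphase) sketch of the upfirdn operation, assuming the usual upsample / FIR-filter / downsample definition; a real implementation would use the polyphase decomposition to avoid multiplying by the inserted zeros:

```python
import numpy as np

def upfirdn_naive(h, x, p, q):
    """Naive sketch of upfirdn: upsample x by p (zero insertion),
    filter with FIR taps h, then downsample by q."""
    x = np.asarray(x, dtype=float)
    # Insert p-1 zeros between consecutive samples.
    up = np.zeros(len(x) * p)
    up[::p] = x
    # FIR filtering by full linear convolution.
    y = np.convolve(up, h)
    # Keep every q-th output sample.
    return y[::q]
```

This is only a reference definition for testing; the polyphase version computes the same output while touching each input sample roughly p*q times fewer.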
    4 I am not sure about this one. A concrete example: if I want to 
compute the autocorrelation of some frames of a signal, the obvious 
thing is to use a loop. This is expensive. If I had a matrix in which 
each column is a frame, and an autocorrelation function capable of running 
the algorithm on each column, this would be much faster. Incidentally, 
matlab offers a function buffer, which builds a matrix in which each column 
is a frame, with options for overlapping and border cases. The doc says 
"Y = BUFFER(X,N) partitions signal vector X into nonoverlapping data 
segments (frames) of length N.  Each data frame occupies one column in 
the output matrix, resulting in a matrix with N rows." I don't know how, 
if at all possible, to generalize that to numpy arrays.
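One possible numpy sketch of a buffer-like function uses stride tricks, so the framed matrix is a view into the signal rather than a copy. This is only a sketch under simplifying assumptions: it drops a trailing partial frame instead of zero-padding it as matlab's buffer does, and it requires overlap < n:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def buffer(x, n, overlap=0):
    """Sketch of matlab's buffer(): split 1-d x into frames of
    length n, one frame per COLUMN, with `overlap` samples shared
    between consecutive frames. A trailing partial frame is
    dropped (matlab zero-pads instead)."""
    x = np.ascontiguousarray(x)
    hop = n - overlap                       # advance per frame
    nframes = 1 + (len(x) - n) // hop if len(x) >= n else 0
    s = x.strides[0]
    # Each row of `frames` is a zero-copy view into x.
    frames = as_strided(x, shape=(nframes, n), strides=(hop * s, s))
    return frames.T
```

Because no data is copied, a vectorized per-column algorithm can then run over all frames at once; the caveat is that the columns alias the original signal, so they must not be written to carelessly.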
 
    Now, because I don't want to look like a whiner, I have some code to 
solve some of the points, partially or totally:
    - I have C code to compute Levinson-Durbin, checked against matlab. 
It expects the autocorrelation as a contiguous array; as it is 
one-dimensional, adapting it to multiple strides would be trivial.
    - I have C code to compute only one side, and only a few lags, of the 
autocorrelation in the time domain. This also expects a contiguous array, 
and would be a bit trickier to adapt.
    - I have code based on fftw to compute the autocorrelation using an 
FFT (it can be adapted for cross-correlation too). As I think this would 
be useful in scipy, I understand that it cannot use fftw. Which FFT should 
I use in scipy C code?
    - I would be interested in solving 3, and eventually 4, but I would 
need some advice from others, as I am not sure how to solve them API-wise.

    I don't know if this is of any interest to others, but I believe 
some of these functionalities to be a basic requirement for scipy to be 
used by people in signal processing.

    Cheers,

    David
 

