[SciPy-user] Frequency content of a transient signal

Nils Wagner nwagner@iam.uni-stuttgart...
Tue Jul 22 14:26:54 CDT 2008

On Tue, 22 Jul 2008 11:55:44 -0400
  "Anne Archibald" <peridot.faceted@gmail.com> wrote:
> 2008/7/22 Nils Wagner <nwagner@iam.uni-stuttgart.de>:
>> I made some progress (See test_stft.py).
>> But, how do I choose proper values for the parameter
>> NFFT,noverlap ?
> Suppose you have a time series with 4096 points and 
>duration 1 s. If
> you take a real FFT, you will get a complex array of 
>length about
> 2048. In each bin of this array is a complex number 
>representing the
> amplitude and phase of a sinusoid at a particular 
>frequency. The
> frequencies are evenly spaced from zero (a constant, 
>necessarily real)
> up to (4096/2)/(1 s), that is, 2048 Hz. The spacing is 
>linear, so bin
> 37 is 37 Hz (in this example). If you don't care about 
>the phase and
> just want to find the power, you can take the (squared) 
>amplitude of
> the complex number in each bin.
> This is useful, but it interprets *everything* as a 
>sinusoid. If you
> have an array containing ten seconds of noise, one 
>second of a tone,
> and ten more seconds of noise, doing the above will 
>produce a spectrum
> in which sinusoids are very carefully chosen to cancel 
> except during that one second. Correct, and occasionally 
>useful. but
> hard to interpret. Instead, what you often want is 
>something like the
> human ear: changes on a short timescale are interpreted 
>in the Fourier
> domain, as tones, but changes on a longer scale remain 
>in the time
> domain (tones turning on and off). This is a slightly 
>messy sort of
> problem; be aware that you are introducing a timescale 
>of transition,
> and things get very confusing near that timescale. (For 
>the human ear,
> this frequency is somewhere between a few Hertz and 
>maybe thirty
> Hertz.) Also, wavelet transforms are designed to deal 
>with this sort
> of problem in a systematic way, but their results are 
>more complicated
> to interpret still.
> So let's use a Fourier transform to compute a "dynamical 
> spectrum". For illustrative purposes, let's suppose we 
>have a data
> stream at 48000 samples per second and we want to 
>roughly mimic the
> human ear. So we want a lowest frequency of about 10 Hz; 
>let's take
> chunks of 4096 samples, then. Now a first approach would 
>simply chop
> the time sttream into 4096-sample chunks, FFT each one, 
>and take the
> (squared) amplitude. This gives you a frequency spectrum 
>with a
> spectral resolution of (48 kHz/2)/4096, roughly five 
>Hertz, and a time
> resolution of a tenth of a second. If you want more time 
> take shorter FFTs; if you want more frequency resolution 
>take longer
>FFTs. You can't have both (even changing the sample rate 
>just changes
> your highest frequency, not the resolution). This is a 
> mathematical limitation, the same one that produces the 
> principle in quantum mechanics. But subject to that 
>limitation, you've
> got a dynamic power spectrum.
> What about that overlap business? Well, suppose you have 
>a tone that
> turns on for a tenth of a second. Ideally, you'd see it 
>in one FFT but
> not in any of the others, and it'd be nice and strong. 
>But what if its
> start time falls in the middle of a chunk? Then you'll 
>see it half as
> strong in two adjacent chunks - much less visible. So 
>what people
> often do is make successive chunks overlap. Let's say 
>you do that by a
> factor of two. Then your tone, instead of falling in the 
>middle of a
> chunk, appears half as strong in one chunk, at full 
>power in the next,
> and half as strong in the third. Of course misalignments 
>are still
> possible, but the worst possible misalignment now puts 
> of the tone in each block. The less power you want to 
>risk losing, the
> more overlap you should use (and, of course, the more 
>time and space
> it takes). Also, you should note that you won't get any 
>more time
> resolution this way - it's like (in fact it is) 
>interpolating the
> power spectrum in the time dimension. You get more 
>measurements, but
> the narrowest possible peak is still about a tenth of a 
>second wide.
> A similar phenomenon occurs in the frequency domain 
>(often called
> "scalloping"). If you take such an FFT, you get the 
> needed to express your signal in terms of sinusoids at 
>frequencies (48
> kHz/2)/4096 * i. What happens if the true frequency of 
>your tone falls
> halfway between two of these frequencies? Well, it 
>appears as two
> sinusoids, one on either side at about half the power 
>(actually there
> are more further out, but they are even weaker). If 
>you're looking to
> detect tones, this means that you're more sensitive to 
> frequencies than others. You can address this with a 
> "interpolation" procedure. In this case you'd take your 
>4096 samples
> and copy them into an array of size, say, 8192; you'd 
>fill the rest of
> the array with the mean value of your samples (or zero, 
>but this
> slightly reduces the junk at the low frequencies). Now 
>when you do
> your FFT, you get twice as many coefficients; the new 
>ones interpolate
> the old ones, reducing the "scalloping". As before, the 
> interpolation you do, the less you lose in sensitivity, 
>but the more
> it costs in space and time.
> How should you choose the FFT size and the overlap? 
>Well, to pick the
>FFT size, decide on the time resolution you want to have, 
>keeping in
> mind that the reciprocal of this is (roguhly) the 
>frequency resolution
> you will get. To pick the overlap, you need to think 
>about space and
> time costs, but overlapping by a factor of two is a good 
>habit, or
> four if it's not too slow. Whether you want to do 
>interpolation in the
> frequency domain is up to you, but it's fairly cheap. 
>I'd go with a
> factor of two.
> Good luck,
> Anne
Hi Anne,

Thank you very much for your reply, which is very rich in 
content. It will take some time to "stomach" it though ;)

Are you aware of a good reference to delve into this topic 

Best wishes


More information about the SciPy-user mailing list