[SciPy-user] Frequency content of a transient signal

Nils Wagner nwagner@iam.uni-stuttgart...
Tue Jul 22 14:26:54 CDT 2008

```On Tue, 22 Jul 2008 11:55:44 -0400
"Anne Archibald" <peridot.faceted@gmail.com> wrote:
> 2008/7/22 Nils Wagner <nwagner@iam.uni-stuttgart.de>:
>
>> I made some progress (See test_stft.py).
>> But, how do I choose proper values for the parameter
>> NFFT,noverlap ?
>
> Suppose you have a time series with 4096 points and
>duration 1 s. If
> you take a real FFT, you will get a complex array of
> 2048. In each bin of this array is a complex number
>representing the
> amplitude and phase of a sinusoid at a particular
>frequency. The
> frequencies are evenly spaced from zero (a constant,
>necessarily real)
> up to (4096/2)/(1 s), that is, 2048 Hz. The spacing is
>linear, so bin
> 37 is 37 Hz (in this example). If you don't care about
>the phase and
> just want to find the power, you can take the (squared)
>amplitude of
> the complex number in each bin.
>
> This is useful, but it interprets *everything* as a
>sinusoid. If you
> have an array containing ten seconds of noise, one
>second of a tone,
> and ten more seconds of noise, doing the above will
>produce a spectrum
> in which sinusoids are very carefully chosen to cancel
>everywhere
> except during that one second. Correct, and occasionally
>useful. but
> hard to interpret. Instead, what you often want is
>something like the
> human ear: changes on a short timescale are interpreted
>in the Fourier
> domain, as tones, but changes on a longer scale remain
>in the time
> domain (tones turning on and off). This is a slightly
>messy sort of
> problem; be aware that you are introducing a timescale
>of transition,
> and things get very confusing near that timescale. (For
>the human ear,
> this frequency is somewhere between a few Hertz and
>maybe thirty
> Hertz.) Also, wavelet transforms are designed to deal
>with this sort
> of problem in a systematic way, but their results are
>more complicated
> to interpret still.
>
> So let's use a Fourier transform to compute a "dynamical
>power
> spectrum". For illustrative purposes, let's suppose we
>have a data
> stream at 48000 samples per second and we want to
>roughly mimic the
> human ear. So we want a lowest frequency of about 10 Hz;
>let's take
> chunks of 4096 samples, then. Now a first approach would
>simply chop
> the time sttream into 4096-sample chunks, FFT each one,
>and take the
> (squared) amplitude. This gives you a frequency spectrum
>with a
> spectral resolution of (48 kHz/2)/4096, roughly five
>Hertz, and a time
> resolution of a tenth of a second. If you want more time
>resolution
> take shorter FFTs; if you want more frequency resolution
>take longer
>FFTs. You can't have both (even changing the sample rate
>just changes
> your highest frequency, not the resolution). This is a
>fundamental
> mathematical limitation, the same one that produces the
>uncertainty
> principle in quantum mechanics. But subject to that
>limitation, you've
> got a dynamic power spectrum.
>
>a tone that
> turns on for a tenth of a second. Ideally, you'd see it
>in one FFT but
> not in any of the others, and it'd be nice and strong.
>But what if its
> start time falls in the middle of a chunk? Then you'll
>see it half as
> strong in two adjacent chunks - much less visible. So
>what people
> often do is make successive chunks overlap. Let's say
>you do that by a
> factor of two. Then your tone, instead of falling in the
>middle of a
> chunk, appears half as strong in one chunk, at full
>power in the next,
> and half as strong in the third. Of course misalignments
>are still
> possible, but the worst possible misalignment now puts
>three-quarters
> of the tone in each block. The less power you want to
>risk losing, the
> more overlap you should use (and, of course, the more
>time and space
> it takes). Also, you should note that you won't get any
>more time
> resolution this way - it's like (in fact it is)
>interpolating the
> power spectrum in the time dimension. You get more
>measurements, but
> the narrowest possible peak is still about a tenth of a
>second wide.
>
> A similar phenomenon occurs in the frequency domain
>(often called
> "scalloping"). If you take such an FFT, you get the
>coefficients
> needed to express your signal in terms of sinusoids at
>frequencies (48
> kHz/2)/4096 * i. What happens if the true frequency of
> halfway between two of these frequencies? Well, it
>appears as two
> sinusoids, one on either side at about half the power
>(actually there
> are more further out, but they are even weaker). If
>you're looking to
> detect tones, this means that you're more sensitive to
>some
> frequencies than others. You can address this with a
>similar
> "interpolation" procedure. In this case you'd take your
>4096 samples
> and copy them into an array of size, say, 8192; you'd
>fill the rest of
> the array with the mean value of your samples (or zero,
>but this
> slightly reduces the junk at the low frequencies). Now
>when you do
> your FFT, you get twice as many coefficients; the new
>ones interpolate
> the old ones, reducing the "scalloping". As before, the
>more
> interpolation you do, the less you lose in sensitivity,
>but the more
> it costs in space and time.
>
> How should you choose the FFT size and the overlap?
>Well, to pick the
>FFT size, decide on the time resolution you want to have,
>keeping in
> mind that the reciprocal of this is (roguhly) the
>frequency resolution
> you will get. To pick the overlap, you need to think
> time costs, but overlapping by a factor of two is a good
>habit, or
> four if it's not too slow. Whether you want to do
>interpolation in the
> frequency domain is up to you, but it's fairly cheap.
>I'd go with a
> factor of two.
>
> Good luck,
> Anne

Hi Anne,

Thank you very much for your reply, which is very rich in
content. It will take some time to "stomach" it though ;)

Are you aware of a good reference to delve into this topic
?

Best wishes

Nils
```