[SciPy-User] Convert a time-frequency array to sound file

Anne Archibald peridot.faceted@gmail....
Mon Feb 15 11:30:26 CST 2010

On 15 February 2010 08:35, Yannick Copin <y.copin@ipnl.in2p3.fr> wrote:
> Hi,
>> You could just take the inverse Fourier transform of each spectrum and
>> then patch them end to end, but I suspect this will end up sounding
>> pretty awful, as you'd get lots of phase discontinuities at the end of
>> each segment. A better strategy might be to generate a continuous sine
>> wave for each frequency (with a good number of samples per segment),
>> multiply each of these sine waves by the corresponding, interpolated,
>> spectral amplitude, and then sum over the different frequencies.
>> This should be easy enough to do with the standard numpy functions.
> Thanks for the tips. Alas, I guess I have to conclude that such a conversion
> procedure does not already exist...
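[For reference, the sine-bank strategy quoted above can be sketched in a
few lines of NumPy. This is a minimal illustration; the function name,
array shapes, and sample rate are my assumptions, not anything from the
original mail:]

```python
import numpy as np

def sinebank_synth(amps, freqs, seg_dur, rate=44100):
    """Sum one continuous sine per frequency, each scaled by its
    interpolated amplitude envelope.

    amps: (n_segments, n_freqs) array of amplitudes over time
    freqs: (n_freqs,) frequencies in Hz
    seg_dur: duration of one time segment in seconds"""
    n_seg, n_freq = amps.shape
    n_samples = int(n_seg * seg_dur * rate)
    t = np.arange(n_samples) / rate
    # Segment centers, used as interpolation knots for the envelopes
    seg_times = (np.arange(n_seg) + 0.5) * seg_dur
    out = np.zeros(n_samples)
    for k in range(n_freq):
        # Smooth amplitude envelope for this frequency track
        env = np.interp(t, seg_times, amps[:, k])
        # The sine is continuous in phase, so no segment-boundary clicks
        out += env * np.sin(2 * np.pi * freqs[k] * t)
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out  # normalize to [-1, 1]
```

The result can be written to a WAV file with scipy.io.wavfile.write
after scaling to the desired sample format.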

Without phase information, there's no well-defined way to do this.
It's easiest to think of the problem the other way around: suppose you
start with a sound file. To get frequency-as-a-function-of-time
information, you need to break the file up into chunks and take an FFT
of each chunk. The chunk size sets the boundary between the time and
the frequency representation: say you use 20 chunks per second. Then a
tone above 20 Hz will appear within your spectra, while a tone below
20 Hz will appear as variation from one spectrum to the next. Once
you've decided on the chunk size, you probably want to avoid
discontinuities at the ends of the chunks, so you should make the
chunks overlap. Having done this, you have a series of spectra
containing phase and amplitude information for each frequency, and you
can reconstruct the original signal exactly from them.
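[The chunk-and-FFT analysis described above, with overlapping windows
and exact reconstruction, can be sketched like this. A minimal
illustration under my own naming; the 50%-overlap periodic Hann window
is one standard choice that makes overlap-add reconstruction exact in
the interior of the signal:]

```python
import numpy as np

def stft(x, nperseg=256):
    """Break x into 50%-overlapping Hann-windowed chunks, FFT each."""
    hop = nperseg // 2
    win = np.hanning(nperseg + 1)[:-1]   # periodic Hann window
    n_chunks = (len(x) - nperseg) // hop + 1
    return np.array([np.fft.rfft(win * x[i*hop : i*hop + nperseg])
                     for i in range(n_chunks)])

def istft(spectra, nperseg=256):
    """Inverse-FFT each spectrum and overlap-add the chunks.
    Periodic Hann windows at 50% overlap sum to a constant, so every
    interior sample (covered by two windows) is recovered exactly."""
    hop = nperseg // 2
    out = np.zeros((len(spectra) - 1) * hop + nperseg)
    for i, spec in enumerate(spectra):
        out[i*hop : i*hop + nperseg] += np.fft.irfft(spec, nperseg)
    return out
```

scipy.signal also provides stft/istft functions that handle windowing
and edge effects more carefully; this sketch just shows the mechanics.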

What you have amounts to only the amplitude information for each
chunk. This is not necessarily a problem, as the ear can't really hear
phases of sounds, although you have to be careful when combining
pieces so that you don't get too much constructive or destructive
interference. Choosing random phases will probably be fine. So to get
a sound file, you could inverse FFT these spectra, and then piece them
together with overlap. The result will probably sound horrible. There
are some things you can tweak:

* The temporal spacing of the chunks. Depending on what your initial
information looks like, you might want to either average it down in
the time direction, or interpolate it up, so that the sounds happen at
a reasonable pace.

* The frequency range you generate. In a raw FFT approach, the lowest
frequency in your image will be the chunk rate and the highest will be
the Nyquist frequency set by the sampling rate, probably inaudibly
high. You may well want to remap the frequency range to something
comfortable, say 100-1000 Hz.

* The atonality of the output. Especially if you have very little
information in the frequency direction, you may want to ensure that
every frequency that's generated corresponds to a note on some musical
scale, possibly even using some instrument more elaborate than a sine
wave. For this you should look at software (music) synthesizers.
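[Putting the random-phase idea together with overlap-add, a minimal
reconstruction from an amplitude-only time-frequency array might look
like the sketch below. All names and the shape of the input array are
my assumptions; the amplitude rows are simply interpolated onto the
FFT frequency grid rather than remapped to an audible range:]

```python
import numpy as np

def amplitudes_to_sound(amps, nperseg=256, rng=None):
    """Give each amplitude spectrum random phases, inverse-FFT it, and
    overlap-add the chunks with a Hann window to smooth the seams.

    amps: (n_steps, n_freqs) array of non-negative amplitudes"""
    rng = np.random.default_rng(rng)
    hop = nperseg // 2
    win = np.hanning(nperseg + 1)[:-1]   # periodic Hann window
    n_freq = nperseg // 2 + 1            # number of rfft bins
    out = np.zeros((len(amps) - 1) * hop + nperseg)
    for i, amp in enumerate(amps):
        # Interpolate this amplitude row onto the rfft frequency grid
        a = np.interp(np.linspace(0, 1, n_freq),
                      np.linspace(0, 1, len(amp)), amp)
        # Random phases: the ear mostly won't notice, and this avoids
        # systematic constructive/destructive interference
        phase = rng.uniform(0, 2 * np.pi, n_freq)
        chunk = np.fft.irfft(a * np.exp(1j * phase), nperseg)
        out[i*hop : i*hop + nperseg] += win * chunk
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out  # normalize to [-1, 1]
```

As noted above, the result will probably still sound rough; the tweaks
in the list (pacing, frequency remapping, quantizing to a scale) are
where the real work lies.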

The general family of techniques you'd be looking at for this would be
"granular synthesis"; in particular, there are utilities out there to
turn images into sound files. I don't know the name of any particular
one, but some Googling ought to turn up some choices with varying
degrees of flexibility.

The more general problem - the audio analogue of graphical
visualization, sometimes called "sonification" - seems to be a sadly
understudied field.


P.S. Here's an example of this sort of thing, although sadly very
little documentation exists:
