[Scipy-tickets] [SciPy] #1585: scipy.io.wavfile.read warns on Audacity wav files with metadata

SciPy Trac scipy-tickets@scipy....
Fri Jan 13 21:50:30 CST 2012


#1585: scipy.io.wavfile.read warns on Audacity wav files with metadata
----------------------+-----------------------------------------------------
 Reporter:  richli    |       Owner:  somebody   
     Type:  defect    |      Status:  new        
 Priority:  normal    |   Milestone:  Unscheduled
Component:  scipy.io  |     Version:  0.10.0     
 Keywords:            |  
----------------------+-----------------------------------------------------
 I recorded some sound with Audacity and then exported it using the default
 options to a 16-bit signed PCM WAV file. Upon export, it fills in some
 metadata (artist name, track, etc) automatically but allows me to
 customize or clear out the metadata.

 The exported wav file plays fine with mplayer et al., but when I load it
 using scipy.io.wavfile.read, I get this warning:


 {{{
 /usr/lib/python2.7/site-packages/scipy/io/wavfile.py:121: WavFileWarning:
 chunk not understood
   warnings.warn("chunk not understood", WavFileWarning)
 }}}

 Upon examination of the source code and a hexdump of the offending wav
 file, I determined that the code only handles "fmt " (format) and "data"
 chunks. Every other chunk type is skipped with a warning. The chunk in
 question is a "LIST" chunk of type INFO, which is used for storing
 metadata. wavfile.py does skip over it correctly, but it's still annoying
 to see warnings about perfectly legitimate wav files.
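
 Until the reader is changed, the warning can be filtered on the caller's
 side with the standard warnings module. The sketch below is self-contained,
 so it defines a stand-in WavFileWarning class and a hypothetical
 read_with_unknown_chunk() function in place of the real wavfile.read();
 with SciPy installed, the category to filter would be
 scipy.io.wavfile.WavFileWarning.

```python
import warnings


class WavFileWarning(UserWarning):
    """Stand-in for scipy.io.wavfile.WavFileWarning (assumption: lets the
    sketch run without SciPy installed)."""


def read_with_unknown_chunk():
    # Hypothetical stand-in for wavfile.read() meeting a chunk it skips.
    warnings.warn("chunk not understood", WavFileWarning)
    return 44100  # made-up sample rate, for illustration only


with warnings.catch_warnings(record=True) as caught:
    # Record every warning by default ...
    warnings.simplefilter("always")
    # ... but drop this one category. Filters added later take precedence.
    warnings.simplefilter("ignore", WavFileWarning)
    rate = read_with_unknown_chunk()

# Nothing was recorded, and the filter is restored when the block exits.
print(rate, len(caught))
```

 This only hides the message for the caller; it does not change which
 chunks the reader understands.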

 To double-check things, I re-exported the wav file from Audacity, this
 time clearing out the metadata. A hexdump confirms that the new file
 contains no "LIST" chunk, and wavfile.py no longer issues the warning.
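
 The same check can be done without a hexdump by walking the chunk headers.
 This is a minimal sketch, not the wavfile.py code: list_riff_chunks() and
 make_demo_wav() are hypothetical helpers, the demo file is built in memory
 rather than exported from Audacity, and the position of the LIST chunk in
 it is arbitrary. It assumes a little-endian RIFF file and reads sizes as
 unsigned 32-bit values.

```python
import io
import struct


def list_riff_chunks(fid):
    """Return (chunk_id, size) pairs for the top-level chunks of a WAV stream."""
    riff_id, _riff_size, form_type = struct.unpack('<4sI4s', fid.read(12))
    if riff_id != b'RIFF' or form_type != b'WAVE':
        raise ValueError("not a little-endian RIFF/WAVE stream")
    chunks = []
    while True:
        header = fid.read(8)
        if len(header) < 8:
            break  # end of file
        chunk_id, size = struct.unpack('<4sI', header)
        chunks.append((chunk_id, size))
        # Chunk bodies are padded to an even number of bytes.
        fid.seek(size + (size & 1), 1)
    return chunks


def _chunk(chunk_id, payload):
    """Serialize one chunk: 4-byte id, 4-byte size, payload, optional pad byte."""
    pad = b'\x00' if len(payload) % 2 else b''
    return chunk_id + struct.pack('<I', len(payload)) + payload + pad


def make_demo_wav(with_metadata=True):
    """Build a tiny mono 16-bit PCM WAV in memory (illustration only)."""
    fmt = _chunk(b'fmt ', struct.pack('<HHIIHH', 1, 1, 8000, 16000, 2, 16))
    data = _chunk(b'data', struct.pack('<4h', 0, 1, -1, 0))
    body = b'WAVE' + fmt
    if with_metadata:
        info = b'INFO' + _chunk(b'IART', b'Rich\x00')  # artist entry
        body += _chunk(b'LIST', info)
    body += data
    return b'RIFF' + struct.pack('<I', len(body)) + body


with_meta = list_riff_chunks(io.BytesIO(make_demo_wav(True)))
without_meta = list_riff_chunks(io.BytesIO(make_demo_wav(False)))
print([cid for cid, _ in with_meta])
print([cid for cid, _ in without_meta])
```

 For a file on disk, the same function can be called on
 open(path, 'rb').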

 Here's a helpful reference I found that talks about the chunks found in
 wav files: [http://www.sonicspot.com/guide/wavefiles.html#list].

 I also have a quick patch:

 {{{

 [earl@leto io]$ diff wavfile_orig.py  wavfile.py
 35a36,44
 > def _skip_unknown_chunk(fid):
 >     if _big_endian:
 >         fmt = '>i'
 >     else:
 >         fmt = '<i'
 >     data = fid.read(4)
 >     size = struct.unpack(fmt, data)[0]
 >     fid.seek(size, 1)
 >
 119a129,131
 >         elif chunk_id == asbytes('LIST'):
 >             # Someday this could be handled properly but for now skip it
 >             _skip_unknown_chunk(fid)
 122,128c134
 <             data = fid.read(4)
 <             if _big_endian:
 <                 fmt = '>i'
 <             else:
 <                 fmt = '<i'
 <             size = struct.unpack(fmt, data)[0]
 <             fid.seek(size, 1)
 ---
 >             _skip_unknown_chunk(fid)

 }}}

 In this patch I move the body of the else branch into a new private
 function, _skip_unknown_chunk(). I add an elif case for the "LIST" chunk,
 with a comment that it could someday be parsed to return the contained
 metadata; for now it is simply skipped, without a warning. Reading the
 metadata from the wav file isn't important to me right now, but it may be
 useful someday.
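
 For reference, reading the metadata "properly" could look roughly like the
 sketch below. It is not part of the patch: parse_info_list() is a
 hypothetical helper that takes the already-read body of a LIST chunk, the
 demo bytes are hand-built rather than taken from an Audacity export, and
 decoding the text as Latin-1 is an assumption.

```python
import struct


def parse_info_list(payload):
    """Parse the body of a LIST chunk and return {tag: text} for an INFO list.

    Returns an empty dict for any other list type.
    """
    if payload[:4] != b'INFO':
        return {}
    entries = {}
    pos = 4
    while pos + 8 <= len(payload):
        # Each sub-chunk is a 4-byte tag followed by an unsigned 32-bit size.
        tag, size = struct.unpack_from('<4sI', payload, pos)
        pos += 8
        raw = payload[pos:pos + size]
        # Values are NUL-terminated strings; the encoding is assumed here.
        entries[tag.decode('ascii')] = raw.rstrip(b'\x00').decode('latin-1')
        # Sub-chunks are padded to an even number of bytes.
        pos += size + (size & 1)
    return entries


# Hand-built LIST body: an artist entry (IART) and a title entry (INAM).
demo = (b'INFO'
        + b'IART' + struct.pack('<I', 5) + b'Rich\x00' + b'\x00'
        + b'INAM' + struct.pack('<I', 6) + b'Demo\x00\x00')
metadata = parse_info_list(demo)
print(metadata)
```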

 Finally, for reference I am running:
 {{{
 Linux leto 3.1.8-1-ARCH #1 SMP PREEMPT Sat Jan 7 08:59:43 CET 2012 x86_64
 AMD Sempron(tm) 140 Processor AuthenticAMD GNU/Linux
 }}}

 And this is scipy v0.10.0 on Arch Linux.

 Thanks,
 Rich

-- 
Ticket URL: <http://projects.scipy.org/scipy/ticket/1585>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.

