[SciPy-dev] [patch] read/write v5 .mat files with structs, cell arrays, objects, or function handles

Vebjorn Ljosa ljosa@broad.mit....
Thu Oct 2 09:14:14 CDT 2008


Stéfan van der Walt <stefan@sun.ac.za> writes:
>
> We should certainly look at applying the non-API-changing parts,
> though.  I'm not sure what the best way is to represent these
> structures on the Python side.
>
> Thouis, you've thought about this a lot: could you tell us the pros
> and cons of switching to the new representation?

The reason Ray and I changed some of the representations is that we
wanted the mapping from Matlab to Python to be symmetric: anything read
from a MAT-file should be represented in a way that allows the writer
code to write it back in its original form.  This requires that the
original Matlab type be deducible from the Python representation.
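
To make the deducibility requirement concrete, here is a minimal
numpy-only sketch of how a writer could recover the Matlab class from
the dtype alone, given the representations discussed below.  This is
not code from the patch; guess_matlab_class is a hypothetical helper
and the field names are made up for illustration.

```python
import numpy as np

def guess_matlab_class(arr):
    """Hypothetical helper: infer the Matlab class from the dtype alone."""
    if arr.dtype.names is not None:
        # A dtype with named fields (record/structured array) -> struct array
        return "struct"
    if arr.dtype.kind == "U":
        # Unicode string dtype ('U1', 'U3', ...) -> char array
        return "char"
    if arr.dtype.kind == "O":
        # Plain object array -> cell array
        return "cell"
    return "numeric"

# Two empty 0x0 arrays: with dtype=object alone they would be
# indistinguishable, but the field names in the dtype survive emptiness.
empty_cell = np.empty((0, 0), dtype=object)
empty_struct = np.empty((0, 0), dtype=[("name", object), ("value", object)])
strings = np.array(["abc", "def"])

for a in (empty_cell, empty_struct, strings):
    print(a.shape, a.dtype, guess_matlab_class(a))
```

The point is only that the dtype, not the contents, carries the type
information, so it is still available when the array has no elements.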

 * Struct arrays: Matlab struct arrays were previously represented as
   numpy arrays of dtype=object filled with instances of mat_struct.
   The problem is that Matlab cell arrays were also represented as numpy
   arrays of dtype=object.  The writer code could in most cases have
   identified structs by looking at the contents (instances of
   mat_struct), but there was no way to distinguish a 0x0 cell array
   from a 0x0 struct array.  We therefore opted to represent struct
   arrays as numpy record arrays.

   In order not to break existing code, we could introduce a keyword
   argument to loadmat that selects the old or new representation,
   similar to numpy.histogram's "new" argument.  In 0.7, leaving the
   argument out would default to False (old behavior), but give a
   deprecation warning.  Later versions can first change the default to
   True and then remove the old behavior entirely.  The best name I can
   think of for this keyword argument is "struct_as_record".

 * Char arrays/strings: Same story.  At the lowest level, the code
   represented char arrays as numpy arrays of dtype='U1', which is
   fine.  A very useful "processor function" (in miobase) turns them
   into arrays of strings, however.  This processor function created
   an array of dtype=object.  We changed this to 'U...' so the array
   could be distinguished from a cell array.  I think this is unlikely
   to break any code.  Do you agree?

 * Objects: This change in representation was purely for our
   convenience, and we should be able to fix our patch to keep the old
   representation.
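
The transition proposed for struct arrays above could look roughly like
the following.  This is only a sketch of the keyword handling:
loadmat_sketch is a hypothetical stand-in, not the real loadmat, and
the warning text and return values are placeholders.

```python
import warnings

def loadmat_sketch(file_name, struct_as_record=None):
    """Hypothetical stand-in for loadmat; only the keyword handling is shown."""
    if struct_as_record is None:
        # Argument left out: keep the old behavior for now, but warn.
        warnings.warn(
            "struct_as_record was not given; defaulting to False (old "
            "representation).  The default will change to True later.",
            DeprecationWarning,
            stacklevel=2,
        )
        struct_as_record = False
    # Placeholder result naming the representation that would be used.
    if struct_as_record:
        return "record array"
    return "object array of mat_struct"
```

Callers who pass struct_as_record explicitly (True or False) get no
warning, so existing code can be made quiet without changing behavior.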

Vebjorn

