[SciPy-dev] matlab io - request for testing

Nathaniel Smith njs@pobox....
Sat Feb 21 00:28:11 CST 2009

On Thu, Feb 19, 2009 at 7:42 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
> I have been beating up the matlab io rather severely in order to
> implement some cleanups, fixes, and add new options.
> I would very much appreciate it if people could pick up the current
> SVN and let me know whether they have any problems.

I finally got a chance to test with my nasty file, and with r5561, it
now takes ~32 minutes of cpu time to load (as compared to ~5 minutes
for 0.7.0, and 3 seconds for 0.6.0). All the time is in

I talked to the guy whose data it is now, though, and he okayed my
distributing an example:
(Sorry the file is so large, all my attempts to minimize it somehow
also fixed whatever is making it so pathological.)

Does that help track things down? (This is also a good example file
for why struct_as_record=True can be Very Very Useless, and if you
combine struct_as_record=True with squeeze_me=True, the file ends up
as gibberish -- a big tuple of anonymous variables, not so useful...)

I'm also wondering, though, if (as you mentioned downthread somewhere)
the matlab IO code ends up doing a single short read and then reads
the whole actual matrix data in one fell swoop, then what benefit does
this streaming code give us? I thought that the point was that one
could read small chunks and avoid taking the memory for a large
temporary buffer, but if that's not happening, then it seems like a
very slow and fragile chunk of code for no benefit.
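To make the trade-off concrete, here is a generic sketch of the two strategies using only Python's standard zlib module. Compressed MAT-file v5 variables are zlib streams. This is not SciPy's actual stream code: the function names and chunk sizes are hypothetical. The one-shot version needs the whole compressed element in memory as well as the full output. The streaming version holds one small input chunk at a time, and each decompress call returns at most out_size bytes.

```python
import io
import zlib


def read_all_at_once(compressed: bytes) -> bytes:
    # One-shot: the whole compressed element and the whole output
    # both live in memory at the same time.
    return zlib.decompress(compressed)


def iter_decompressed(stream, in_size=4096, out_size=65536):
    """Hypothetical streaming reader: pull small chunks from `stream`
    and yield decompressed pieces of at most `out_size` bytes per
    decompress call."""
    d = zlib.decompressobj()
    while not d.eof:
        # Re-feed input held back because out_size was reached;
        # otherwise read the next small chunk from the file.
        raw = d.unconsumed_tail or stream.read(in_size)
        if not raw:
            break
        piece = d.decompress(raw, out_size)
        if piece:
            yield piece
    # Emit any output zlib still has buffered internally.
    tail = d.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    payload = bytes(range(256)) * 2000
    comp = zlib.compress(payload)
    pieces = list(iter_decompressed(io.BytesIO(comp), 1000, 10000))
    print(len(pieces), b"".join(pieces) == read_all_at_once(comp))
```

If the caller joins every piece back into one array anyway, as described above, the streaming version saves no peak memory over the one-shot call and only adds per-chunk overhead.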

-- Nathaniel
