[SciPy-dev] matlab io - request for testing

Nathaniel Smith njs@pobox....
Sat Feb 21 00:28:11 CST 2009


On Thu, Feb 19, 2009 at 7:42 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
> I have been beating up the matlab io rather severely in order to
> implement some cleanups, fixes, and add new options.
>
> I would very much appreciate it if people could pick up the current
> SVN and let me know whether they have any problems.

I finally got a chance to test with my nasty file, and with r5561, it
now takes ~32 minutes of cpu time to load (as compared to ~5 minutes
for 0.7.0, and 3 seconds for 0.6.0). All the time is in
zlibstreams.py:read.

I talked to the guy whose data it is now, though, and he okayed my
distributing an example:
  http://roberts.vorpus.org/~njs/tmp/test.mat
  http://roberts.vorpus.org/~njs/tmp/test-mat.txt
  http://roberts.vorpus.org/~njs/tmp/test-mat.profile
(Sorry the file is so large, all my attempts to minimize it somehow
also fixed whatever is making it so pathological.)

Does that help track things down? (This is also a good example file
for why struct_as_record=True can be Very Very Useless, and if you
combine struct_as_record=True with squeeze_me=True, the file ends up
as gibberish -- a big tuple of anonymous variables, not so useful...)

I'm also wondering, though, if (as you mentioned downthread somewhere)
the matlab IO code ends up doing a single short read and then reads
the whole actual matrix data in one fell swoop, then what benefit does
this streaming code give us? I thought that the point was that one
could read small chunks and avoid taking the memory for a large
temporary buffer, but if that's not happening, then it seems like a
very slow and fragile chunk of code for no benefit.
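
For illustration, here is a minimal sketch of the kind of chunked reading
meant above. It uses Python's standard zlib module, not the actual
zlibstreams.py code, and the function name, chunk sizes and sample payload
are all made up for the example. The idea is that neither the compressed
input nor the decompressed output is ever held in one large temporary
buffer:

```python
import io
import zlib


def iter_decompressed(fileobj, in_chunk=4096, out_chunk=65536):
    """Yield decompressed pieces of a zlib stream, bounded in size.

    Hypothetical helper: reads at most in_chunk compressed bytes at a
    time and asks zlib for at most out_chunk decompressed bytes per call.
    """
    decomp = zlib.decompressobj()
    while not decomp.eof:
        raw = fileobj.read(in_chunk)
        if not raw:
            break  # input exhausted (possibly a truncated stream)
        buf = raw
        while buf:
            # max_length caps the output; leftover input is kept in
            # unconsumed_tail and must be fed back in.
            data = decomp.decompress(buf, out_chunk)
            if data:
                yield data
            buf = decomp.unconsumed_tail
    tail = decomp.flush()  # whatever zlib still holds internally
    if tail:
        yield tail


# Sample data standing in for a compressed matrix (an assumption).
payload = bytes(range(256)) * 4096  # 1 MiB
compressed = io.BytesIO(zlib.compress(payload))

pieces = list(iter_decompressed(compressed))
largest_piece = max(len(p) for p in pieces)
roundtrip_ok = b"".join(pieces) == payload
```

A consumer that copies each piece straight into a preallocated array only
ever needs out_chunk bytes of scratch space. If the caller instead asks for
the whole matrix in a single read, the pieces have to be joined anyway and
that memory advantage is lost.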

-- Nathaniel

