[SciPy-user] Very slow loadmat in scipy 0.7 (regression)

Antonino Ingargiola tritemio@gmail....
Sun Feb 22 05:48:34 CST 2009


Hi to the list,

I'm loading MATLAB files of a few tens of MB in Python with
scipy.io.loadmat. With scipy 0.6 (the stock Ubuntu 8.10 version) the
load takes a few seconds (2-5 sec). Now with scipy 0.7 it takes much
longer, around 80 secs.

I ran a profile and found that nearly all the time is spent in the
GzipInputStream.__fill method. I blindly tried changing the
GzipInputStream.blocksize attribute from 16K to 256K and 1M and found
that performance improves dramatically. Here are the profile results
for loading a 33M MATLAB file:

*Scipy 0.7 default, BUFFER 16K*

12984 function calls (12981 primitive calls) in 140.456 CPU seconds

   Ordered by: internal time
   List reduced from 40 to 3 due to restriction <3>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       27  139.250    5.157  140.304    5.196 gzipstreams.py:80(__fill)
     2119    0.950    0.000    0.950    0.000 {built-in method decompress}
        9    0.123    0.014    0.123    0.014 {method 'copy' of 'numpy.ndarray' objects}


*BUFFER 256K*

1080 function calls (1077 primitive calls) in 9.988 CPU seconds

   Ordered by: internal time
   List reduced from 40 to 3 due to restriction <3>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       27    8.870    0.329    9.833    0.364 gzipstreams.py:80(__fill)
      135    0.925    0.007    0.925    0.007 {built-in method decompress}
        9    0.124    0.014    0.124    0.014 {method 'copy' of 'numpy.ndarray' objects}


*BUFFER 1M*

480 function calls (477 primitive calls) in 3.509 CPU seconds

   Ordered by: internal time
   List reduced from 40 to 3 due to restriction <3>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       27    2.329    0.086    3.302    0.122 gzipstreams.py:80(__fill)
       35    0.925    0.026    0.925    0.026 {built-in method decompress}
        9    0.124    0.014    0.124    0.014 {method 'copy' of 'numpy.ndarray' objects}



As you can see, there is a dramatic improvement: the time drops from
about 140 to around 3.5 seconds.
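
For anyone who wants to reproduce this kind of listing, here is a minimal
sketch using the standard cProfile and pstats modules, sorted by internal
time and restricted to the top 3 entries. The helper name profile_top and
the workload function are hypothetical stand-ins (they are not part of
scipy); in practice the profiled call would be scipy.io.loadmat on the
file in question. It is written for current Python, not the 2009 setup.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, **kwargs):
    """Run func under cProfile; return (result, report for the top 3 entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # "time" sorts by internal time (tottime); 3 limits the listing to 3 rows.
    stats.sort_stats("time").print_stats(3)
    return result, out.getvalue()


def workload(n):
    # Hypothetical stand-in for the call being measured (e.g. loadmat).
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_top(workload, 100000)
    print(report)
```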

I think the default value should be raised a bit (to at least 256K),
but since the performance hit can be so large, it would definitely be
better to expose this as a keyword argument directly in io.loadmat.
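
To illustrate the idea of a tunable block size, here is a minimal sketch
of block-wise decompression with the block size as a keyword argument. It
uses only the standard zlib module; the function name inflate and its
block_size parameter are hypothetical and are not scipy's API or its
actual __fill implementation. The decompress call counts in the profiles
above (2119, 135, 35) look consistent with roughly one call per block of
the 33M file, which is what this sketch reproduces. Collecting the pieces
in a list and joining once is my own choice here, not a claim about what
scipy does.

```python
import zlib


def inflate(compressed, block_size=256 * 1024):
    """Decompress a zlib stream block by block.

    Returns (data, n_calls), where n_calls counts decompress() calls.
    """
    decompressor = zlib.decompressobj()
    pieces = []
    n_calls = 0
    for start in range(0, len(compressed), block_size):
        block = compressed[start:start + block_size]
        pieces.append(decompressor.decompress(block))
        n_calls += 1
    pieces.append(decompressor.flush())
    # Join once at the end instead of growing a bytes object per block.
    return b"".join(pieces), n_calls


if __name__ == "__main__":
    import os

    payload = os.urandom(1 << 20)  # random data, so it barely compresses
    compressed = zlib.compress(payload)
    for size in (16 * 1024, 256 * 1024, 1024 * 1024):
        data, calls = inflate(compressed, block_size=size)
        assert data == payload
        print(size, calls)
```

A larger block size means fewer decompress calls for the same input,
while the decompressed result is identical.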

Any comments are appreciated.

  - Antonio

PS: the test file used for the profiling is attached.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_setup.py
Type: text/x-python
Size: 339 bytes
Desc: not available
Url : http://projects.scipy.org/pipermail/scipy-user/attachments/20090222/a82ed779/attachment.py 

