[SciPy-user] Very slow loadmat in scipy 0.7 (regression)

Matthieu Brucher matthieu.brucher@gmail....
Sun Feb 22 05:58:43 CST 2009


Hi,

This issue came up on the scipy-dev mailing list and will be fixed in a future version.

Matthieu

2009/2/22 Antonino Ingargiola <tritemio@gmail.com>:
> Hi to the list,
>
> I'm loading MATLAB files of a few tens of MB in Python with
> scipy.io.loadmat. With scipy 0.6 (the stock Ubuntu 8.10 version) the
> load takes a few seconds (2-5 sec). Now, with scipy 0.7, it takes much
> longer: around 80 secs.
>
> I profiled the load and found that almost all the time is spent in the
> GzipInputStream.__fill method. I blindly tried changing the
> GzipInputStream.blocksize attribute from 16K to 256K and 1M, and found
> that performance improves dramatically. Here are the profile results
> for loading a 33M MATLAB file:
>
> *Scipy 0.7 default, BUFFER 16K*
>
> 12984 function calls (12981 primitive calls) in 140.456 CPU seconds
>
>   Ordered by: internal time
>   List reduced from 40 to 3 due to restriction <3>
>
>   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>       27  139.250    5.157  140.304    5.196 gzipstreams.py:80(__fill)
>     2119    0.950    0.000    0.950    0.000 {built-in method decompress}
>        9    0.123    0.014    0.123    0.014 {method 'copy' of 'numpy.ndarray' objects}
>
>
> *BUFFER 256K*
>
> 1080 function calls (1077 primitive calls) in 9.988 CPU seconds
>
>   Ordered by: internal time
>   List reduced from 40 to 3 due to restriction <3>
>
>   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>       27    8.870    0.329    9.833    0.364 gzipstreams.py:80(__fill)
>      135    0.925    0.007    0.925    0.007 {built-in method decompress}
>        9    0.124    0.014    0.124    0.014 {method 'copy' of 'numpy.ndarray' objects}
>
>
> *BUFFER 1M*
>
> 480 function calls (477 primitive calls) in 3.509 CPU seconds
>
>   Ordered by: internal time
>   List reduced from 40 to 3 due to restriction <3>
>
>   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>       27    2.329    0.086    3.302    0.122 gzipstreams.py:80(__fill)
>       35    0.925    0.026    0.925    0.026 {built-in method decompress}
>        9    0.124    0.014    0.124    0.014 {method 'copy' of 'numpy.ndarray' objects}
>
>
>
> As you can see, there is a dramatic improvement: the time drops from
> 140 to around 3.5 seconds.
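A quick back-of-the-envelope check on the three listings, using only numbers copied from them. The assumption that each `decompress` call consumes one block of compressed input is mine, not taken from scipy's source.

```python
# (blocksize in bytes, __fill tottime in s, decompress ncalls),
# copied from the three profile listings above.
RUNS = {
    "16K": (16 * 1024, 139.250, 2119),
    "256K": (256 * 1024, 8.870, 135),
    "1M": (1024 * 1024, 2.329, 35),
}


def summarize(runs):
    """Return {label: (implied MiB of input, __fill tottime per block in ms)}."""
    out = {}
    for label, (blocksize, fill_tottime, decompress_calls) in runs.items():
        # Assumes one decompress() call per block read from the file.
        implied_mib = decompress_calls * blocksize / 2**20
        # Time spent in __fill's own code, divided by the number of blocks.
        per_block_ms = 1000 * fill_tottime / decompress_calls
        out[label] = (implied_mib, per_block_ms)
    return out


for label, (mib, ms) in summarize(RUNS).items():
    print(f"{label:>4}: ~{mib:.1f} MiB read, {ms:.1f} ms of __fill time per block")
```

The implied input comes out at roughly 33 to 35 MiB in every run, consistent with the 33M file, and `__fill`'s own time is about 66 ms per block in all three runs, while `decompress` stays near 0.93 s. The overhead therefore scales with the number of blocks rather than with decompression itself, which is why a larger blocksize helps so much; the profile alone does not show what that per-block cost is.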
>
> I think the default value should be raised (to at least 256K), but
> since the performance hit can be so large, it would definitely be better
> to have this as a keyword argument directly in io.loadmat.
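Until such a keyword exists, the experiment described in the thread (changing `GzipInputStream.blocksize`) can be scoped so the change does not leak into the rest of a program. This is a minimal sketch, assuming `blocksize` is a class attribute that is read when streams are filled; if it is instead copied into each instance in `__init__` before use, the override may have no effect. The helper name `temporary_class_attr` and the scipy 0.7 import path in the comment are assumptions, not scipy API.

```python
import contextlib


@contextlib.contextmanager
def temporary_class_attr(cls, name, value):
    """Temporarily set cls.<name> = value, restoring the previous state on exit."""
    missing = object()
    # Look only at the class's own dict, so an inherited value is not copied down.
    old = cls.__dict__.get(name, missing)
    setattr(cls, name, value)
    try:
        yield cls
    finally:
        if old is missing:
            delattr(cls, name)  # value was inherited or absent: drop our override
        else:
            setattr(cls, name, old)


# Hypothetical usage with scipy 0.7 (import path assumed, not verified):
# import scipy.io
# from scipy.io.matlab.gzipstreams import GzipInputStream
# with temporary_class_attr(GzipInputStream, "blocksize", 256 * 1024):
#     data = scipy.io.loadmat("test.mat")
```

Restoring the value in `finally` keeps the override scoped even if loading raises. While active, the change is process-wide, so it is not safe if other threads create streams at the same time.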
>
> Any comment is appreciated.
>
>  - Antonio
>
> PS: the test file used for the profiling is attached.
>
> _______________________________________________
> SciPy-user mailing list
> SciPy-user@scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user
>
>



-- 
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher

