[SciPy-dev] huge speed regression in loadmat from 0.6.0 to 0.7.0

Scott David Daniels Scott.Daniels@Acm....
Wed Feb 11 14:21:30 CST 2009


Ryan May wrote:
> On Wed, Feb 11, 2009 at 2:03 PM, Scott David Daniels 
> <Scott.Daniels@acm.org <mailto:Scott.Daniels@acm.org>> wrote:
> 
>     Ryan May wrote:
>      > ... Well, here's a patch against gzipstreams.py that changes it
>      > to add the chunks to a list and only build the string at the
>      > very end.  See if it helps your case.  If not, is there somewhere
>      > you can put the datafile so that we can test with it?
>     Well, in your patch, instead of:
>     @@ -95,11 +100,12 @@
>                  data = self.fileobj.read(n_to_fetch)
>                  self._bytes_read += len(data)
>                  if data:
>     -                self.data += self._unzipper.decompress(data)
>     +                self_data += self._unzipper.decompress(data)
>                  if len(data) < n_to_fetch: # hit end of file
>     -                self.data += self._unzipper.flush()
>     +                self_data += self._unzipper.flush()
>                      self.exhausted = True
>                      break
>     +        self.data += ''.join(self_data)
> 
>     Use:
>     @@ -95,11 +100,12 @@
>                  data = self.fileobj.read(n_to_fetch)
>                  self._bytes_read += len(data)
>                  if data:
>     -                self.data += self._unzipper.decompress(data)
>     +                self_data.append(self._unzipper.decompress(data))
>                  if len(data) < n_to_fetch: # hit end of file
>     -                self.data += self._unzipper.flush()
>     +                self_data.append(self._unzipper.flush())
>                      self.exhausted = True
>                      break
>     +        self.data += ''.join(self_data)
> 
> 
> Yeah, you're right.  I thought += for lists just mapped to append, but 
> apparently it extends the list with the elements of any sequence, so 
> += with a string adds the characters one at a time.  Weird.
> 
> But if you do make that change, it solves your performance problem?

I am not the OP; I just noticed a problem with the patch as written.
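For the record, here is a quick interpreter session showing why += on
a list is the wrong tool for accumulating string chunks:

    >>> acc = []
    >>> acc += 'abc'         # list += extends with any iterable,
    >>> acc                  # so a string is split into characters
    ['a', 'b', 'c']
    >>> acc = []
    >>> acc.append('abc')    # append() keeps the chunk whole
    >>> acc
    ['abc']

The += version still yields the right bytes after the ''.join, but
only after exploding every chunk into one-character strings, which
defeats the purpose of the patch.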
There is another problem, though: the loop control is now wrong:
          while read_to_end or len(self.data) < bytes:
Since the chunks now accumulate in self_data, len(self.data) no longer
grows inside the loop, so the second clause won't work right, and your
patch needs deeper surgery.  I'd compute the bytes still needed as
bytes - len(self.data) and decrement that by the length of each chunk
appended to self_data.
But clearly I don't understand what is going on, since I see bytes
initialized to -1 and never updated in the fragment, so the loop
control boils down to "while read_to_end:".  I think the code needs
some further study there.
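If it helps, here is an untested sketch of the restructuring I have in
mind.  I'm guessing at the surrounding method (the __fill name,
blocksize, and the bytes == -1 convention) from the fragment above, so
treat it as an illustration rather than a drop-in replacement:

    def __fill(self, bytes):
        # Untested sketch; self.blocksize, self.exhausted and
        # self._unzipper are guessed from the quoted fragment.
        if self.exhausted:
            return
        read_to_end = bytes == -1
        chunks = []                      # decompressed chunks, joined once
        needed = bytes - len(self.data)  # bytes still missing; ignored
                                         # when read_to_end is true
        while read_to_end or needed > 0:
            data = self.fileobj.read(self.blocksize)
            self._bytes_read += len(data)
            if data:
                chunk = self._unzipper.decompress(data)
                chunks.append(chunk)
                needed -= len(chunk)
            if len(data) < self.blocksize:  # hit end of file
                chunks.append(self._unzipper.flush())
                self.exhausted = True
                break
        self.data += ''.join(chunks)     # single concatenation at the end

That way the join happens exactly once, and the needed counter stands
in for the len(self.data) test so the early exit still works.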

--Scott David Daniels
Scott.Daniels@Acm.Org


