[SciPy-dev] huge speed regression in loadmat from 0.6.0 to 0.7.0

Ryan May rmay31@gmail....
Wed Feb 11 14:40:01 CST 2009


On Wed, Feb 11, 2009 at 2:21 PM, Scott David Daniels
<Scott.Daniels@acm.org>wrote:

> Ryan May wrote:
> > On Wed, Feb 11, 2009 at 2:03 PM, Scott David Daniels
> > <Scott.Daniels@acm.org <mailto:Scott.Daniels@acm.org>> wrote:
> >
> >     Ryan May wrote:
> >      > ... Well, here's a patch against gzipstreams.py that changes it to
> >      > add the chunks to a list and only build the string at the very end.
> >      > See if it helps your case.  If not, is there somewhere you can put
> >      > the datafile so that we can test with it?
> >     Well, in your patch, instead of:
> >     @@ -95,11 +100,12 @@
> >                  data = self.fileobj.read(n_to_fetch)
> >                  self._bytes_read += len(data)
> >                  if data:
> >     -                self.data += self._unzipper.decompress(data)
> >     +                self_data += self._unzipper.decompress(data)
> >                  if len(data) < n_to_fetch: # hit end of file
> >     -                self.data += self._unzipper.flush()
> >     +                self_data += self._unzipper.flush()
> >                      self.exhausted = True
> >                      break
> >     +        self.data += ''.join(self_data)
> >
> >     Use:
> >     @@ -95,11 +100,12 @@
> >                  data = self.fileobj.read(n_to_fetch)
> >                  self._bytes_read += len(data)
> >                  if data:
> >     -                self.data += self._unzipper.decompress(data)
> >     +                self_data.append(self._unzipper.decompress(data))
> >                  if len(data) < n_to_fetch: # hit end of file
> >     -                self.data += self._unzipper.flush()
> >     +                self_data.append(self._unzipper.flush())
> >                      self.exhausted = True
> >                      break
> >     +        self.data += ''.join(self_data)
> >
> >
> > Yeah, you're right.  I thought += for lists just mapped to append, but
> > it's actually equivalent to extend(): it iterates over the right-hand
> > operand, so += with a string splices in the individual characters.  Weird.
> >
> > But if you do make that change, it solves your performance problem?
>
> I am not the OP.  I just noticed a problem.  However, there is another
> one: the loop control is now wrong:
>          while read_to_end or len(self.data) < bytes:
> Since self.data is no longer grown inside the loop, the second clause
> never changes, so deeper surgery on your patch is needed.  I'd calculate
> the bytes still needed as bytes - len(self.data) and decrement that by
> the length of each chunk appended to self_data.
> But clearly I don't understand what is going on, since I see bytes
> initialized to -1 and never updated in the fragment, so the loop
> control boils down to "while read_to_end:".  I think the code needs
> some further study there.
>

<places bag over head> I can't believe I didn't notice you weren't the OP.
And yeah, I forgot the loop control.  Clearly, this is evidence that I
shouldn't start my day by creating a patch, though I did at least have the
sense to run the test suite.  Obviously, the tests don't exercise a code
path that relies on the len(self.data) < bytes check.
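
For anyone else bitten by this: += on a list is equivalent to extend(),
which iterates over whatever is on the right-hand side, so += with a
string splices in its individual characters:

    >>> chunks = []
    >>> chunks += 'abc'        # extend(): iterates over the string
    >>> chunks
    ['a', 'b', 'c']
    >>> chunks = []
    >>> chunks.append('abc')   # append(): adds the string as one element
    >>> chunks
    ['abc']

That's why the patch needs append(), so each decompressed chunk stays
intact for the final ''.join().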

As far as bytes goes, it isn't initialized to -1; rather, read_to_end is a
boolean set to (bytes == -1), so you can pass bytes in as -1 to read all
the data.

Anyhow, for anyone who cares, attached is a patch that removes the
brain-deadness and should actually work.
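
For the curious, here's a rough sketch of the shape of the fix (not the
attached diff itself; the method name _fill and the use of self.blocksize
in place of the fragment's n_to_fetch are my guesses, while the other
names follow the fragments quoted above).  Repeated str += copies the
whole buffer on every iteration, which goes quadratic over a large file;
appending chunks to a list and joining once is linear.  The buffered
length just has to be tracked separately, since self.data no longer grows
inside the loop:

    def _fill(self, bytes=-1):
        # Decompress until at least `bytes` bytes are buffered in
        # self.data, or to end of file if bytes == -1.
        if self.exhausted:
            return
        read_to_end = bytes == -1
        chunks = []                      # decompressed chunks, joined once below
        n_buffered = len(self.data)      # loop test uses this, not len(self.data)
        while read_to_end or n_buffered < bytes:
            data = self.fileobj.read(self.blocksize)
            self._bytes_read += len(data)
            if data:
                chunk = self._unzipper.decompress(data)
                chunks.append(chunk)     # O(1) append instead of O(n) str +=
                n_buffered += len(chunk)
            if len(data) < self.blocksize:  # hit end of file
                chunks.append(self._unzipper.flush())
                self.exhausted = True
                break
        self.data += ''.join(chunks)     # single join: linear, not quadratic

The attached gzipstreams_speedup.diff is the authoritative version; the
sketch is just to show the idea.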

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Sent from: Norman, Oklahoma, United States.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gzipstreams_speedup.diff
Type: application/octet-stream
Size: 1443 bytes
Desc: not available
URL: http://projects.scipy.org/pipermail/scipy-dev/attachments/20090211/86efc346/attachment-0001.obj

