[SciPy-dev] huge speed regression in loadmat from 0.6.0 to 0.7.0
Ryan May
rmay31@gmail....
Wed Feb 11 14:40:01 CST 2009
On Wed, Feb 11, 2009 at 2:21 PM, Scott David Daniels
<Scott.Daniels@acm.org>wrote:
> Ryan May wrote:
> > On Wed, Feb 11, 2009 at 2:03 PM, Scott David Daniels
> > <Scott.Daniels@acm.org <mailto:Scott.Daniels@acm.org>> wrote:
> >
> > Ryan May wrote:
> > > ... Well, here's a patch against gzipstreams.py that changes to
> > add the
> > > chunks to a list and only add to the string at the very end. See
> > if it
> > > helps your case. If not, is there somewhere you can put the
> > datafile so
> > > that we can test with it?
> > Well, in your patch, instead of:
> > @@ -95,11 +100,12 @@
> > data = self.fileobj.read(n_to_fetch)
> > self._bytes_read += len(data)
> > if data:
> > - self.data += self._unzipper.decompress(data)
> > + self_data += self._unzipper.decompress(data)
> > if len(data) < n_to_fetch: # hit end of file
> > - self.data += self._unzipper.flush()
> > + self_data += self._unzipper.flush()
> > self.exhausted = True
> > break
> > + self.data += ''.join(self_data)
> >
> > Use:
> > @@ -95,11 +100,12 @@
> > data = self.fileobj.read(n_to_fetch)
> > self._bytes_read += len(data)
> > if data:
> > - self.data += self._unzipper.decompress(data)
> > + self_data.append(self._unzipper.decompress(data))
> > if len(data) < n_to_fetch: # hit end of file
> > - self.data += self._unzipper.flush()
> > + self_data.append(self._unzipper.flush())
> > self.exhausted = True
> > break
> > + self.data += ''.join(self_data)
> >
> >
> > Yeah, you're right. I thought += for lists just mapped to append, but
> > apparently it appends other lists, but extends the list by other
> > sequences. Weird.
> >
> > But if you do make that change, it solves your performance problem?
>
> I am not the OP. I just noticed a problem. However, there is another
> The loop control is now wrong:
> while read_to_end or len(self.data) < bytes:
> Clearly the second clause won't work right, so deeper surgery on your
> patch is needed. I'd calculate needed bytes = bytes - len(self.data)
> and decrement it by the length of each chunk added to self_data.
> But clearly I don't understand what is going on, since I see bytes
> initialized to -1 and never updated in the fragment, so the loop
> control boils down to "while read_to_end:". I think the code needs
> some further study there.
>
<places bag over head> I can't believe I didn't notice you weren't the OP.
And yeah, I forgot the loop control. Clearly, this is evidence that I
shouldn't start my day with creating a patch, though I did at least have the
sense to run the test suite. Obviously, the tests don't exercise a code
path that uses the len(self.data) < bytes.
As far as bytes goes, it isn't initialized to -1, but rather read_to_end is
a boolean set to the value of (bytes == -1), so that you can pass bytes in
as -1 and read all the data.
Anyhow, for anyone who cares, here's a patch that removes the
braindeaded-ness and should actually work.
Ryan
--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Sent from: Norman Oklahoma United States.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://projects.scipy.org/pipermail/scipy-dev/attachments/20090211/86efc346/attachment-0001.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gzipstreams_speedup.diff
Type: application/octet-stream
Size: 1443 bytes
Desc: not available
Url : http://projects.scipy.org/pipermail/scipy-dev/attachments/20090211/86efc346/attachment-0001.obj
More information about the Scipy-dev
mailing list