[SciPy-dev] huge speed regression in loadmat from 0.6.0 to 0.7.0
Scott David Daniels
Scott.Daniels@Acm....
Wed Feb 11 14:21:30 CST 2009
Ryan May wrote:
> On Wed, Feb 11, 2009 at 2:03 PM, Scott David Daniels
> <Scott.Daniels@acm.org> wrote:
>
> Ryan May wrote:
> > ... Well, here's a patch against gzipstreams.py that changes to add the
> > chunks to a list and only add to the string at the very end. See if it
> > helps your case. If not, is there somewhere you can put the datafile so
> > that we can test with it?
> Well, in your patch, instead of:
> @@ -95,11 +100,12 @@
> data = self.fileobj.read(n_to_fetch)
> self._bytes_read += len(data)
> if data:
> - self.data += self._unzipper.decompress(data)
> + self_data += self._unzipper.decompress(data)
> if len(data) < n_to_fetch: # hit end of file
> - self.data += self._unzipper.flush()
> + self_data += self._unzipper.flush()
> self.exhausted = True
> break
> + self.data += ''.join(self_data)
>
> Use:
> @@ -95,11 +100,12 @@
> data = self.fileobj.read(n_to_fetch)
> self._bytes_read += len(data)
> if data:
> - self.data += self._unzipper.decompress(data)
> + self_data.append(self._unzipper.decompress(data))
> if len(data) < n_to_fetch: # hit end of file
> - self.data += self._unzipper.flush()
> + self_data.append(self._unzipper.flush())
> self.exhausted = True
> break
> + self.data += ''.join(self_data)
>
>
> Yeah, you're right. I thought += for lists just mapped to append, but
> apparently it extends the list by the elements of any sequence, so a
> string gets added one character at a time. Weird.
>
> But if you do make that change, it solves your performance problem?
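Ryan's point about += on lists is easy to check directly. The snippet below is a minimal illustration (the variable names are made up for the example): `list += str` extends the list with each character, while `append` adds the whole string as one element. Joined with `''.join` at the end, both give the same text, so the += version of the patch would still be correct; it is just wasteful, since it stores one list entry per decompressed character.

```python
# list += str extends the list with each character of the string
by_iadd = []
by_iadd += "abc"
print(by_iadd)          # ['a', 'b', 'c']

# list.append adds the whole string as a single element
by_append = []
by_append.append("abc")
print(by_append)        # ['abc']

# += with a one-element list behaves like append
by_iadd_list = []
by_iadd_list += ["abc"]
print(by_iadd_list)     # ['abc']

# All three join to the same string; only the number of list entries differs
print("".join(by_iadd) == "".join(by_append) == "".join(by_iadd_list))  # True
```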
I am not the OP. I just noticed a problem. However, there is another
problem. The loop control is now wrong:
while read_to_end or len(self.data) < bytes:
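Since the decompressed chunks now accumulate in self_data, self.data no
longer grows inside the loop, so the second clause won't work right and
deeper surgery on your patch is needed. I'd calculate
needed = bytes - len(self.data) before the loop and decrement needed by
the length of each chunk added to self_data.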
But clearly I don't understand what is going on, since I see bytes
initialized to -1 and never updated in the fragment, so the loop
control boils down to "while read_to_end:". I think the code needs
some further study there.
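A minimal sketch of the loop restructured as suggested above, under several assumptions: it is a standalone function rather than the actual gzipstreams.py method, `read_decompressed`, `nbytes`, `blocksize` and `needed` are hypothetical names, it uses Python 3 bytes, and `zlib.decompressobj()` stands in for whatever `self._unzipper` is in the real code. Chunks are appended to a list, a remaining-bytes counter replaces the `len(self.data) < bytes` test, and the pieces are joined once at the end.

```python
import io
import zlib


def read_decompressed(fileobj, nbytes=-1, blocksize=16384):
    """Hypothetical helper: decompress from fileobj until at least nbytes
    are available, or to end of file when nbytes < 0."""
    unzipper = zlib.decompressobj()  # stands in for self._unzipper (assumption)
    chunks = []                      # decompressed pieces, joined once at the end
    read_to_end = nbytes < 0
    needed = nbytes                  # bytes still required; ignored when read_to_end
    while read_to_end or needed > 0:
        data = fileobj.read(blocksize)
        if data:
            piece = unzipper.decompress(data)
            chunks.append(piece)     # append, not +=, which would add each byte as a separate int
            needed -= len(piece)
        if len(data) < blocksize:    # hit end of file
            tail = unzipper.flush()
            chunks.append(tail)
            needed -= len(tail)
            break
    return b"".join(chunks)          # one join instead of repeated concatenation


raw = b"hello world " * 1000
compressed = zlib.compress(raw)
print(len(read_decompressed(io.BytesIO(compressed))))  # 12000
```

Because a decompressed chunk can overshoot, the result may be longer than nbytes. A real stream object would keep the decompressor and any surplus bytes on self (as the original does with self.data) rather than discarding them when the function returns.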
--Scott David Daniels
Scott.Daniels@Acm.Org