[Numpy-discussion] reading *big* inhomogenous text matrices *fast*?
Wed Aug 13 23:32:45 CDT 2008
On Wed, 13 Aug 2008 21:42:51 -0500, Robert Kern wrote:
> Here is the appropriate snippet in Objects/listobject.c:
>
> /* This over-allocates proportional to the list size, making room
>  * for additional growth.  The over-allocation is mild, but is
>  * enough to give linear-time amortized behavior over a long
>  * sequence of appends() in the presence of a poorly-performing
>  * system realloc().
>  * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
>  */
> new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6) + newsize;
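[A small sketch of the quoted formula in Python, to show that it reproduces the documented growth pattern; the function and variable names are mine, not from listobject.c:]

```python
def new_allocated(newsize):
    # Mirror of CPython's list over-allocation formula from listobject.c:
    # roughly newsize + newsize/8 plus a small constant.
    return (newsize >> 3) + (3 if newsize < 9 else 6) + newsize

# Simulate repeated appends, recording the capacity at each reallocation.
pattern = [0]
allocated = 0
for size in range(1, 100):
    if size > allocated:
        allocated = new_allocated(size)
        pattern.append(allocated)

print(pattern[:10])  # [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]
```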
> Raymond Hettinger had a good talk at PyCon this year about the details
> of the Python containers. Here are the slides from the EuroPython
> version (I assume).
Thanks! Looks like the only caveat is that the whole thing may slow down
if the reallocation operation itself is very inefficient, which probably
isn't the case with a modern Linux distro and a recent libc. I'm thinking
whatever went wrong had to be my fault :-)
> Primarily, it's the fact that we have views of arrays that might be
> floating around that prevents us from reallocating as a matter of
> course. Now, we do have a .resize() method which will explicitly
> reallocate the array, but it will only work if you don't have any views
> on the array floating around. During your file reading, this is probably
> valid, so you may want to give it a try using a similar reallocation
> strategy as lists. I'd be interested in seeing some benchmarks comparing
> this strategy with the others.
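[For the record, here is a minimal sketch of the strategy being suggested: grow an ndarray with .resize() using a list-style over-allocation factor while reading, then trim. The function name and shapes are my own illustration, not code from this thread; note that .resize() raises ValueError if any views on the array exist, so this only works while no views are floating around:]

```python
import numpy as np

def read_rows(rows, ncols):
    """Accumulate an iterable of length-ncols rows into a 2-D array,
    reallocating geometrically like CPython lists do."""
    buf = np.empty((4, ncols))
    n = 0
    for row in rows:
        if n >= buf.shape[0]:
            # Over-allocate ~1/8 extra plus a constant, as lists do.
            # refcheck=True (the default) makes resize refuse to
            # reallocate if other references/views exist.
            buf.resize(((n >> 3) + 6 + n, ncols), refcheck=True)
        buf[n] = row
        n += 1
    buf.resize((n, ncols))  # trim the over-allocation at the end
    return buf

data = read_rows(([i, 2.0 * i] for i in range(1000)), 2)
print(data.shape)  # (1000, 2)
```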
That will be the next thing for me to try if my current approach becomes
too memory-inefficient. Good idea!