[Numpy-discussion] reading *big* inhomogenous text matrices *fast*?

Daniel Lenski dlenski@gmail....
Wed Aug 13 21:07:29 CDT 2008


On Wed, 13 Aug 2008 20:55:02 -0500, robert.kern wrote:
>> This is similar to what I tried originally!  Unfortunately, repeatedly
>> appending to a list seems to be very slow... I guess Python keeps
>> reallocating and copying the list as it grows.  (It would be nice to be
>> able to tune the increments by which the list size increases.)
> 
> The list reallocation schedule is actually fairly well-tuned as it is.
> Appending to a list object should be amortized O(1) time.

Hi Robert,

This is very interesting!  Do you know what exactly is the allocation 
strategy of list.append, or have a reference?

According to this page, http://effbot.org/zone/python-list.htm:

"The time needed to append an item to the list is “amortized constant”; 
whenever the list needs to allocate more memory, it allocates room for a 
few items more than it actually needs, to avoid having to reallocate on 
each call (this assumes that the memory allocator is fast; /for huge 
lists, the allocation overhead may push the behaviour towards O(n*n))/."

If I'm not running into O(n^2), there must have been some other 
inefficiency in my first attempt at a solution!  In that case, I guess 
I've come up with an unnecessarily complex solution to a problem where 
Python just Does The Right Thing.  Reminds me of my old Perl days ;-)

>> Repeatedly concatenating arrays with numpy.append or numpy.concatenate
>> is also quite slow, unfortunately. :-(
> 
> Yes. There is no preallocation here.

Makes sense.  I'm sure this cuts down on the amount of metadata that has 
to be carried along with each array instance, and there aren't too many 
applications for it.

Thanks!
Dan



More information about the Numpy-discussion mailing list