[Numpy-discussion] reading *big* inhomogenous text matrices *fast*?
Wed Aug 13 21:07:29 CDT 2008
On Wed, 13 Aug 2008 20:55:02 -0500, robert.kern wrote:
>> This is similar to what I tried originally! Unfortunately, repeatedly
>> appending to a list seems to be very slow... I guess Python keeps
>> reallocating and copying the list as it grows. (It would be nice to be
>> able to tune the increments by which the list size increases.)
> The list reallocation schedule is actually fairly well-tuned as it is.
> Appending to a list object should be amortized O(1) time.
This is very interesting! Do you know exactly what allocation strategy
list.append uses, or have a reference?
According to this page, http://effbot.org/zone/python-list.htm:
"The time needed to append an item to the list is 'amortized constant';
whenever the list needs to allocate more memory, it allocates room for a
few items more than it actually needs, to avoid having to reallocate on
each call (this assumes that the memory allocator is fast; for huge
lists, the allocation overhead may push the behaviour towards O(n*n))."
If I'm not running into O(n^2), there must have been some other
inefficiency in my first attempt at a solution! In that case, I guess
I've come up with an unnecessarily complex solution to a problem where
Python just Does The Right Thing. Reminds me of my old Perl days ;-)
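You can actually watch CPython's over-allocation at work. Here is a minimal
sketch (the variable names and the 10,000-item count are just illustrative)
that records the list's allocated size via sys.getsizeof each time it
changes during a long run of appends; the number of reallocations comes out
far smaller than the number of appends, which is what makes append
amortized O(1):

```python
import sys

# Append many items and record the allocated size (as reported by
# sys.getsizeof, which includes the over-allocated pointer array)
# each time it changes -- i.e., each time the list actually resizes.
lst = []
sizes = []
for i in range(10_000):
    lst.append(i)
    size = sys.getsizeof(lst)
    if not sizes or size != sizes[-1]:
        sizes.append(size)

# Geometric-ish growth means only a handful of reallocations
# relative to the number of appends.
print(len(sizes), "reallocations for", len(lst), "appends")
```

The exact growth schedule is an implementation detail and varies between
CPython versions, but the reallocation count stays tiny either way.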
>> Repeatedly concatenating arrays with numpy.append or numpy.concatenate
>> is also quite slow, unfortunately. :-(
> Yes. There is no preallocation here.
Makes sense. I'm sure this cuts down on the amount of metadata that has
to be carried along with each array instance, and there aren't too many
applications for it.
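For the record, here is a minimal sketch of the two patterns discussed
above (the 3x2 `rows` data is just a stand-in for parsed text lines):
repeated numpy.concatenate reallocates and copies the whole array on every
step, while accumulating rows in a Python list and converting once at the
end benefits from the amortized O(1) appends:

```python
import numpy as np

rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # stand-in for parsed lines

# Slow pattern: each concatenate allocates a fresh array and copies
# everything accumulated so far -- O(n^2) total work.
slow = np.empty((0, 2))
for r in rows:
    slow = np.concatenate([slow, np.asarray(r)[None, :]])

# Fast pattern: buffer rows in a list (amortized O(1) appends),
# then build the array in one shot.
buf = []
for r in rows:
    buf.append(r)
fast = np.array(buf)

assert np.array_equal(slow, fast)
```

For three rows the difference is invisible, of course; it only bites once
the row count gets large.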