[Numpy-discussion] Pre-allocate array

Chris Barker - NOAA Federal chris.barker@noaa....
Thu Dec 27 11:40:46 CST 2012


On Thu, Dec 27, 2012 at 8:44 AM, Nikolaus Rath <Nikolaus@rath.org> wrote:

> I have an array that I know will need to grow to X elements. However, I
> will need to work with it before it's completely filled.

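What sort of "work with it" do you mean? resize() is dangerous if
there are any other views on the data block: by default it refuses to
run, and with refcheck=False it can reallocate the memory out from
under those views.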
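
A small sketch of that hazard, assuming CPython and the default
refcheck=True (the names `a` and `view` are only illustrative). With a
view alive, the in-place resize is refused rather than performed:

```python
import numpy as np

a = np.zeros(4)
view = a[:2]            # a second array that shares a's data block

try:
    a.resize(8)         # default refcheck=True notices the extra reference
    resized = True
except ValueError:
    resized = False     # NumPy refuses instead of invalidating `view`

print("resized in place:", resized)
```

Passing refcheck=False would let the call through, at which point
`view` could be left pointing at memory that has been freed or moved.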


> bigarray = np.empty(X)
> current_size = 0
> for i in something:
>     buf = produce_data(i)
>     bigarray[current_size:current_size+len(buf)] = buf
>     current_size += len(buf)
>     # Do things with bigarray[:current_size]
>
> This avoids having to allocate new buffers and copying data around, but
> I have to separately manage the current array size.

yup -- but not a bad option, really.

> Alternatively, I
> could do
>
> bigarray = np.empty(0)
> current_size = 0
> for i in something:
>     buf = produce_data(i)
>     bigarray.resize(len(bigarray)+len(buf))
>     bigarray[-len(buf):] = buf
>     # Do things with bigarray
>
> this is much more elegant, but the resize() calls may have to copy data
> around.

Yes, they will -- but whether that's a problem or not depends on your
use-case. If you are adding elements one-by-one, the re-allocating
and copying of memory could be a big overhead. But if buf is not that
"small", then the overhead gets lost in the wash. You'd have to
profile to be sure, but I found that if, in this case, "buf" is on the
order of 1/16 of the size of bigarray or larger, you'll not see it
(vague memory...)
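
A rough way to profile this, as a sketch only. The helper names
(grow_by_resize, fill_preallocated, make_chunks) and the sizes are made
up for illustration, and the timings will vary by machine and NumPy
version:

```python
import timeit
import numpy as np

def grow_by_resize(chunks):
    """Append each chunk with ndarray.resize(); may reallocate each time."""
    arr = np.empty(0)
    for buf in chunks:
        n = len(arr)
        # refcheck=False skips the reference check; that is only safe
        # here because no views of `arr` exist.
        arr.resize(n + len(buf), refcheck=False)
        arr[n:] = buf
    return arr

def fill_preallocated(chunks, total):
    """Allocate once and track the filled size by hand."""
    arr = np.empty(total)
    size = 0
    for buf in chunks:
        arr[size:size + len(buf)] = buf
        size += len(buf)
    return arr[:size]

def make_chunks(total, chunk_len):
    """Split 0..total-1 into consecutive chunks of chunk_len elements."""
    data = np.arange(total, dtype=float)
    return [data[i:i + chunk_len] for i in range(0, total, chunk_len)]

total = 20000
for chunk_len in (1, 100, 2000):
    chunks = make_chunks(total, chunk_len)
    t_resize = timeit.timeit(lambda: grow_by_resize(chunks), number=3)
    t_prealloc = timeit.timeit(lambda: fill_preallocated(chunks, total), number=3)
    print(chunk_len, t_resize, t_prealloc)
```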

> Is there any way to tell numpy to allocate all the required memory while
> using only a part of it for the array? Something like:
>
> bigarray = np.empty(50, will_grow_to=X)
> bigarray.resize(X) # Guaranteed to work without copying stuff  around

No -- though you could probably fudge it by messing with the strides.
You'd still need to keep track yourself of either how much memory was
originally allocated or how much is currently in use, like you did
above.
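
One way to get much the same effect without touching strides directly,
sketched under the assumption that plain slicing is enough for your
use: allocate everything up front and hand out views of the used part.
The names `storage`, `capacity` and `used` are illustrative only.

```python
import numpy as np

capacity = 1000                  # stands in for X
storage = np.empty(capacity)     # all the memory, allocated once

used = 50
bigarray = storage[:used]        # view of the first 50 elements, no copy
bigarray[:] = 1.0

used = capacity
grown = storage[:used]           # "resize" by taking a longer view

# Both views sit on the same data block, so nothing was copied.
print(np.shares_memory(grown, bigarray))
```

The bookkeeping does not go away: `used` still has to be tracked by
hand, and elements past it are uninitialized.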

NOTE: I've written a couple of "growable array" classes for just this
problem. One in pure Python, and one in Cython that isn't quite
finished. I've enclosed the pure Python one; let me know if you're
interested in the Cython version (it may need some work to be fully
functional).
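
For readers without the attachment, here is a minimal sketch of the
general idea. It is not the enclosed accumulator.py; the class name
GrowableArray, its methods, and the doubling growth factor are all
assumptions made for illustration.

```python
import numpy as np

class GrowableArray:
    """1-D array that over-allocates so that most appends do not copy."""

    def __init__(self, capacity=16, dtype=float):
        self._buffer = np.empty(max(capacity, 1), dtype=dtype)
        self._size = 0

    def __len__(self):
        return self._size

    def extend(self, values):
        values = np.asarray(values, dtype=self._buffer.dtype).ravel()
        needed = self._size + len(values)
        if needed > len(self._buffer):
            # Grow geometrically so total copying stays proportional
            # to the final size.
            new_capacity = max(needed, 2 * len(self._buffer))
            new_buffer = np.empty(new_capacity, dtype=self._buffer.dtype)
            new_buffer[:self._size] = self._buffer[:self._size]
            self._buffer = new_buffer
        self._buffer[self._size:needed] = values
        self._size = needed

    @property
    def data(self):
        """View of the filled part; goes stale after a later reallocation."""
        return self._buffer[:self._size]

acc = GrowableArray(capacity=4)
for i in range(5):
    acc.extend([i, i + 0.5])
print(len(acc), acc.data)
```

Allocating a fresh buffer instead of calling resize() means old views
stay valid (they just keep the old buffer alive), at the cost of
briefly holding both buffers in memory.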

-Chris




-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: accumulator.py
Type: application/octet-stream
Size: 4171 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/numpy-discussion/attachments/20121227/0960f143/attachment-0002.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_accumulator.py
Type: application/octet-stream
Size: 5154 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/numpy-discussion/attachments/20121227/0960f143/attachment-0003.obj 

