[Numpy-discussion] In-place operations
A. M. Archibald
peridot.faceted at gmail.com
Tue Sep 12 12:19:05 CDT 2006
On 12/09/06, Pierre Thibault <thibault at physics.cornell.edu> wrote:
> I would like to have information on the best techniques to do in-place
> calculations and to minimize temporary array creations. To me this
> seems to be very important whenever the arrays become very large.
The first rule of optimization: don't do it yet.
You can usually go through and banish temporary arrays (using ufuncs
and so on) at the cost of readability, code encapsulation, and
thread-safe-ness. But it may not do what you want. I had an
image-processing code that was taking longer than I thought it should
and using two hundred megabytes or so of RAM. So I rewrote it, with a
certain amount of pain, in a way that it used the fewest possible
temporary arrays. It didn't run any faster, and it then took five
hundred megabytes. Because all the arrays ended up being in memory at
once, the memory footprint increased drastically.
malloc() is fast, typically just a handful of instructions; if you're
allocating a giant array, it's almost certainly being allocated using
mmap(), and it can be released back to the OS on deallocation.
But you probably still want to avoid temporary arrays. So:
> More specifically, here are examples that occurred in my code:
> 1) FFTs: Let A and B be two large arrays, already allocated. I want
> the fft of A to be stored in B. If I just type B = fft(A), there is a
> temporary array creation, right? Is it possible to avoid that?
Doing an FFT in-place is a major challenge, and involves its own
slowdowns, so generally high-level toolkits don't bother. But fft
seems to be like many functions (those generated by interp1d, for
example) that insist on malloc()ing their own arrays to return. Short
of rooting around in the numpy/scipy code, there's no real way around
this for such functions. The best you can do is make actual use of the
allocated array (rather than copy its contents to *another* array and
> 2) Function output: In general, I think the same thing happens with
> functions like
> def f1(array_in):
>     array_out = # something using array_in
>     return array_out
> Then, if B is already allocated, writing B = f1(A) involves again a
> temporary array creation
Uh, no, not really. The way you have written f1, it probably malloc()s
space for array_out. The address of that space (roughly) is saved in
the array_out variable. If you write B=f1(A), you are just storing that
address in B. The memory is not copied. Even if you do B=f1(A)[::10]
you don't copy the memory; the slice is a view onto the same data.
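
A minimal sketch of the point above, assuming for illustration that
"# something" is 2*array_in (the original does not say what f1 computes).
It checks that B = f1(A) only rebinds the name B, and that basic slicing
returns a view rather than a copy:

```python
import numpy as np

def f1(array_in):
    # Allocates one new array for the result and returns a reference to it.
    # The body (2 * array_in) is an assumed placeholder for "# something".
    array_out = 2 * array_in
    return array_out

A = np.arange(20, dtype=float)
B = np.zeros_like(A)      # the "already allocated" B
old_B = B                 # keep a second reference to that buffer

B = f1(A)                 # rebinds the name B; nothing is written into old_B
rebinding_only = (B is not old_B) and (not old_B.any())

C = f1(A)[::10]           # basic slicing gives a view onto f1's result
is_view = (C.base is not None) and (not C.flags.owndata)
```

After B = f1(A), the buffer that B used to name is untouched and is freed
once nothing else refers to it.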
> I thought instead of doing something like
> def f2(array_in, array_out):
>     array_out[:] = # something
>     # Is this good practice?
> and call f2(A,B).
This is a ufunc-like solution; you could even make array_out an
optional argument, and return it.
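
One way this might look, as a sketch only: array_out defaults to None, is
allocated on demand, and is returned either way. The doubling operation
is an assumed stand-in for "# something".

```python
import numpy as np

def f2(array_in, array_out=None):
    # Allocate an output buffer only if the caller did not supply one.
    if array_out is None:
        array_out = np.empty_like(array_in)
    # Assumed placeholder computation: write 2 * array_in straight into
    # array_out, with no intermediate array.
    np.multiply(array_in, 2, out=array_out)
    return array_out

A = np.arange(5, dtype=float)
B = np.empty_like(A)

result = f2(A, B)   # fills B in place and hands back the same object
fresh = f2(A)       # no buffer given, so a new array is allocated
```

Returning array_out in both cases lets callers write B = f2(A) or f2(A, B)
interchangeably.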
> If I understand well, this still requires a temporary array creation.
> Is there another way of doing that (apart from actually looping
> through the indices of A and B)?
It depends on what #something is. If, say, it is 2*array_in, you can
simply do multiply(array_in,2,array_out) to avoid any dynamic
allocation.
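
A short sketch of that ufunc call, using float arrays and the out=
keyword form of the third argument. The second step (adding 1) is an
assumed extension, included only to show a longer expression reusing the
same buffer:

```python
import numpy as np

A = np.arange(5, dtype=float)
B = np.empty_like(A)          # preallocated output buffer

# B[:] = 2 * A would build a temporary for 2 * A and then copy it into B.
# Passing B as the ufunc's output writes the result there directly.
np.multiply(A, 2, out=B)

# A longer expression, here 2*A + 1, can reuse the same buffer step by step.
np.add(B, 1, out=B)

# Augmented assignment also updates an array in place.
A *= 2
```

The cost is readability: each arithmetic step becomes its own call, which
is the trade-off described at the top of this message.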
> I guess these considerations are not standard python problems because
> you expect python to take care of memory issues. With big arrays in
> scientific computations, I feel the question is more relevant. I might
> be wrong...
Some of these issues come up when dealing with mutable objects (lists,
dictionaries, and so on). Some of them (the fact that python variables
contain simply references) are discussed in various python FAQs.
A. M. Archibald