[Numpy-discussion] Numpy's policy for releasing memory
Tue Nov 13 02:41:37 CST 2012
On Tue, Nov 13, 2012 at 8:26 AM, Austin Bingham wrote:
> I'm trying to understand how numpy decides when to release memory and
> whether it's possible to exert any control over that. The situation is that
> I'm profiling memory usage on a system in which a great deal of the overall
> memory is tied up in ndarrays. Since numpy manages ndarray memory on its own
> (i.e. without the python gc, or so it seems), I'm finding that I can't do
> much to convince numpy to release memory when things get tight. For Python
> objects, for example, I can explicitly run gc.collect().
> So, in an effort to at least understand the system better, can anyone tell
> me how/when numpy decides to release memory? And is there any way via either
> the Python or C-API to explicitly request release? Thanks.
Numpy array memory is released when the corresponding Python objects
are deleted, so it exactly follows Python's rules. You can't
explicitly request release, because by definition, if memory is not
released, then it means that it's still accessible somehow, so
releasing it could create segfaults. Perhaps you have stray references
sitting around that you have forgotten to clear -- that's a common
cause of memory leaks in Python. gc.get_referrers() can be useful to
debug such things.
Some things to note:
- Numpy uses malloc() instead of going through the Python low-level
memory allocation layer (which itself is a wrapper around malloc with
various optimizations for small objects). This is really only relevant
because it might create some artifacts depending on how your memory
profiler gathers data.
- gc.collect() doesn't do that much in Python... it only matters if
you have circular references. CPython normally releases the memory
associated with an object as soon as the object becomes unreferenced
(that is, when its reference count drops to zero). If you avoid
circular references, gc.collect() won't have anything to do.
- If you have multiple views of the same memory in numpy, then they
share the same underlying memory, so that memory won't be released
until all of the view objects are released. (The one thing to watch
out for is that you can do something like 'huge_array = np.zeros((2,
10000000)); tiny_array = huge_array[:, 100]'. Since tiny_array is a
view onto huge_array, the full big memory allocation will remain for
as long as a reference to tiny_array exists.)
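
The last two points can be sketched roughly as follows. The Holder
class and the cycle_demo() and view_demo() helpers are hypothetical
names used only for this illustration, and the timing of the release
assumes CPython. The first function shows an array that is only freed
once gc.collect() breaks a reference cycle; the second shows a small
view pinning a large buffer, and .copy() as one way to avoid that:

```python
import gc
import weakref

import numpy as np


class Holder:
    """Hypothetical object that ends up in a reference cycle."""

    def __init__(self):
        self.data = np.zeros(1000)
        self.me = self  # the object refers to itself: a cycle


def cycle_demo():
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector out of the way
    try:
        holder = Holder()
        probe = weakref.ref(holder.data)
        del holder  # the cycle keeps the refcount above zero
        alive_before = probe() is not None
        gc.collect()  # the cycle collector frees Holder and its array
        freed_after = probe() is None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, freed_after


def view_demo():
    huge_array = np.zeros((2, 10_000_000))  # about 160 MB of float64
    tiny_view = huge_array[:, 100]          # shares huge_array's buffer

    # A view records the array whose memory it borrows in .base.
    view_pins_buffer = tiny_view.base is huge_array

    # A copy gets its own small buffer, so it does not keep the
    # large allocation alive.
    tiny_copy = tiny_view.copy()
    copy_owns_data = tiny_copy.base is None
    return view_pins_buffer, copy_owns_data, tiny_copy.nbytes


print(cycle_demo())
print(view_demo())
```

If only a small slice of a large array is needed long-term, keeping
the copy and dropping every reference to the original and its views
lets the big allocation go.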