[Numpy-discussion] numpy allocation event hooks
Dag Sverre Seljebotn
d.s.seljebotn@astro.uio...
Mon Jun 18 08:46:59 CDT 2012
On 06/18/2012 12:14 PM, Thouis (Ray) Jones wrote:
> Based on some previous discussion on the numpy list [1] and in
> now-cancelled PRs [2,3], I'd like to solicit opinions on adding an
> interface for numpy memory allocation event tracking, as implemented
> in this PR:
>
> https://github.com/numpy/numpy/pull/309
>
> A brief summary of the changes:
>
> - PyDataMem_NEW/FREE/RENEW become functions in the numpy API.
> (they used to be macros for malloc/free/realloc)
> These are the functions used to manage allocations for arrays'
> internal data. Most other numpy data is allocated through Python's
> allocator.
>
> - PyDataMem_NEW/RENEW return void* instead of char*.
>
> - Adds PyDataMem_SetEventHook() to the API, with this description:
> * Sets the allocation event hook for numpy array data.
> * Takes a PyDataMem_EventHookFunc *, which has the signature:
> * void hook(void *old, void *new, size_t size, void *user_data).
> * Also takes a void *user_data, and void **old_data.
> *
> * Returns a pointer to the previous hook or NULL. If old_data is
> * non-NULL, the previous user_data pointer will be copied to it.
> *
> * If not NULL, hook will be called at the end of each PyDataMem_NEW/FREE/RENEW:
> * result = PyDataMem_NEW(size) -> (*hook)(NULL, result, size, user_data)
> * PyDataMem_FREE(ptr) -> (*hook)(ptr, NULL, 0, user_data)
> * result = PyDataMem_RENEW(ptr, size) -> (*hook)(ptr, result, size, user_data)
> *
> * When the hook is called, the GIL will be held by the calling
> * thread. The hook should be written to be reentrant if it performs
> * operations that might cause new allocation events (such as the
> * creation/destruction of numpy objects, or creating/destroying Python
> * objects, which might trigger a gc).
>
>
> The PR also includes an example using the hook functions to track
> allocation via Python callback functions (in
> tools/allocation_tracking).
>
> Why I think this is worth adding to numpy, even though other tools may
> be able to provide similar functionality:
>
> - numpy arrays use orders of magnitude more memory than most python
> objects, and this is often a limiting factor in algorithms.
>
> - numpy can behave in complicated ways with regard to memory
> management, e.g., views, OWNDATA, temporaries, etc., making it
> sometimes difficult to know where memory usage problems are
> happening and why.
>
> - numpy attracts a large number of programmers with limited low-level
> programming expertise, and who don't have the skills to use external
> tools (or time/motivation to acquire those skills), but still need
> to be able to diagnose these sorts of problems.
>
> - Other tools are not well integrated with Python, and vary a great
> deal between OS and compiler setup.
>
> I appreciate any feedback.
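To make the quoted hook semantics concrete, here is a minimal standalone sketch. It does not use numpy at all: every name prefixed with sketch_ is a hypothetical stand-in that wraps malloc/free/realloc and calls the hook in the way the description above specifies. The real functions in the PR may well differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hook signature as given in the quoted description. */
typedef void (sketch_EventHookFunc)(void *old, void *new_ptr,
                                    size_t size, void *user_data);

/* Hypothetical global hook state (not numpy's actual variables). */
static sketch_EventHookFunc *sketch_hook = NULL;
static void *sketch_hook_data = NULL;

/* Install a hook; return the previous one, copying the previous
 * user_data to *old_data when old_data is non-NULL. */
sketch_EventHookFunc *sketch_SetEventHook(sketch_EventHookFunc *newhook,
                                          void *user_data, void **old_data)
{
    sketch_EventHookFunc *prev = sketch_hook;
    if (old_data != NULL) {
        *old_data = sketch_hook_data;
    }
    sketch_hook = newhook;
    sketch_hook_data = user_data;
    return prev;
}

/* Stand-ins for NEW/FREE/RENEW: do the allocation work first, then
 * report the event. Hooks must treat `old` as an address only and
 * never dereference it. */
void *sketch_NEW(size_t size)
{
    void *result = malloc(size);
    if (sketch_hook != NULL) {
        (*sketch_hook)(NULL, result, size, sketch_hook_data);
    }
    return result;
}

void sketch_FREE(void *ptr)
{
    free(ptr);
    if (sketch_hook != NULL) {
        (*sketch_hook)(ptr, NULL, 0, sketch_hook_data);
    }
}

void *sketch_RENEW(void *ptr, size_t size)
{
    void *result = realloc(ptr, size);
    if (sketch_hook != NULL) {
        (*sketch_hook)(ptr, result, size, sketch_hook_data);
    }
    return result;
}

/* Example hook: count events by argument pattern. Edge cases such as
 * FREE(NULL) or RENEW(NULL, n) are not distinguished in this sketch. */
struct sketch_stats {
    size_t news, frees, renews, bytes_requested;
};

void sketch_count_hook(void *old, void *new_ptr, size_t size,
                       void *user_data)
{
    struct sketch_stats *s = user_data;
    assert(s != NULL);
    if (old == NULL) {
        s->news++;
    } else if (new_ptr == NULL && size == 0) {
        s->frees++;
    } else {
        s->renews++;
    }
    s->bytes_requested += size;
}
```

A caller would install sketch_count_hook with a pointer to a zeroed struct sketch_stats, run some allocations, and then read the counters. Note that this hook only observes events; it cannot change where the memory comes from.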
Are the hooks able to change how allocation happens/override allocation?
If one is going to this much trouble already, I think one might as well
go the extra step and allow hooks to override memory allocation.
It is at least something to think about. Of course, the change above (as
I understand it) would be a good first step toward a pluggable allocator,
even if overriding isn't added right away.
Examples:
- Allocate NumPy arrays in process-shared memory using shmem/mmap
- Allocate NumPy arrays on some boundary (16-byte, 4096-byte, ...) using
memalign
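As a rough sketch of the second example, here is what an allocation function that an overriding hook could delegate to might look like. The sketch_ names and the (size, user_data) signature are assumptions made up for illustration; nothing like them exists in the PR. It uses posix_memalign, the POSIX counterpart of memalign.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical override signature: a replacement for the data
 * allocator that receives the requested size plus user data. */
typedef void *(sketch_alloc_func)(size_t size, void *user_data);

/* Allocate `size` bytes on the boundary stored in *user_data.
 * posix_memalign requires the boundary to be a power of two and a
 * multiple of sizeof(void *); on failure this returns NULL.
 * The result is released with plain free(). */
void *sketch_aligned_alloc(size_t size, void *user_data)
{
    assert(user_data != NULL);
    size_t alignment = *(const size_t *)user_data;
    void *ptr = NULL;
    if (posix_memalign(&ptr, alignment, size) != 0) {
        return NULL;
    }
    return ptr;
}

/* Helper: check that an address lies on the given boundary. */
int sketch_is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}
```

With a boundary of 4096, the returned pointer is page-aligned on typical systems, and 16 would suit SSE-style loads. A real pluggable allocator would also need matching free and realloc overrides, which this sketch leaves out.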
Dag