[Numpy-discussion] numpy allocation event hooks

Thouis (Ray) Jones thouis@gmail....
Mon Jun 18 05:14:34 CDT 2012


Based on some previous discussion on the numpy list [1] and in
now-cancelled PRs [2,3], I'd like to solicit opinions on adding an
interface for numpy memory allocation event tracking, as implemented
in this PR:

https://github.com/numpy/numpy/pull/309

A brief summary of the changes:

- PyDataMem_NEW/FREE/RENEW become functions in the numpy API.
  (They used to be macros for malloc/free/realloc.)
  These are the functions used to manage allocations for an array's
  internal data.  Most other numpy data is allocated through Python's
  allocator.

- PyDataMem_NEW/RENEW return void* instead of char*.

- Adds PyDataMem_SetEventHook() to the API, with this description:
 * Sets the allocation event hook for numpy array data.
 * Takes a PyDataMem_EventHookFunc *, which has the signature:
 *        void hook(void *old, void *new, size_t size, void *user_data).
 *   Also takes a void *user_data, and void **old_data.
 *
 * Returns a pointer to the previous hook or NULL.  If old_data is
 * non-NULL, the previous user_data pointer will be copied to it.
 *
 * If not NULL, hook will be called at the end of each PyDataMem_NEW/FREE/RENEW:
 *   result = PyDataMem_NEW(size)        -> (*hook)(NULL, result, size, user_data)
 *   PyDataMem_FREE(ptr)                 -> (*hook)(ptr, NULL, 0, user_data)
 *   result = PyDataMem_RENEW(ptr, size) -> (*hook)(ptr, result, size, user_data)
 *
 * When the hook is called, the GIL will be held by the calling
 * thread.  The hook should be written to be reentrant if it performs
 * operations that might cause new allocation events (such as the
 * creation/destruction of numpy objects, or creating/destroying Python
 * objects, which might cause a gc).
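
To make the three event shapes above concrete, here is a minimal sketch
of a hook that just counts events.  The names alloc_stats, count_events
and replay_sample_events are hypothetical and not part of the PR, and
the way the hook tells NEW, FREE and RENEW apart is an assumption based
only on the argument patterns listed above (for example, a RENEW called
with a NULL old pointer would be counted as a NEW).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tally kept by the hook; passed in as user_data. */
typedef struct {
    size_t news;            /* events shaped like PyDataMem_NEW   */
    size_t frees;           /* events shaped like PyDataMem_FREE  */
    size_t renews;          /* events shaped like PyDataMem_RENEW */
    size_t bytes_requested; /* sum of sizes seen in NEW and RENEW */
} alloc_stats;

/*
 * Hook with the signature quoted above:
 *     void hook(void *old, void *new, size_t size, void *user_data)
 * It only compares the pointer values, never dereferences them, and
 * allocates nothing itself, so it cannot trigger nested events.
 */
void count_events(void *old_ptr, void *new_ptr, size_t size, void *user_data)
{
    alloc_stats *stats = user_data;
    assert(stats != NULL);  /* sketch only: require a stats struct */

    if (old_ptr == NULL) {
        /* (NULL, result, size): a new allocation. */
        stats->news++;
        stats->bytes_requested += size;
    }
    else if (new_ptr == NULL && size == 0) {
        /* (ptr, NULL, 0): a free. */
        stats->frees++;
    }
    else {
        /* (ptr, result, size): a resize. */
        stats->renews++;
        stats->bytes_requested += size;
    }
}

/*
 * Feed the hook one event of each shape, mirroring the three calls
 * described above.  The addresses are placeholders and are never
 * dereferenced.
 */
void replay_sample_events(alloc_stats *stats)
{
    char a, b;
    count_events(NULL, &a, 64, stats);  /* result = PyDataMem_NEW(64)         */
    count_events(&a, &b, 128, stats);   /* result = PyDataMem_RENEW(ptr, 128) */
    count_events(&b, NULL, 0, stats);   /* PyDataMem_FREE(ptr)                */
}
```

In an extension module built against a numpy that includes this PR, the
hook would presumably be installed with something like
old_hook = PyDataMem_SetEventHook(count_events, &stats, &old_data),
following the description above; that call is not compiled in this
sketch.  Because count_events does not create numpy or Python objects,
the reentrancy concern does not arise for it.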


The PR also includes an example using the hook functions to track
allocation via Python callback functions (in
tools/allocation_tracking).

Why I think this is worth adding to numpy, even though other tools may
be able to provide similar functionality:

- numpy arrays use orders of magnitude more memory than most python
  objects, and this is often a limiting factor in algorithms.

- numpy can behave in complicated ways with regard to memory
  management, e.g., views, OWNDATA, temporaries, etc., making it
  sometimes difficult to know where memory usage problems are
  happening and why.

- numpy attracts a large number of programmers with limited low-level
  programming expertise who don't have the skills to use external
  tools (or the time/motivation to acquire those skills), but who
  still need to be able to diagnose these sorts of problems.

- Other tools are not well integrated with Python, and vary a great
  deal across OS and compiler setups.

I appreciate any feedback.

Ray Jones


[1] http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062373.html
[2] (python callbacks) https://github.com/numpy/numpy/pull/284
[3] (C-level logging)  https://github.com/numpy/numpy/pull/301

