[Numpy-discussion] NumPy re-factoring project

Jason McCampbell jmccampbell@enthought....
Thu Jun 10 10:47:10 CDT 2010


Hi Chuck,

Good questions.  Responses inline below...

Jason

On Thu, Jun 10, 2010 at 8:26 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:

>
>
> On Wed, Jun 9, 2010 at 5:27 PM, Jason McCampbell <
> jmccampbell@enthought.com> wrote:
>
>> Hi everyone,
>>
>> This is a follow-up to Travis's message on the re-factoring project from
>> May 25th and the subsequent discussion. For background, I am a developer at
>> Enthought working on the NumPy re-factoring project with Travis and Scott.
>> The immediate goal from our perspective is to re-factor the core of NumPy
>> into two architectural layers: a true core that is CPython-independent and
>> an interface layer that binds the core to CPython.
>>
>> A design proposal is now posted on the NumPy developer wiki:
>> http://projects.scipy.org/numpy/wiki/NumPyRefactoring
>>
>> The write-up is a fairly high-level description of what we think the split
>> will look like and how to deal with issues such as memory management.  There
>> are also placeholders listed as 'TBD' where more investigation is still
>> needed and will be filled in over time.  At the end of the page there is a
>> section on the C API with a link to a function-by-function breakdown of the
>> C API and whether the function belongs in the interface layer, the core, or
>> needs to be split between the two.  All functions listed as 'core' will
>> continue to have an interface-level wrapper with the same name to ensure
>> source-compatibility.
>>
>> All of this, particularly the interface/core function designations, is a
>> first analysis and in flux. The goal is to get the information out and
>> elicit discussion and feedback from the community.
>>
>>
> A few thoughts came to mind while reading the initial writeup.
>
> 1) How is the GIL handled in the callbacks?
>

How to handle the GIL still requires some thought.  The cleanest way, IMHO,
would be for the interface layer to release the lock prior to calling into
the core, with each callback function in the interface responsible for
re-acquiring it.  That's straightforward to define as a rule and should work
well in general, but I'm worried about potential performance issues if/when
a callback is called in a loop.  A few optimization points are OK, but too
many and it will just be a source of heisenbugs.
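
To make the rule concrete, here is a minimal sketch in C.  It is not code
from the refactoring work: the gil_held flag is only a stand-in for the real
GIL (a CPython interface would use Py_BEGIN_ALLOW_THREADS and
PyGILState_Ensure or similar), and every function and type name below is
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the GIL: 1 while "held", 0 while released.
 * Assumption: a real interface layer would use the CPython
 * thread-state API instead of this flag. */
int gil_held = 1;
int gil_acquire_count = 0;   /* counts acquisitions, to show the cost */

static void gil_release(void) { assert(gil_held);  gil_held = 0; }
static void gil_acquire(void) { assert(!gil_held); gil_held = 1; gil_acquire_count++; }

/* Hypothetical callback type the core uses to call into the interface. */
typedef double (*npy_item_callback)(double value);

/* Interface-layer callback: re-acquires the lock for its own duration. */
double iface_square_callback(double value)
{
    gil_acquire();
    double result = value * value;   /* Python objects would be touched here */
    gil_release();
    return result;
}

/* Core routine: never touches the lock, it only invokes the callback. */
double core_sum_with_callback(const double *data, size_t n, npy_item_callback cb)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += cb(data[i]);        /* one acquire/release pair per element */
    }
    return total;
}

/* Interface entry point: releases the lock before entering the core. */
double iface_sum_of_squares(const double *data, size_t n)
{
    gil_release();
    double total = core_sum_with_callback(data, n, iface_square_callback);
    gil_acquire();
    return total;
}
```

The acquire counter grows with the number of elements, which is exactly the
per-iteration cost that worries me when a callback sits inside a loop.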

One other option is to just use the existing release/acquire macros in NumPy
and redirect them to the interface layer.  Any app that isn't CPython would
just leave those callback pointers NULL.  It's less disruptive but leaves
some very CPython-specific behavior in the core.
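
A rough sketch of this second option, assuming a hook table that the
interface layer fills in.  The struct, the NPY_CORE_* macro names, and the
demo hooks are all hypothetical; they only illustrate the idea of macros
that become no-ops when the pointers are NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of hooks the interface layer registers with the core. */
typedef struct {
    void *(*release_lock)(void);         /* returns the saved thread state */
    void  (*acquire_lock)(void *state);  /* restores the saved thread state */
} npy_interface_hooks;

/* A non-CPython application simply leaves both pointers NULL. */
npy_interface_hooks npy_hooks = { NULL, NULL };

/* Core-side macros (made-up names): they do nothing when no hook is set. */
#define NPY_CORE_BEGIN_THREADS(state) \
    do { (state) = npy_hooks.release_lock ? npy_hooks.release_lock() : NULL; } while (0)
#define NPY_CORE_END_THREADS(state) \
    do { if (npy_hooks.acquire_lock) npy_hooks.acquire_lock(state); } while (0)

int lock_released_count = 0;
int lock_acquired_count = 0;

/* Stand-ins for the functions a CPython interface layer would register. */
void *demo_release(void)       { lock_released_count++; return &lock_released_count; }
void  demo_acquire(void *state) { (void)state; lock_acquired_count++; }

/* Core routine bracketed by the macros, as in the existing code style. */
long core_sum(const long *data, size_t n)
{
    void *state;
    long total = 0;
    NPY_CORE_BEGIN_THREADS(state);
    for (size_t i = 0; i < n; i++) {
        total += data[i];
    }
    NPY_CORE_END_THREADS(state);
    return total;
}
```

With the hooks left NULL, core_sum runs without touching any lock; once the
interface assigns demo_release and demo_acquire, each call releases and
re-acquires exactly once.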


> 2) What about error handling? That is tricky to get right, especially in C
> and with reference counting.
>

The error reporting functions in the core will likely look a lot like the
CPython functions - they seem general enough.  The biggest change is that
the CPython ones take a PyObject as the error type.  99% of the errors
reported in NumPy use one of a half-dozen pre-defined types that are easy to
translate.  There is at least one case where an object type (complex number)
is dynamically created and used as the type, but so far I believe it's only
one case.
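
As an illustration only, the core side could look something like the sketch
below.  The enum values, npy_set_error, and iface_exception_name are
assumptions of mine, not an agreed API; the translation function returns
exception names as strings where a CPython interface would pick
PyExc_ValueError and friends.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical core error types, mirroring the common exception classes. */
typedef enum {
    NPY_ERR_NONE = 0,
    NPY_ERR_MEMORY,
    NPY_ERR_VALUE,
    NPY_ERR_TYPE,
    NPY_ERR_INDEX
} npy_error_type;

/* Last error recorded by the core (a real core would keep this per thread). */
npy_error_type npy_last_error = NPY_ERR_NONE;
char npy_last_message[128] = "";

/* Core analogue of PyErr_SetString: takes an enum instead of a PyObject*. */
void npy_set_error(npy_error_type type, const char *message)
{
    npy_last_error = type;
    strncpy(npy_last_message, message, sizeof(npy_last_message) - 1);
    npy_last_message[sizeof(npy_last_message) - 1] = '\0';
}

/* Interface-side translation from the core type to an exception name. */
const char *iface_exception_name(npy_error_type type)
{
    switch (type) {
    case NPY_ERR_MEMORY: return "MemoryError";
    case NPY_ERR_VALUE:  return "ValueError";
    case NPY_ERR_TYPE:   return "TypeError";
    case NPY_ERR_INDEX:  return "IndexError";
    default:             return "RuntimeError";
    }
}

/* Example core routine: returns -1 and records an error on bad input. */
int core_check_ndim(int ndim)
{
    if (ndim < 0) {
        npy_set_error(NPY_ERR_VALUE, "number of dimensions must be >= 0");
        return -1;
    }
    return 0;
}
```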

The reference counting does get a little more complex because a core routine
will need to decref the core object on error and the interface layer will
need to similarly detect the error and potentially do its own decref.  Each
layer is still responsible for its own clean up, but there are now two
opportunities to introduce leaks.
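
A small sketch of the two clean-up points, under the assumption that the
core object and the interface wrapper each carry their own reference count.
The types and functions are invented for the example, and the live_*
counters exist only so a leak in either layer would be visible.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int refcnt; } core_array;                       /* hypothetical core object */
typedef struct { int refcnt; core_array *core; } iface_wrapper;  /* stand-in for a PyObject wrapper */

int live_core_objects = 0;   /* leak detectors for the sketch */
int live_wrappers = 0;

core_array *core_new(void)
{
    core_array *a = malloc(sizeof *a);
    if (!a) return NULL;
    a->refcnt = 1;
    live_core_objects++;
    return a;
}

void core_decref(core_array *a)
{
    if (--a->refcnt == 0) { live_core_objects--; free(a); }
}

/* Core routine: on error it releases the core temporary it created. */
int core_fill(core_array *dest, int fail)
{
    core_array *tmp = core_new();
    if (!tmp) return -1;
    if (fail) {
        core_decref(tmp);    /* clean-up point 1: the core's own objects */
        return -1;
    }
    (void)dest;              /* real work on dest would happen here */
    core_decref(tmp);
    return 0;
}

/* Interface routine: detects the core's error and releases its wrapper. */
iface_wrapper *iface_make_filled(int fail)
{
    iface_wrapper *w = malloc(sizeof *w);
    if (!w) return NULL;
    w->refcnt = 1;
    w->core = core_new();
    if (!w->core) { free(w); return NULL; }
    live_wrappers++;
    if (core_fill(w->core, fail) < 0) {
        core_decref(w->core);   /* clean-up point 2: the interface's objects */
        live_wrappers--;
        free(w);
        return NULL;
    }
    return w;
}

void iface_decref(iface_wrapper *w)
{
    if (--w->refcnt == 0) { core_decref(w->core); live_wrappers--; free(w); }
}
```

Forgetting either decref on the failure path leaves one of the counters
non-zero, which is the "two opportunities" problem in miniature.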


> 3) Is there a general policy as to how the reference counting should be
> handled in specific functions? That is, who does the reference
> incrementing/decrementing?
>

Each layer should implement the existing policy for the objects that it
manages. Essentially a function can use its caller's reference but needs to
increment the count if it's going to store it.  A new instance is returned
with a refcnt of 1 and the caller needs to clean it up when it's no longer
needed.  But that means that if the core returns a new NpyArray instance to
the interface layer, the receiving function in the interface must allocate a
PyObject wrapper around it and set the wrapper's refcnt to 1 before
returning it.
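
In code, the policy might look roughly like this.  NpyArraySketch and
WrapperSketch are placeholder types standing in for NpyArray and the
PyObject wrapper, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int refcnt; int ndim; } NpyArraySketch;            /* placeholder for NpyArray */
typedef struct { int refcnt; NpyArraySketch *array; } WrapperSketch; /* placeholder for the PyObject wrapper */

NpyArraySketch *stored_array = NULL;   /* a slot that outlives a single call */

/* Core: a new instance comes back with refcnt 1, owned by the caller. */
NpyArraySketch *core_new_array(int ndim)
{
    NpyArraySketch *a = malloc(sizeof *a);
    if (!a) return NULL;
    a->refcnt = 1;
    a->ndim = ndim;
    return a;
}

/* Uses the caller's reference only: no refcount change. */
int core_get_ndim(const NpyArraySketch *a)
{
    return a->ndim;
}

/* Stores the pointer, so it must take its own reference. */
void core_remember(NpyArraySketch *a)
{
    a->refcnt++;
    stored_array = a;
}

/* Interface: wraps the new core array. The wrapper takes over the single
 * core reference and itself starts with refcnt 1 for the caller. */
WrapperSketch *iface_new_array(int ndim)
{
    NpyArraySketch *a = core_new_array(ndim);
    if (!a) return NULL;
    WrapperSketch *w = malloc(sizeof *w);
    if (!w) { free(a); return NULL; }
    w->refcnt = 1;
    w->array = a;
    return w;
}
```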

Is that what you were asking?

> 4) Boost has some reference counted pointers, have you looked at them? C++
> is admittedly a very different animal for this sort of application.
>

There is also a need to replace the usage of PyDict and other uses of CPython
for basic data structures that aren't present in C.  Having access to C++
for this and reference counting would be nice, but has the potential to
break builds for everyone who uses the C API.  I think it's worth discussing
for the future, but it's a bigger (and possibly more contentious) change than
we are able to take on for this project.


> Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>