[Numpy-discussion] Getting C-function pointers from Python to C

Nathaniel Smith njs@pobox....
Tue Apr 10 08:00:38 CDT 2012


On Tue, Apr 10, 2012 at 1:39 PM, Dag Sverre Seljebotn
<d.s.seljebotn@astro.uio.no> wrote:
> On 04/10/2012 12:37 PM, Nathaniel Smith wrote:
>> On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant <travis@continuum.io> wrote:
>>> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote:
>>>
>>> ...isn't this an operation that will be performed once per compiled
>>> function? Is the overhead of the easy, robust method (calling ctypes.cast)
>>> actually measurable as compared to, you know, running an optimizing
>>> compiler?
>>>
>>> Yes, there can be significant overhead.   The compiler is run once and
>>> creates the function.   This function is then potentially used many, many
>>> times.    Also, it is entirely conceivable that the "build" step happens at
>>> a separate "compilation" time, and Numba actually loads a pre-compiled
>>> version of the function from disk which it then uses at run-time.
>>>
>>> I have been playing with a version of this using scipy.integrate and
>>> unfortunately the overhead of ctypes.cast is rather significant --- to the
>>> point of making the code path that uses these function pointers useless,
>>> even though without the ctypes.cast overhead the speed-up is 3-5x.
>>
>> Ah, I was assuming that you'd do the cast once outside of the inner
>> loop (at the same time you did type compatibility checking and so
>> forth).
>>
>>> In general, I think NumPy will need its own simple function-pointer object
>>> to use when handing over raw-function pointers between Python and C.   SciPy
>>> can then re-use this object which also has a useful C-API for things like
>>> signature checking.    I have seen that ctypes is nice but very slow and
>>> without a compelling C-API.
>>
>> Sounds reasonable to me. Probably nicer than violating ctypes's
>> abstraction boundary, and with no real downsides.
>>
>>> The kind of new C-level cfuncptr object I imagine has attributes:
>>>
>>> void *func_ptr;
>>> char *signature;  /* a string such as 'dd->d', indicating a function
>>> that takes two doubles and returns a double */
>>
>> This looks like it's setting us up for trouble later. We already have
>> a robust mechanism for describing types -- dtypes. We should use that
>> instead of inventing Yet Another baby type system. We'll need to
>> convert between this representation and dtypes anyway if you want to
>> use these pointers for ufunc loops... and if we just use dtypes from
>> the start, we'll avoid having to break the API the first time someone
>> wants to pass a struct or array or something.
>
> For some of the things we'd like to do with Cython down the line,
> something very fast like what Travis describes is exactly what we need;
> specifically, if you have Cython code like
>
> cdef double f(func):
>     return func(3.4)
>
> that may NOT be called in a loop.
>
> But I do agree that this sounds like overkill for NumPy+numba at the moment;
> certainly for scipy.integrate where you can amortize over N function
> samples. But Travis perhaps has a use case I didn't think of.

It sounds sort of like you're disagreeing with me but I can't tell
about what, so maybe I was unclear :-).

All I was saying was that a list-of-dtype-objects was probably a
better way to write down a function signature than some ad-hoc string
language. In both cases you'd do some type-compatibility-checking up
front and then use C calling afterwards, and I don't see why
type-checking would be faster or slower for one representation than
the other. (Certainly one wouldn't have to support all possible dtypes
up front, the point is just that they give us more room to grow
later.)
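
To make the comparison concrete, here is a minimal sketch of the two
representations side by side. The helper names (parse_signature,
compatible) are hypothetical, made up for this illustration; they are
not an existing NumPy API, and the 'dd->d' string format is only the
example from Travis's mail.

```python
import numpy as np

# Hypothetical helper: turn an ad-hoc string such as 'dd->d' into dtypes.
# Each character is treated as a one-letter NumPy type code.
def parse_signature(sig):
    args, ret = sig.split("->")
    return [np.dtype(c) for c in args], np.dtype(ret)

# The same signature written directly as a list of dtype objects.
arg_types = [np.dtype(np.float64), np.dtype(np.float64)]
ret_type = np.dtype(np.float64)

# Hypothetical one-time compatibility check, done before any C-level call.
def compatible(expected_args, expected_ret, actual_args, actual_ret):
    return (list(expected_args) == list(actual_args)
            and expected_ret == actual_ret)

# A struct argument is easy to express as a dtype, but has no
# one-character code in the string form.
point = np.dtype([("x", np.float64), ("y", np.float64)])
struct_args = [point, np.dtype(np.float64)]
```

Either way the check runs once, up front; after that the call goes
through the raw C pointer and the representation no longer matters.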

-- Nathaniel

