[Numpy-discussion] Getting C-function pointers from Python to C

Nathaniel Smith njs@pobox....
Tue Apr 10 08:29:32 CDT 2012


On Tue, Apr 10, 2012 at 2:15 PM, Dag Sverre Seljebotn
<d.s.seljebotn@astro.uio.no> wrote:
> On 04/10/2012 03:10 PM, Dag Sverre Seljebotn wrote:
>> On 04/10/2012 03:00 PM, Nathaniel Smith wrote:
>>> On Tue, Apr 10, 2012 at 1:39 PM, Dag Sverre Seljebotn
>>> <d.s.seljebotn@astro.uio.no>   wrote:
>>>> On 04/10/2012 12:37 PM, Nathaniel Smith wrote:
>>>>> On Tue, Apr 10, 2012 at 1:57 AM, Travis Oliphant<travis@continuum.io>     wrote:
>>>>>> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote:
>>>>>>
>>>>>> ...isn't this an operation that will be performed once per compiled
>>>>>> function? Is the overhead of the easy, robust method (calling ctypes.cast)
>>>>>> actually measurable as compared to, you know, running an optimizing
>>>>>> compiler?
>>>>>>
>>>>>> Yes, there can be significant overhead.   The compiler is run once and
>>>>>> creates the function.   This function is then potentially used many, many
>>>>>> times.    Also, it is entirely conceivable that the "build" step happens at
>>>>>> a separate "compilation" time, and Numba actually loads a pre-compiled
>>>>>> version of the function from disk which it then uses at run-time.
>>>>>>
>>>>>> I have been playing with a version of this using scipy.integrate and
>>>>>> unfortunately the overhead of ctypes.cast is rather significant --- to the
>>>>>> point of making the code-path using these function pointers useless,
>>>>>> whereas without the ctypes.cast overhead the speed-up is 3-5x.
>>>>>
>>>>> Ah, I was assuming that you'd do the cast once outside of the inner
>>>>> loop (at the same time you did type compatibility checking and so
>>>>> forth).
>>>>>
>>>>>> In general, I think NumPy will need its own simple function-pointer object
>>>>>> to use when handing over raw function pointers between Python and C.   SciPy
>>>>>> can then re-use this object which also has a useful C-API for things like
>>>>>> signature checking.    I have seen that ctypes is nice but very slow and
>>>>>> without a compelling C-API.
>>>>>
>>>>> Sounds reasonable to me. Probably nicer than violating ctypes's
>>>>> abstraction boundary, and with no real downsides.
>>>>>
>>>>>> The kind of new C-level cfuncptr object I imagine has attributes:
>>>>>>
>>>>>> void *func_ptr;
>>>>>> char *signature;   /* something like 'dd->d' to indicate a function
>>>>>> that takes two doubles and returns a double */
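
A rough sketch of what such an object and a C consumer of it might look
like (the struct layout and names here are hypothetical, just to make
the idea concrete):

#include <string.h>

typedef struct {
    void *func_ptr;          /* the compiled function                      */
    const char *signature;   /* e.g. "dd->d" == (double, double) -> double */
} cfuncptr;

typedef double (*func_dd_d)(double, double);

/* Consumer: verify the signature once, then cast and call. */
static double
call_dd_d(const cfuncptr *fp, double x, double y)
{
    if (strcmp(fp->signature, "dd->d") != 0) {
        return 0.0;   /* a real consumer would set a Python error here */
    }
    return ((func_dd_d)fp->func_ptr)(x, y);
}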
>>>>>
>>>>> This looks like it's setting us up for trouble later. We already have
>>>>> a robust mechanism for describing types -- dtypes. We should use that
>>>>> instead of inventing Yet Another baby type system. We'll need to
>>>>> convert between this representation and dtypes anyway if you want to
>>>>> use these pointers for ufunc loops... and if we just use dtypes from
>>>>> the start, we'll avoid having to break the API the first time someone
>>>>> wants to pass a struct or array or something.
>>>>
>>>> For some of the things we'd like to do with Cython down the line,
>>>> something very fast like what Travis describes is exactly what we need;
>>>> specifically, if you have Cython code like
>>>>
>>>> cdef double f(func):
>>>>       return func(3.4)
>>>>
>>>> that may NOT be called in a loop.
>>>>
>>>> But I do agree that this sounds overkill for NumPy+numba at the moment;
>>>> certainly for scipy.integrate where you can amortize over N function
>>>> samples. But Travis perhaps has a usecase I didn't think of.
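
For what it's worth, here's a sketch of that amortization (the
integrand type and names are just illustrative):

typedef double (*integrand_t)(double);

/* Once the pointer has been checked and cast (a one-time cost), the
   N calls inside the loop are plain C calls with no Python or ctypes
   overhead per sample. */
static double
sum_samples(integrand_t f, double a, double step, int n)
{
    double total = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        total += f(a + i * step);
    }
    return total;
}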
>>>
>>> It sounds sort of like you're disagreeing with me but I can't tell
>>> about what, so maybe I was unclear :-).
>>>
>>> All I was saying was that a list-of-dtype-objects was probably a
>>> better way to write down a function signature than some ad-hoc string
>>> language. In both cases you'd do some type-compatibility-checking up
>>> front and then use C calling afterwards, and I don't see why
>>> type-checking would be faster or slower for one representation than
>>> the other. (Certainly one wouldn't have to support all possible dtypes
>
> Rereading this, perhaps this is the statement you seek: Yes, doing a
> simple strcmp is much, much faster than jumping all around in memory to
> check the equality of two lists of dtypes. If it is a string less than 8
> bytes in length with the comparison string known at compile-time (the
> Cython case) then the comparison is only a couple of CPU instructions,
> as you can check 64 bits at a time.

Right, that's what I wasn't getting until you mentioned strcmp :-).
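
For concreteness, a sketch of the fast path Dag describes, assuming the
stored signature sits in an 8-byte zero-padded buffer (details
illustrative):

#include <stdint.h>
#include <string.h>

/* If the expected signature is known at compile time and the stored
   signature lives in an 8-byte zero-padded buffer, the check is a
   single 64-bit load-and-compare instead of a byte-by-byte loop. */
static int
signature_is_dd_d(const char sig[8])
{
    uint64_t got, want;
    memcpy(&got, sig, 8);            /* memcpy sidesteps alignment issues */
    memcpy(&want, "dd->d\0\0", 8);   /* "dd->d" padded out to 8 bytes     */
    return got == want;
}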

That said, the core numpy dtypes are singletons. For this purpose, the
signature could be stored as a C array of PyArray_Descr*, but even if we
store it in a Python tuple/list, we'd still end up with a contiguous
array of PyArray_Descr*'s. (I'm assuming that we would guarantee that
it was always-and-only a real PyTupleObject* here.) So for the
function we're talking about, the check would compile down to doing
the equivalent of a 3*pointersize-byte strcmp, instead of a 5-byte
strcmp. That's admittedly worse, but I think the difference between
these two comparisons is unlikely to be measurable, considering that
they're followed immediately by a cache miss when we actually jump to
the function pointer.
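
For comparison, a sketch of the dtype-based check under that assumption
(names illustrative, not a real NumPy API):

#include <Python.h>
#include <numpy/arrayobject.h>

/* Pointer-identity check over a small C array of PyArray_Descr*;
   this works because the builtin dtypes are singletons.  For a
   (double, double) -> double signature, n == 3. */
static int
signature_matches(PyArray_Descr *const *sig,
                  PyArray_Descr *const *expected, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (sig[i] != expected[i]) {
            return 0;
        }
    }
    return 1;
}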

-- Nathaniel

