[Numpy-discussion] Getting C-function pointers from Python to C

Dag Sverre Seljebotn d.s.seljebotn@astro.uio...
Tue Apr 10 07:36:05 CDT 2012


Hi Travis,

we've been discussing almost the exact same thing in Cython (at a 
workshop, not on the mailing list, I'm afraid). Our specific 
example use case was passing a Cython function to scipy.integrate.

On 04/10/2012 02:57 AM, Travis Oliphant wrote:
>
> On Apr 9, 2012, at 7:21 PM, Nathaniel Smith wrote:
>
>> ...isn't this an operation that will be performed once per compiled
>> function? Is the overhead of the easy, robust method (calling
>> ctypes.cast) actually measurable as compared to, you know, running an
>> optimizing compiler?
>>
>>
>
> Yes, there can be significant overhead. The compiler is run once and
> creates the function. This function is then potentially used many, many
> times. Also, it is entirely conceivable that the "build" step happens at
> a separate "compilation" time, and Numba actually loads a pre-compiled
> version of the function from disk which it then uses at run-time.
>
> I have been playing with a version of this using scipy.integrate and
> unfortunately the overhead of ctypes.cast is rather significant --- to
> the point of making the code-path using these function pointers to be
> useless when without the ctypes.cast overhead the speed up is 3-5x.

There's an N where the cost of the ctypes.cast is properly amortized 
though, right? The ctypes.cast should only be called once at the 
beginning of scipy.integrate?
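
To make the amortization argument concrete, here is a minimal sketch in 
pure Python. It is not how scipy.integrate works internally; the names 
INTEGRAND and midpoint_integrate are made up for illustration, and a 
ctypes callback stands in for a compiled function. The point is only 
that the cast happens once per integration, outside the evaluation loop:

```python
import ctypes

# Hypothetical callback type for: double f(double)
INTEGRAND = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)


def midpoint_integrate(func_obj, a, b, n):
    """Midpoint rule over [a, b] with n panels (illustrative only)."""
    # One-time cost: recover the raw address, then rebuild a callable from it.
    address = ctypes.cast(func_obj, ctypes.c_void_p).value
    f = INTEGRAND(address)
    h = (b - a) / n
    # The loop reuses f; no further casts are performed here.
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Stand-in for a compiled integrand; keep a reference so it stays alive.
square = INTEGRAND(lambda x: x * x)
result = midpoint_integrate(square, 0.0, 1.0, 1000)
```

With this structure the cast cost is fixed per call to the integrator, 
so it shrinks relative to the total as the number of evaluations grows. 
Whether it is negligible for small N is exactly the open question.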

> In general, I think NumPy will need its own simple function-pointer
> object to use when handing over raw-function pointers between Python and
> C. SciPy can then re-use this object which also has a useful C-API for
> things like signature checking. I have seen that ctypes is nice but very
> slow and without a compelling C-API.
>
>
> The kind of new C-level cfuncptr object I imagine has attributes:
>
> void *func_ptr;
> char *signature string /* something like 'dd->d' to indicate a function
> that takes two doubles and returns a double */
>
> methods would be:
>
> from_ctypes (classmethod)
> to_ctypes
> and simple inline functions to get the function pointer and the signature.

This is more or less the same format we discussed for Cython functions. 
What we wanted to do is to write Cython code like this:

cpdef double f(double x, double y): ...

and when passing f to scipy.integrate, let it call the inner C function 
directly.

We even worked with the exact same format string in our discussions :-)
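
For reference, the object described in the quoted proposal could be 
mocked up in pure Python roughly as below. This is only a sketch: the 
real thing would be a C-level type, and the class name CFuncPtr, the 
_CODES table and the keepalive argument are my own assumptions. Only 
func_ptr, signature, from_ctypes and to_ctypes come from the proposal.

```python
import ctypes

# Assumed mapping from one-letter type codes to ctypes types (small subset).
_CODES = {"d": ctypes.c_double, "f": ctypes.c_float, "i": ctypes.c_int}
_REVERSE = {ctype: code for code, ctype in _CODES.items()}


class CFuncPtr:
    """Raw function address plus a signature string such as 'dd->d'."""

    def __init__(self, func_ptr, signature, keepalive=None):
        self.func_ptr = func_ptr      # integer address of the C function
        self.signature = signature    # e.g. 'dd->d'
        self._keepalive = keepalive   # owner of the code, so it is not freed

    @classmethod
    def from_ctypes(cls, cfunc):
        # Build the signature from the ctypes prototype of the function.
        args = "".join(_REVERSE[t] for t in cfunc.argtypes)
        ret = _REVERSE[cfunc.restype]
        address = ctypes.cast(cfunc, ctypes.c_void_p).value
        return cls(address, args + "->" + ret, keepalive=cfunc)

    def to_ctypes(self):
        # Rebuild a ctypes prototype from the signature and bind the address.
        args, ret = self.signature.split("->")
        proto = ctypes.CFUNCTYPE(_CODES[ret], *(_CODES[c] for c in args))
        return proto(self.func_ptr)


# Usage: wrap a two-double callback and round-trip it.
proto = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_double)
add = proto(lambda x, y: x + y)
wrapped = CFuncPtr.from_ctypes(add)
roundtrip = wrapped.to_ctypes()
```

A consumer written in C would read func_ptr and signature directly 
instead of going through to_ctypes, which is where the speed would come 
from.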

Long term, in Cython we could use the type information together with 
LLVM to generate adapted code wherever Cython calls objects (in call-sites).

So ideally we would want to agree on an API, so that Cython functions 
can be passed to scipy.integrate, and so that numba functions can be 
jumped to directly from Cython code.

Comments:

  - PEP3118-augmented format strings should work well, and we may want 
to enforce a canonicalized subset (i.e. whitespace is not allowed, do 
not use repeat specifiers, ...anything else?)

  - What you propose above already does two pointer jumps (with possibly 
associated cache misses and stalls) if you want to validate the 
signature, which can be eliminated (at least from Cython's perspective).
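
To illustrate the first comment: with a canonical form, a consumer can 
compare signatures by plain string equality instead of parsing them. 
Below is a rough sketch of such a check. The exact subset (one-letter 
codes, no whitespace, no repeat counts) and the function names are 
assumptions for illustration, not something that has been agreed on, and 
the 'args->ret' layout follows the 'dd->d' example quoted above.

```python
import re

# Assumed canonical subset: one-letter type codes only, then '->', then one code.
_CANONICAL = re.compile(r"[bBhHiIlLqQfd]*->[bBhHiIlLqQfd]")


def is_canonical(signature):
    """True if the signature has no whitespace and no repeat specifiers."""
    return _CANONICAL.fullmatch(signature) is not None


def canonicalize(signature):
    """Strip whitespace and expand repeat counts: '2d -> d' becomes 'dd->d'."""
    compact = "".join(signature.split())
    expanded = re.sub(
        r"(\d+)([A-Za-z])",
        lambda m: m.group(2) * int(m.group(1)),
        compact,
    )
    if not is_canonical(expanded):
        raise ValueError("cannot canonicalize signature: %r" % signature)
    return expanded


canonical = canonicalize("2d -> d")
```

A producer would canonicalize once when the function object is created, 
so consumers only ever have to do an exact comparison.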

But I'll let this thread go on a little longer, to settle the "is 
this needed for NumPy" question, before continuing my bikeshedding on 
performance issues.

Dag

