[Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch)

Travis Oliphant travis@continuum...
Tue Jun 26 10:02:05 CDT 2012

>> (I have not read the whole cython discussion yet)
> So here's the summary. It's rather complicated but also incredibly neat 
> :-) And technical details can be hidden behind a tight API.

Could you provide a bit more context for this list? I think this is an important technology concept. I'd like to understand better how well it jibes with Numba-produced APIs and how we can make use of it in NumPy.

Where exactly would this be used in the NumPy API?  What would it replace? 

>  - We introduce a C-level metaclass, "extensibletype", which to each 
> type adds a branch-miss-free string->pointer hash table. The ndarray 
> type is made an instance of this metaclass, so that you can do
> PyCustomSlots_GetTable(array_object->ob_type)
>  - The hash table uses a perfect hashing scheme:
>   a) We take the lower 64 bits of md5 of the lookup string (this can be 
> done compile-time or module-load-time) as a pre-hash "h".
>   b) When looking up the table for a key with pre-hash "h", the index 
> in the table is given by
> ((h >> table->r) & table->m1) ^ table->d[h & table->m2]
> Then, *if* the element is present, it will always be found on the first 
> try; the table is guaranteed collisionless. This means that an expensive 
> branch-miss can be avoided. It is really incredibly fast in practice, 
> with a 0.5 ns penalty on my 1.8 GHz laptop.
> The magic is in finding the right table->r and table->d. For a 64-slot 
> table, parameters r and d[0]..d[63] can be found in 10us on my machine 
> (it's an O(n) operation). (table->d[i] has type uint16_t)
> (This algorithm was found in an academic paper which I'm too lazy to dig 
> up from that thread right now; perfect hashing is an active research field.)
> The result? You can use this table to store function pointers in the 
> type, like C++ virtual tables or like the built-in slots like 
> tp_get_buffer, but *without* having to agree on everything at 
> compile-time like in C++. And the only penalty is ~0.5 ns per call and 
> some cache usage.
> Cython would use this to replace the current custom "cdef class" vtable 
> with something more tools could agree on, e.g. store function pointers 
> in the table with keys like
> "method:foo:i4i8->f4"
> But NumPy could easily store entries relating to its C API in the same 
> hash table,
> "numpy:SHAPE"
> Then, the C API functions would all take PyObject*, look up the fast 
> hash table on the ob_type.
> This allows for incredibly flexible duck typing on the C level.

This does sound very nice.   

> PyArray_Check would just check for the presence of the C API but not 
> care about the actual Python type, i.e., no PyObject_TypeCheck.
> Robert and I have talked a lot about this and will move forward with it 
> for Cython. Obviously I don't expect anyone but me to pick it up for 
> NumPy, so we'll see... I'll write up a specification document sometime 
> over the next couple of weeks, as we need that even if only for Cython.

We will look forward to what you come up with. 

Best regards,

