[Numpy-discussion] On my Cython/NumPy project
Dag Sverre Seljebotn
Sun Jun 22 18:14:51 CDT 2008
Matthew Brett wrote:
>> The feature of compiling code for multiple types is somewhat orthogonal to
>> ndarray support; better to treat them separately and take one at a time.
> Well, it's relevant to numpy because if you want to implement - for
> example - a numpy sort, then you've got to deal with an unspecified
> number of dimensions, and one of a range of specified types, otherwise
> you will end up copying, casting, rejecting types and so on...
Thanks for the feedback.
Sure; I didn't mean to imply that it was irrelevant, indeed I think it
would be very useful to NumPy users. I just mean to say that it is
orthogonal -- it is also important, but should be treated as a separate
feature that is independent of NumPy support as such. To give you a flavor
of why this is a nontrivial issue, consider a function with four different
numpy arrays. Then you would need to create approx. 15**4 different
versions of it, one for each combination of types -- not feasible at all.
I.e., you need some way of specifying that "these two arrays have the same
datatype", and so on, i.e. some kind of generalized/template programming.
I'd rather focus on the easy case for now, without precluding more features.
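To make the arithmetic concrete, here is a small Python sketch of how the number of compiled versions depends on which arrays are declared to share a datatype. The figure of 15 types is the rough estimate from above, and count_specializations and its "groups" argument are hypothetical names used only for illustration, not part of Cython or NumPy.

```python
N_DTYPES = 15  # rough number of element types, as estimated above


def count_specializations(n_dtypes, groups):
    """Count the versions needed for one function.

    `groups` holds one label per array argument; arrays carrying the
    same label are constrained to have the same dtype (a hypothetical
    way of expressing "these two arrays have the same datatype").
    """
    return n_dtypes ** len(set(groups))


# Four arrays whose types vary independently: 15**4 versions.
unconstrained = count_specializations(N_DTYPES, [0, 1, 2, 3])

# Two inputs share one type and two outputs share another: 15**2 versions.
two_groups = count_specializations(N_DTYPES, ["in", "in", "out", "out"])

# All four arrays share a single type: 15 versions.
all_same = count_specializations(N_DTYPES, ["t", "t", "t", "t"])

print(unconstrained, two_groups, all_same)
```

Constraining the arrays to share types is what brings the count down from tens of thousands to something that could plausibly be compiled.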
When it comes to not knowing the number of dimensions, the right way to go
about that would be to support NumPy dimension-neutral array iterators in
a nice way. I have a lot of thoughts about that too, but it's lower on
the priority list and I'll hold off on discussing it until the more direct
case is solved (as that is the one new NumPy/Cython users will miss most).
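For a flavor of what dimension-neutral iteration looks like from the Python side, here is a minimal sketch using the existing ndarray.flat iterator. It only illustrates the idea of visiting every element without knowing the number of dimensions; it is not a proposal for the Cython syntax, and the function names are made up for the example.

```python
import numpy as np


def total(arr):
    """Sum all elements without knowing how many dimensions arr has."""
    acc = 0.0
    for x in arr.flat:  # visits every element in C (row-major) order
        acc += x
    return acc


def scale_in_place(arr, factor):
    """Multiply every element by factor, again ignoring the shape."""
    flat = arr.flat
    for i in range(arr.size):
        flat[i] = flat[i] * factor


a = np.arange(24.0).reshape(2, 3, 4)  # 3-D input
b = np.ones((2, 2))                   # 2-D input
scale_in_place(b, 3.0)
print(total(a), total(np.arange(5.0)), b)
```

The same two functions work unchanged for 1-D, 2-D and 3-D input, which is the property a sort or similar routine would need.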
On the negative indices side: I really like the unsigned int idea. The
problem with "range" is that it can, at least in theory, grow larger than
MAX_INT, and so we are cautious about automatically inferring that the
iterator variable should be unsigned int (Python ints can be arbitrarily
large). Of course, one could compile two versions of each loop and have an
if-statement choose between them. Again, I'll probably just drop the negative
test
for explicitly declared unsigned int for now, while having "range" imply
unsigned int will be dealt with together with more general type inference
which Cython developers are also thinking about (though whether we'll have
the developer resources for it is another issue).
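The "two versions of each loop plus an if-statement" idea can be sketched in plain Python as follows. This is only a model of what generated code might do, under the assumption of a 32-bit unsigned int; UINT_MAX, get_signed, get_unsigned and loop_sum are illustrative names, not anything Cython emits.

```python
UINT_MAX = 2**32 - 1  # assumption: unsigned int is 32 bits wide


def get_signed(buf, i):
    """Index that may be negative: needs the wraparound test on each access."""
    n = len(buf)
    if i < 0:
        i += n
    if not 0 <= i < n:
        raise IndexError(i)
    return buf[i]


def get_unsigned(buf, i):
    """Index the caller guarantees is non-negative: no negative test needed."""
    if i >= len(buf):
        raise IndexError(i)
    return buf[i]


def loop_sum(buf, stop):
    """Sum buf[0:stop], choosing the loop version from the bound."""
    total = 0
    if 0 <= stop <= UINT_MAX:
        path = "unsigned"  # the loop variable fits in an unsigned int
        for i in range(stop):
            total += get_unsigned(buf, i)
    else:
        path = "general"   # fall back to arbitrary-size Python ints
        for i in range(stop):
            total += get_signed(buf, i)
    return total, path


print(loop_sum([1, 2, 3, 4], 3))
print(get_signed([10, 20, 30], -1))
```

The single check on the loop bound replaces a per-access negative test inside the fast version, which is where the saving would come from.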