[Numpy-discussion] numpy1.2 : make sorts unary ufuncs

Travis E. Oliphant oliphant@enthought....
Sat Apr 19 12:40:44 CDT 2008


Charles R Harris wrote:
>
>
> On Sat, Apr 19, 2008 at 1:29 AM, Charles R Harris
> <charlesr.harris@gmail.com> wrote:
> <snip>
>
>
>     On Sat, Apr 19, 2008 at 1:12 AM, Robert Kern
>     <robert.kern@gmail.com> wrote:
>
>         On Sat, Apr 19, 2008 at 1:55 AM, Charles R Harris
>         <charlesr.harris@gmail.com> wrote:
>
>         > Yes, but the inner loop is just something that uses the array
>         > values along that axis to produce another set of values, i.e.,
>         > it is a vector valued function of vectors. So is a sort, so is
>         > argsort, so is the inner product, so on and so forth. That's
>         > what we have here:
>         >
>         > typedef void (*PyUFuncGenericFunction) (char **, npy_intp *,
>         >                                         npy_intp *, void *);
>         >
>         > No difference that I can see. It is the call function in
>         > PyUFuncObject that matters.
>
>         I believe this is the disconnect. From my perspective, the fact
>         that the inner kernel function of a ufunc has a sufficient
>         argument list to do a sort isn't important. The signature of
>         that kernel function isn't what makes a ufunc; it's all of the
>         code around it that does broadcasting, type matching and
>         manipulation, etc. If we're changing that code to accommodate
>         sorting, we haven't gained anything. We've just moved some code
>         around; possibly we've reduced the line count, but I fear that
>         we will muddy ufunc implementation with non-ufunc functionality
>         and special cases.
>
>         If you want to go down this road, I think you need to do what
>         Travis suggests: factor out some of the common code between
>         ufuncs and sorts into a "superclass" (not really, but you get
>         the idea), and then implement ufuncs and sorts based on that. I
>         think trying to shove sorts into ufuncs-qua-ufuncs is a bad
>         idea. There is more than one path to code reuse.
>
>
>     Right now we have:
>
>     typedef struct {
>         PyObject_HEAD
>         int nin, nout, nargs;
>         int identity;
>         PyUFuncGenericFunction *functions;
>         void **data;
>         int ntypes;
>         int check_return;
>         char *name, *types;
>         char *doc;
>         void *ptr;
>         PyObject *obj;
>         PyObject *userloops;
>     } PyUFuncObject;
>      
>     Which could be derived from something slightly more general. We
>     could also leave out reduce, accumulate, etc., which are special
>     cases. We then have common code for registration, etc. The call
>     function still has to check types, dispatch the calls for the
>     axis, maybe create output arrays, as for maximum.reduce, and so
>     on. Broadcasting isn't applicable to unary type things and many
>     functions, say in argsort, look unary from the top, so that
>     doesn't enter in.
>
>
> For instance
>
> static void
> BOOL_@kind@(char **args, intp *dimensions, intp *steps, void *func)
> {
>     register intp i;
>     intp is1=steps[0],is2=steps[1],os=steps[2], n=dimensions[0];
>     char *i1=args[0], *i2=args[1], *op=args[2];
>     Bool in1, in2;
>     for(i=0; i<n; i++, i1+=is1, i2+=is2, op+=os) {
>         in1 = (*((Bool *)i1) != 0);
>         in2 = (*((Bool *)i2) != 0);
>         *((Bool *)op)= in1 @OP@ in2;
>     }
> }
>
> It looks to me like broadcasting is achieved by adjusting the step
> size. The only bothersome detail here is getting the count from the
> first dimension; that looks a bit fragile.
It shouldn't be fragile.  It's a historical accident that the signature
looks like that: it is the signature inherited from Numeric, and all of
scipy.special would have to be changed in order to change it.

Perhaps the thinking was that there would eventually be multiple "counts"
to keep track of, but I'm not sure.  I've only ever seen the first entry
used, so dimensions is really just a pointer to a single integer count
(a ptr_to_int) rather than any kind of "shape".

-Travis O.
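
As a rough illustration of the two points discussed above (only
dimensions[0] is read, and broadcasting falls out of the step sizes), here
is a minimal sketch in plain C. It does not use the NumPy headers: intp_t
is a stand-in for npy_intp, generic_loop_fn mirrors the quoted
PyUFuncGenericFunction typedef, and double_add_loop and
add_scalar_broadcast are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stand-in for npy_intp so the sketch builds without NumPy. */
typedef ptrdiff_t intp_t;

/* Same shape as the PyUFuncGenericFunction typedef quoted in the thread. */
typedef void (*generic_loop_fn)(char **, intp_t *, intp_t *, void *);

/* Hypothetical inner loop: out[i] = in1[i] + in2[i] for doubles. */
static void
double_add_loop(char **args, intp_t *dimensions, intp_t *steps, void *data)
{
    intp_t i;
    intp_t n = dimensions[0];  /* the only entry of dimensions that is read */
    intp_t is1 = steps[0], is2 = steps[1], os = steps[2];
    char *i1 = args[0], *i2 = args[1], *op = args[2];

    (void)data;  /* unused in this sketch */
    for (i = 0; i < n; i++, i1 += is1, i2 += is2, op += os) {
        *(double *)op = *(double *)i1 + *(double *)i2;
    }
}

/* Hypothetical caller: adds one scalar to every element of vec.
 * The scalar is "broadcast" by giving its operand a byte step of 0,
 * so the loop re-reads the same address on every iteration. */
static void
add_scalar_broadcast(double *vec, double *scalar, double *out, intp_t n)
{
    char *args[3];
    intp_t dimensions[1];
    intp_t steps[3];
    generic_loop_fn loop = double_add_loop;

    assert(n >= 0);
    args[0] = (char *)vec;
    args[1] = (char *)scalar;
    args[2] = (char *)out;
    dimensions[0] = n;
    steps[0] = (intp_t)sizeof(double);
    steps[1] = 0;  /* zero step: the scalar operand never advances */
    steps[2] = (intp_t)sizeof(double);

    loop(args, dimensions, steps, NULL);
}
```

Calling add_scalar_broadcast(v, &s, out, 3) with v = {1, 2, 3} and s = 10
would fill out with {11, 12, 13}. The loop itself knows nothing about
shapes; whatever code sets up args, dimensions and steps decides what gets
broadcast, which is the surrounding machinery Robert refers to.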


