[Numpy-discussion] Counting array elements
verveer at embl-heidelberg.de
Sat Oct 23 04:14:04 CDT 2004
I thought I'd just give my point of view on this, since I do believe we
should give these points some thought.
On Oct 23, 2004, at 12:18 AM, Russell E Owen wrote:
> OK, since I seem to be in a grumpy mood today, here are some examples
> (probably nothing new here):
> - I'll expose my ignorance, but I find the take stuff and fancy
> indexing nearly incomprehensible. I've tried to follow the examples
> (several times--i.e. every time I need to do something fancy), but
> generally I either flail around until I find something that works, or
> give up and write a C extension.
I agree, it is very complicated. I always have trouble understanding
what is going on when I use take and indexing. More documentation may
help.
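
For readers trying to follow the take and indexing discussion, here is a small sketch. It uses NumPy (the successor to numarray) as an assumption, so numarray's own behaviour may differ in detail; the array and the variable names are made up for illustration.

```python
import numpy as np

a = np.array([[10, 11, 12],
              [20, 21, 22],
              [30, 31, 32]])

# take() selects whole slices along one axis; indexing with a list does the same.
rows = np.take(a, [0, 2], axis=0)   # rows 0 and 2
same_rows = a[[0, 2]]
cols = np.take(a, [0, 2], axis=1)   # columns 0 and 2
same_cols = a[:, [0, 2]]

# Two index lists are paired up element by element: this picks a[0, 1] and a[2, 0].
picked = a[[0, 2], [1, 0]]

# A boolean array of the same shape selects the elements where it is True.
large = a[a > 21]
```

The usual stumbling block is the third case: two index lists select individual elements, not a rectangular sub-block.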
> - I'd like to write C/C++ code that would work on multiple array
> types. This seems a natural use of C++ templates, but that doesn't
> seem to be "how it's done". I hate to think how the internal code is
> managing this without being a horrible spaghetti of code repeated for
> each array type.
This is a good point. If you look at examples for implementing
something in C, you always see that the code only handles a single data
type, usually converting all input to double type. That is not always a
good way to write an extension if you want it to be of generic use
(e.g. the FFT module does not handle 32 bits floating point well, which
is a problem for big arrays). Some support in writing functions that
handle multiple data types would be good.
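
The cost of converting everything to double can be shown at the Python level. This is only a sketch with NumPy standing in for numarray, and both function names are hypothetical; a real C extension would have to do the equivalent per-type handling itself.

```python
import numpy as np

def normalize_always_double(a):
    # The common pattern: force every input to 64-bit floats first.
    a = np.asarray(a, dtype=np.float64)
    return a / a.max()

def normalize_keep_type(a):
    # Keep floating-point (and complex) inputs in their own precision;
    # only integer-like inputs are promoted to double.
    a = np.asarray(a)
    if not np.issubdtype(a.dtype, np.inexact):
        a = a.astype(np.float64)
    return a / a.max()

big = np.arange(1, 5, dtype=np.float32)
as_double = normalize_always_double(big)   # float64, twice the memory
as_single = normalize_keep_type(big)       # stays float32
```

For a large 32-bit array the first version doubles the memory needed for the result, which is the kind of problem described above.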
> The nd_image package is the closest I've come to finding source code
> that makes any sense to me in this area. But it uses so many
> custom-defined specialized functions that I figured it was just too
> much work to figure out w/out a manual (and risky to rely on these
> functions since they are internal to the package).
The internal nd_image C functions are indeed not exported and should
not be used to implement extensions. That is going to stay that way
since I do not plan to document these, and in any case, exposing such
functions is not the purpose of the module.
On the other hand, some of the techniques used may be generally useful.
I could try to factor some of the functions and macros out and write
something up on the use of these to write extensions that handle
multiple data types.
> So I gave up and just support the one data type I really need now.
> Very disappointing.
Yes, it should be easier to do this, I agree. Using C macros as a 'poor
man's' templating system is in fact not too complicated (although pretty
Another approach that I have tried to use in nd_image is to provide
generic functions that take a python or a C function to implement
functionality. For instance to implement an arbitrary filter function
in nd_image you only need to implement a function that calculates the
filter at one point. You then call a generic filter function that does
the heavy lifting of dealing with multiple array types, iterating over
the array, dealing with borders and such, applying the function at each
array element. The filter function can be in python, but can also be a
C function, communicated by a CObject.
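
The idea can be sketched in a few lines of Python. This is not the nd_image implementation: the function name, the edge-replicating border handling and the float64 output are all assumptions made for illustration, with NumPy standing in for numarray.

```python
import numpy as np

def generic_filter_sketch(image, func, size=3):
    """Apply func to the size-wide neighbourhood of every element.

    The caller only supplies func, which computes the filter at one point.
    Iteration, border handling and input types are dealt with here.
    size is assumed to be odd.
    """
    image = np.asarray(image)
    pad = size // 2
    # Border handling: repeat the edge values (one of several possible choices).
    padded = np.pad(image, pad, mode="edge")
    out = np.empty(image.shape, dtype=np.float64)
    for idx in np.ndindex(image.shape):
        window = padded[tuple(slice(i, i + size) for i in idx)]
        out[idx] = func(window)
    return out

# A maximum filter needs nothing more than a function for one neighbourhood.
smoothed = generic_filter_sketch(np.array([1, 5, 2, 8]), np.max)
```

The same call works for arrays of any dimension and any numeric type, because the per-point function never sees those details.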
Maybe some functions of this kind could be provided with the numarray
package. This could simplify writing extensions a lot. Would there be
interest in a package of such functions? If there is, I could think
about it a bit more, and propose (and implement) something in the form
of an extension.
> - Important functions are sometimes buried in a non-obvious (to me)
> For example: try to find that location at which an array has a minimum
> value (if there's more than one such point, pick any). You'd think
> it'd be a standard numarray function, wouldn't you? After all, you can
> ask for the minimum value. Now try to find it.
Agreed, this bothered me too.
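
For reference, one way to get the position of a minimum as an index tuple is sketched below. It uses NumPy calls (`argmin` and `unravel_index`) as an assumption about what is available; the function name is hypothetical and this is not the nd_image routine.

```python
import numpy as np

def minimum_position_sketch(a):
    """Return the index tuple of one minimum of a (the first in C order)."""
    a = np.asarray(a)
    flat_index = np.argmin(a)                      # index into the flattened array
    position = np.unravel_index(flat_index, a.shape)
    return tuple(int(i) for i in position)

pos = minimum_position_sketch([[3, 1],
                               [0, 5]])
```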
> Well, I started out by trying to figure out how to get argmin to do
> the job. Horrible.
> Fortunately I finally found minimum_position buried in nd_image.
It is there because numarray did not provide it... But it is also there
because it offers much functionality that would not be appropriate for
the main package. It is part of the object measurement functions. A
simpler, possibly more efficient routine should maybe be part of the
> - Masked arrays are not integrated. Thus a lot of important filtering
> and stuff simply cannot be done on masked data without writing custom
> extensions. For instance I'd like to do a median-filter that ignores
> masked data (taking the median of non-masked data only).
I agree very much! To be honest, I do not like the ma package much. I
don't like the idea of having to use a separate package with a
different array type that duplicates the functionality in the main
package. I think it would be much better if all functions (where it
makes sense) in numarray would accept an optional mask argument. To me
it makes more sense to provide the mask with the operation, not as part
of the array like in ma (a package like ma could still be layered on
top.) I realize it would be a lot of work to make all numarray
functions mask aware, but it is something to think about maybe.
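
A sketch of what passing the mask with the operation could look like, using the median filter from the quoted example. The function name, the convention that True means "masked out", and the truncated windows at the borders are all assumptions; NumPy stands in for numarray.

```python
import numpy as np

def masked_median_filter_1d(a, mask=None, size=3):
    """1-D median filter that ignores elements where mask is True."""
    a = np.asarray(a, dtype=np.float64)
    if mask is None:
        valid = np.ones(a.shape, dtype=bool)
    else:
        valid = ~np.asarray(mask, dtype=bool)
    half = size // 2
    out = np.full(a.shape, np.nan)        # NaN where no valid data is in reach
    for i in range(a.size):
        lo, hi = max(0, i - half), min(a.size, i + half + 1)
        window = a[lo:hi][valid[lo:hi]]   # keep only the non-masked neighbours
        if window.size:
            out[i] = np.median(window)
    return out

data = [1.0, 100.0, 3.0, 4.0]
bad = [False, True, False, False]         # the outlier at index 1 is masked
filtered = masked_median_filter_1d(data, mask=bad)
```

The array itself stays an ordinary array; the mask is just an optional argument, and leaving it out gives the plain filter.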
> - For 2-d images x and y are reversed. I know this isn't going to
> change, but it is a headache every time I have to write new image
> processing code.
This is not really a problem I think, but you have to get used to it.
If you treat the last dimension always as X and the first as Y, you
have the same layout in memory as is usual in most image processing
software. So X corresponds to axis=1 and Y to axis=0. Or use axis=-1
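
The convention described above can be checked with a tiny array. This is a sketch using NumPy as a stand-in for numarray; the variable names are illustrative only.

```python
import numpy as np

height, width = 2, 3
# Shape is (Y, X): the first axis counts rows, the last axis counts columns.
image = np.arange(height * width).reshape(height, width)

y, x = 1, 2
value = image[y, x]                # index as [y, x], not [x, y]

row_sums = image.sum(axis=-1)      # sum along X: one value per row
col_sums = image.sum(axis=0)       # sum along Y: one value per column

# In the default (C) layout, neighbours along X are adjacent in memory.
flat_offset = y * width + x
same_value = image.ravel()[flat_offset]
```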