[Numpy-discussion] Counting array elements

Peter Verveer verveer at embl-heidelberg.de
Mon Oct 25 15:48:03 CDT 2004


On Oct 25, 2004, at 11:02 PM, Tim Hochberg wrote:

> Peter Verveer wrote:
>
>>
>> On 25 Oct 2004, at 19:32, Russell E Owen wrote:
>>
>>> At 7:08 PM +0200 2004-10-25, Peter Verveer wrote:
>>>
>>>> On 25 Oct 2004, at 18:51, Gary Strangman wrote:
>>>>
>>>>>
>>>>>>  I'm not sure how feasible it is, but I'd much rather an 
>>>>>> efficient, non-copying, 1-D view of a noncontiguous array (from 
>>>>>> an enhanced version of flat or ravel or whatever) than a bunch of 
>>>>>> extra methods. The former allows all of the standard methods to 
>>>>>> just work efficiently using sum(ravel(A)) or sum(A.flat) [ and 
>>>>>> max and min, etc]. Making special whole array methods for 
>>>>>> everything just leads to method explosion.
>>>>>
>>>>>
>>>>>  I completely agree with this ... an efficient flat/ravel would 
>>>>> seem to solve many of the issues being raised. Forgive the 
>>>>> potentially naive question here, but is there any reason such an 
>>>>> efficient, enhanced view can't be implemented for the .flat 
>>>>> method?
>>>>
>>>>
>>>> I believe it is not possible without copying data. The strides 
>>>> between elements of a noncontiguous array are not always the same, 
>>>> so you cannot efficiently view it as a 1D array.
>>>
>>>
>>> How about providing an iterator that counts through all the elements 
>>> of an array (e.g. arr.itervalues()). So long as C extensions could 
>>> efficiently make use of such an iterator, I think it'd do the job.
>>
>>
>> It would still be slower, because you would need a function call at 
>> each element that returns a value. Not a problem if you do a lot of 
>> work at each element, but if you are just adding values you want a 
>> custom-written C function. You can do it at the C level with macros or 
>> so (I do that in nd_image), but that would not help at the Python 
>> level.
>>
>>> One could also imagine:
>>> - arr.iteritems(), which returned (index, value) for each item
>>> - a mask argument: a boolean array the same shape as the data array; 
>>> True means elide the corresponding value from the data array
>>> - general support for indexing
>>
>>
>> Essentially you are suggesting to expose iterators at the python 
>> level that iterate over an array in some predefined way. That is 
>> possible, but I doubt it will be efficient.
>>
>> At the C level, however, it might be worth thinking about as a way of 
>> making it easier to write functions in C. I proposed to do it the other 
>> way around in an earlier mail: providing a set of generic functions that 
>> take a Python or a C function to be applied at each element. I will most 
>> likely implement something in that direction, but I should also give 
>> your idea some thought.
>>
>>> More generally, I agree that sum should work the same as a function 
>>> and a method, and that an extra axis argument could be a good thing 
>>> (it is so common elsewhere, e.g. size). I'd be tempted to break 
>>> backwards compatibility to fix this, since numarray is still new and 
>>> the current situation is very confusing.
>>
>>
>> I would absolutely vote for such a change, simply because we would 
>> like a range of such functions, e.g. minimum, maximum, and so on. 
>> Even if we have to leave sum() as it is, I think we should have the 
>> alternatives; we would just have to come up with an alternative name 
>> for sum(). In fact, I would consider volunteering to implement these 
>> functions.
>
> Why the need to break backwards compatibility? If one is going to 
> reimplement sum, et al so as to operate on an arbitrary set of axes 
> there's no reason one couldn't maintain the current behaviour as the 
> default.

It seems to me that the behavior one would expect from a function like 
that would be to apply the operation to the whole array, not along an 
axis. What would you expect as a new user if you call a minimum() 
function? A single value that is the minimum. So that is the logical 
choice for the default behavior, I would think.
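
A minimal sketch of the two behaviors being compared, using plain Python lists rather than numarray. The data and variable names are made up for illustration:

```python
# Made-up 2x3 data; plain nested lists stand in for an array.
rows = [[3, 7, 1],
        [4, 0, 9]]

# Whole-array reduction: a single value.
whole = min(min(row) for row in rows)

# Reduction along an axis: one value per column or per row.
per_column = [min(col) for col in zip(*rows)]
per_row = [min(row) for row in rows]

print(whole)       # 0
print(per_column)  # [3, 0, 1]
print(per_row)     # [1, 0]
```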

>  All that is required is to allow axis to be a number (current 
> behaviour), a tuple (reduce across the designated axes) or some 
> special value to sum over all (None?, "all"?).

Yes, that would be the idea anyway. The question is what should be the 
default behavior for this type of function, something I think we 
should not decide based on the current behavior of a single existing 
function, but based on what makes the most sense. That is obviously 
something that can be discussed...
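
To make the axis options concrete, here is a rough sketch of a reduction that accepts a number, a tuple, or None for axis. It is written against present-day NumPy, not the numarray of this thread, and the helper name total() and its whole-array default are assumptions chosen only for illustration:

```python
import numpy as np

def total(arr, axis=None):
    """Hypothetical helper: sum over one axis, several axes, or everything.

    axis=None (the default assumed here) reduces over the whole array.
    No range checking is done in this sketch.
    """
    arr = np.asarray(arr)
    if axis is None:
        axes = range(arr.ndim)
    elif isinstance(axis, (int, np.integer)):
        axes = (axis,)
    else:
        axes = tuple(axis)
    # Normalize negative axes, then reduce the highest axis first so the
    # remaining axis numbers stay valid after each step.
    result = arr
    for a in sorted({a % arr.ndim for a in axes}, reverse=True):
        result = np.add.reduce(result, axis=a)
    return result

a = np.arange(24).reshape(2, 3, 4)
print(total(a))                    # whole array
print(total(a, axis=0).shape)      # a single axis
print(total(a, axis=(0, 2)))       # a tuple of axes
```

With this signature the existing per-axis behaviour stays available through an explicit axis argument, and only the default is open to debate.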

>
> Having two sum functions with different names is not particularly 
> better than the current proposal of a method and a function.

This is certainly true. I would prefer breaking compatibility...

Peter






