[Numpy-discussion] Counting array elements
Peter Verveer
verveer at embl-heidelberg.de
Mon Oct 25 15:48:03 CDT 2004
On Oct 25, 2004, at 11:02 PM, Tim Hochberg wrote:
> Peter Verveer wrote:
>
>>
>> On 25 Oct 2004, at 19:32, Russell E Owen wrote:
>>
>>> At 7:08 PM +0200 2004-10-25, Peter Verveer wrote:
>>>
>>>> On 25 Oct 2004, at 18:51, Gary Strangman wrote:
>>>>
>>>>>
>>>>>> I'm not sure how feasible it is, but I'd much rather an
>>>>>> efficient, non-copying, 1-D view of a noncontiguous array (from
>>>>>> an enhanced version of flat or ravel or whatever) than a bunch of
>>>>>> extra methods. The former allows all of the standard methods to
>>>>>> just work efficiently using sum(ravel(A)) or sum(A.flat) [ and
>>>>>> max and min, etc]. Making special whole array methods for
>>>>>> everything just leads to method explosion.
>>>>>
>>>>>
>>>>> I completely agree with this ... an efficient flat/ravel would
>>>>> seem to solve many of the issues being raised. Forgive the
>>>>> potentially naive question here, but is there any reason such an
>>>>> efficient, enhanced view can't be implemented for the .flat
>>>>> method?
>>>>
>>>>
>>>> I believe it is not possible without copying data. The strides
>>>> between elements of a noncontiguous array are not always the same,
>>>> so you cannot efficiently view it as a 1D array.
>>>
>>>
>>> How about providing an iterator that counts through all the elements
>>> of an array (e.g. arr.itervalues()). So long as C extensions could
>>> efficiently make use of such an iterator, I think it'd do the job.
>>
>>
>> It would still be slower, because you would need a function call at
>> each element that returns a value. Not a problem if you do a lot of
>> work at each element, but if you are just adding values you want a
>> custom written C function. You can do it at the C level with macros or
>> so, (I do that in nd_image) but that would not help at the python
>> level.
>>
>>> One could also imagine:
>>> - arr.iteritems(), which returned (index, value) for each item
>>> - a mask argument: a boolean array the same shape as the data array;
>>> True means elide the corresponding value from the data array
>>> - general support for indexing
>>
>>
>> Essentially you are suggesting to expose iterators at the python
>> level that iterate over an array in some predefined way. That is
>> possible, but I doubt it will be efficient.
>>
>> At the C level however, it might be worth thinking about as a way of
>> easing writing functions in C. I proposed to do it the other way
>> around in an earlier mail: providing a set of generic functions that
>> take a python or a C function to be applied at each element. I most
>> likely will implement something in that direction, but I should give
>> your idea also some thought.
>>
>>> More generally, I agree that sum should work the same as a function
>>> and a method, and that an extra axis argument could be a good thing
>>> (it is so common elsewhere, e.g. size). I'd be tempted to break
>>> backwards compatibility to fix this, since numarray is still new and
>>> the current situation is very confusing.
>>
>>
>> I would absolutely vote for such a change. Simply because we would
>> like a range of such functions, e.g. minimum, maximum, and so on.
>> Even if we have to leave sum() as it is, I think we should have the
>> alternatives, we would just have to come up with an alternative name
>> for sum(). In fact I would consider volunteering implementing these
>> functions.
>
> Why the need to break backwards compatibility? If one is going to
> reimplement sum, et al so as to operate on an arbitrary set of axes
> there's no reason one couldn't maintain the current behaviour as the
> default.
It seems to me that the behavior one would expect from a function like
that is to apply the operation to the whole array, not along an
axis. What would you expect as a new user if you call a minimum()
function? A single value that is the minimum. So that is the logical
choice for the default behavior, I would think.
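To make the two candidate defaults concrete, here is a minimal sketch in plain Python. It uses nested lists as a stand-in for a 2-D array; the names `data`, `whole_min` and `column_min` are only illustrative and are not part of numarray.

```python
# Hypothetical 2x3 "array" represented as nested lists (two rows).
data = [[4, 9, 2],
        [7, 1, 8]]

# Whole-array reduction: a single value, the minimum of all elements.
whole_min = min(value for row in data for value in row)

# Reduction along the first axis: one minimum per column.
column_min = [min(column) for column in zip(*data)]

print(whole_min)   # 1
print(column_min)  # [4, 1, 2]
```

The first result is what I would expect a new user to have in mind; the second is what a reduction along the first axis gives.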
> All that is required is to allow axis to be a number (current
> behaviour), a tuple (reduce across the designated axes) or some
> special value to sum over all (None?, "all"?).
Yes, that would be the idea anyway. The question is what the default
behavior should be for this type of function, something I think we
should not decide based on the current behavior of a single existing
function, but based on what makes the most sense. That is obviously
something that can be discussed...
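As a rough sketch of the axis argument described above, assuming a 2-D nested list in place of a real array: the function name `total` is hypothetical, and the default of `None` (reduce over everything) is just the choice I would favor. Keeping the current behavior would mean a default of `0` instead.

```python
def total(data, axis=None):
    """Sum a 2-D nested list over the given axis or axes.

    axis may be None (all axes), a single int, or a tuple of ints.
    This is an illustration only, not numarray's implementation.
    """
    # Normalize the axis argument to a set of axes to reduce over.
    if axis is None:
        axes = {0, 1}
    elif isinstance(axis, int):
        axes = {axis}
    else:
        axes = set(axis)

    if axes == {0, 1}:
        return sum(sum(row) for row in data)        # single value
    if axes == {0}:
        return [sum(col) for col in zip(*data)]     # one sum per column
    if axes == {1}:
        return [sum(row) for row in data]           # one sum per row
    raise ValueError("axis must be None, 0, 1, or a tuple of 0 and 1")


data = [[1, 2, 3],
        [4, 5, 6]]

print(total(data))               # 21
print(total(data, axis=0))       # [5, 7, 9]
print(total(data, axis=1))       # [6, 15]
print(total(data, axis=(0, 1)))  # 21
```

With this signature the function and method forms could share one definition, and only the default value would remain to be agreed on.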
>
> Having two sum functions with different names is not particularly
> better than the current proposal of a method and a function.
This is certainly true. I would prefer breaking compatibility...
Peter