[Numpy-discussion] Counting array elements

Todd Miller jmiller at stsci.edu
Fri Oct 29 11:19:14 CDT 2004

I have returned from our astronomical data systems conference and I am
going to take a short cut and summarize what I saw as the key
developments of this thread.  I apologize for not responding sooner and
individually but the web-mail system I use isn't effective for
conducting any kind of discussion.  You guys did a great job sorting
this out this week.  I marked my key points with **.  The rest is
probably only for people with a lot of patience.

** I've finally come to terms with the fact that functions are the right
way to do numarray rather than methods.   The arguments in the Numeric
manual are no more persuasive now than they ever were, but Stephen
Walton's remarks about method explosion finally convinced me that the
"real" reason for doing functions is that using methods puts every
new feature under the umbrella of a single namespace, the NumArray
class.  Using functions lets us partition things into modules which can
be used selectively and makes a more extensible and understandable
system.  Thanks Stephen.

A couple of people remarked that using .flat might solve everything with
something like a.flat.sum() or sum(ravel(a)).  This gets to the original
motivation for the sum() method, which was the codification of a simple
and storage efficient technique for reducing noncontiguous arrays.  The
first point is that a non-contiguous array cannot generally be reshaped
without making a copy.   The basic idea of the sum() method is to do
*two* reductions,  the first, along a single axis,  results in a smaller
contiguous array.  In the case of astronomical images which are
generally square or at least non-degenerate,  the reduction result is a
*much* smaller array.  The second reduction handles all the remaining
dimensions since .flat is guaranteed to work because the array is
contiguous.  The end result is a complete sum() without writing
additional ufuncs or making an array copy.
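
The two-reduction idea can be sketched as follows.  This is only an
illustration, and it assumes NumPy as a stand-in for numarray; the
helper name two_step_sum is hypothetical and is not the actual sum()
implementation.

```python
import numpy as np

def two_step_sum(a):
    """Sum every element of `a` without first copying the whole array.

    Hypothetical helper for illustration; NumPy stands in for numarray.
    """
    # Step 1: reduce along a single axis.  The result is a freshly
    # allocated array that is smaller than `a` by a factor of a.shape[0].
    partial = np.add.reduce(a, axis=0)
    # Step 2: flatten the small intermediate and reduce what is left.
    # Any copy made here touches only the reduced array, never `a` itself.
    return np.add.reduce(np.asarray(partial).reshape(-1))

image = np.arange(12.0).reshape(3, 4)
view = image[:, ::2]          # every other column: a non-contiguous view
total = two_step_sum(view)    # elements are 0, 2, 4, 6, 8, 10
```

For an N x N image the intermediate array holds only N elements, which
is where the storage saving comes from.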

There was understandable confusion about why .flat is sometimes allowed
to fail.  Since it is an attribute,  we thought it inappropriate to make
it return a copy of the source array and chose instead to raise an
exception.  In contrast, it is reasonable for the ravel() function to
return a completely different array, so it always works.  (I just
noticed that ravel() is not named flat()).  Some of our more
contemporary thinkers suggested using iterators to produce a .flat which
always works.  If anyone has an idea how to make this work with good
performance,  please let me know;  I don't.
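
The copy-versus-view distinction behind ravel() can be seen in a small
sketch.  It assumes NumPy rather than numarray, so the exact behavior
shown is NumPy's and is offered only as an illustration of the point.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # C-contiguous
t = a.T                          # transposed view: not C-contiguous

flat_view = np.ravel(a)          # contiguous input: no copy is needed
flat_copy = np.ravel(t)          # non-contiguous input: a copy is made

shares_a = np.shares_memory(flat_view, a)
shares_t = np.shares_memory(flat_copy, t)
```

Here shares_a is True and shares_t is False: flattening the transposed
array can only be done by returning a different array.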

** Tim Hochberg pointed out that we can overload the reduction (and not
accumulation?) axis parameter with an "all" or a tuple describing a
sequence of axes to reduce along.  My perception was that there was a
consensus behind this and in any case I'm in agreement with Tim.  Alan
Isaac pointed out that None might be better here than "all" and I
agree.  At this point,  I think sumAll() is dead, the sum() method will
be deprecated, and the reductions should be expanded as Tim suggested.
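
A sketch of what the overloaded axis parameter looks like in use.  It
assumes NumPy's ufunc reduce(), which accepts None and tuples for
axis; it is not a description of the numarray API at the time of
writing.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# axis=None reduces over every axis, which is the role sumAll() played.
total = np.add.reduce(a, axis=None)

# A tuple names the axes to reduce along; axis 1 survives here.
partial = np.add.reduce(a, axis=(0, 2))

# Accumulation keeps the input shape and takes a single axis.
running = np.add.accumulate(a, axis=1)
```

In this example total is 276, partial has shape (3,) and running has
the same shape as a.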

** Peter Verveer made some comments about the expectations of a naive
user regarding reductions, namely that "all" should be the default.   My
own experience bears this out,  and I am torn about what to do here. 
Chris Barker pointed out the need for backward compatibility with
Numeric,  and given the current numarray goal of supporting SciPy,  this
need is growing stronger and more complex.  SciPy uses yet another axis
convention.  If anyone has any ideas how to handle these multiple
conventions with elegance,  let me know.

A number of people commented on our naming conventions, an issue which
we have sidestepped for the moment with sumAll().  My impression is
that, for better or worse, numarray uses the lowerUpper() version of
Camel case.  I think this is very much a matter of personal taste and
don't claim to have any.   My guess is that numarray is probably
inconsistent at the moment, in part because lowerUpper() often
degenerates into merely lower() which degenerates into confusion. 

