[Numpy-discussion] Feedback pls on proposed changes to bincount()
Stephen Simmons
mail@stevesimmons....
Sun Mar 11 03:10:45 CDT 2007
Hi,
I'd like to propose some minor modifications to the function
bincount(x, weights=None), so I would like some feedback from other users
of bincount() before I write this up as a proper patch.
Background:
bincount() has two forms:
- bincount(x) returns an integer array ians of length max(x)+1 where
ians[n] is the number of times n appears in x.
- bincount(x, weights) returns a double array dans of length max(x)+1
where dans[n] is the sum of weights[i] over all positions i
where x[i]==n.
In both cases, all elements of x must be non-negative.
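For reference, here is a quick illustration of the two existing forms as they behave in current NumPy releases. The sample arrays are made up for illustration:

```python
import numpy as np

x = np.array([0, 1, 1, 3, 1])
w = np.array([0.5, 1.0, 2.0, 0.25, 1.5])

# Unweighted form: integer counts, length max(x)+1 = 4
ians = np.bincount(x)               # counts 1, 3, 0, 1

# Weighted form: per-bin sums of w, returned as float64
dans = np.bincount(x, weights=w)    # sums 0.5, 4.5, 0.0, 0.25

# Any negative entry in x makes bincount() raise
try:
    np.bincount(np.array([0, -1, 2]))
except ValueError as err:
    print("rejected:", err)
```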
Proposed changes:
(1) Remove the restriction that elements of x must be non-negative.
Currently bincount() starts by finding max(x) and min(x). If the min
value is negative, an exception is raised. This change proposes
dropping the initial search for min(x), and instead testing for
non-negativity while summing values in the return arrays ians or dans.
Any positions i where x[i] is negative will be silently ignored. This
will allow selective bincounts, where values to ignore are flagged with a
negative bin number.
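Until something like (1) exists, roughly the same effect can be had from Python by masking before calling bincount(). This is only a sketch: bincount_ignore_negative is a hypothetical name, and unlike the proposed C change it makes an extra pass over x and a temporary copy of the kept entries:

```python
import numpy as np

def bincount_ignore_negative(x, weights=None):
    """Hypothetical helper emulating proposal (1): negative bin numbers are skipped."""
    x = np.asarray(x)
    keep = x >= 0                          # mask of entries that should be counted
    if weights is not None:
        weights = np.asarray(weights)[keep]
    return np.bincount(x[keep], weights=weights)

# Bin number -1 flags values to be left out of the count
x = np.array([2, -1, 0, 2, -1])
counts = bincount_ignore_negative(x)       # counts 1, 0, 2
```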
(2) Allow an optional argument for maximum bin number.
Currently bincount(x) returns an array whose length is dependent on
max(x). It is sometimes preferable to specify the exact size of the
returned array, so this change would add a new optional argument,
max_bin, which is one less than the size of the returned array. Under
this change, bincount() starts by finding max(x) only if max_bin is not
specified. Then the returned array ians or dans is created with length
max_bin+1, and any indexes that would overflow the output array are
silently ignored.
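A rough Python emulation of (2), assuming the minlength argument that bincount() has in current NumPy. minlength only pads a short result and never truncates, so indexes above max_bin still have to be masked out first. bincount_max_bin is a hypothetical helper, not part of NumPy, and it still rejects negative entries as bincount() does today:

```python
import numpy as np

def bincount_max_bin(x, weights=None, max_bin=None):
    """Hypothetical helper emulating proposal (2): result has exactly max_bin+1 entries."""
    x = np.asarray(x)
    if max_bin is None:
        return np.bincount(x, weights=weights)    # length follows max(x), as today
    keep = x <= max_bin                           # drop indexes that would overflow
    if weights is not None:
        weights = np.asarray(weights)[keep]
    # minlength pads a short result up to the requested size
    return np.bincount(x[keep], weights=weights, minlength=max_bin + 1)

counts = bincount_max_bin([0, 1, 5, 7], max_bin=3)   # counts 1, 1, 0, 0
```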
(3) Allow an optional output array, y.
Currently bincount() creates a new output array each time. Sometimes it
is preferable to add results to an existing output array, for example,
when the input array is only available in smaller chunks, or for a
progressive update strategy to avoid fp precision problems when adding
lots of small weights to larger subtotals. Thus we can add an extra
optional argument y that bypasses the creation of an output array.
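The chunked use case in (3) can be sketched with np.add.at, which in current NumPy performs unbuffered in-place addition so that repeated indexes all accumulate. bincount_into is a hypothetical helper. It assumes every entry of x is already a valid, non-negative index into y, because np.add.at raises IndexError on overflow and wraps negative indexes around instead of ignoring them:

```python
import numpy as np

def bincount_into(y, x, weights=None):
    """Hypothetical helper emulating proposal (3): accumulate into existing array y.

    Assumes 0 <= x[i] < len(y) for every i.
    """
    x = np.asarray(x)
    if weights is None:
        weights = 1  # scalar broadcasts: each occurrence adds 1
    # Unlike y[x] += weights, np.add.at accumulates every repeated index in x
    np.add.at(y, x, weights)
    return y

# Input arrives in two chunks; y collects the running per-bin totals
y = np.zeros(4)
for x_chunk, w_chunk in [([0, 1, 1], [1.0, 2.0, 3.0]), ([3, 1], [0.5, 4.0])]:
    bincount_into(y, x_chunk, w_chunk)
# y now holds the same totals as one weighted bincount over both chunks
```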
With these three changes, the function signature of bincount() would become:
bincount(x, weights=None, y=None, max_bin=None)
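To make the intended semantics of the combined signature concrete, here is a slow pure-Python reference loop that mirrors the single pass described in (1) and (2): bin numbers are checked while summing rather than by a separate min(x) search, and max(x) is only computed when no size is given. bincount_ref is hypothetical, and one point is an assumption not settled above: when y is supplied, its length sets the number of bins and max_bin is ignored.

```python
import numpy as np

def bincount_ref(x, weights=None, y=None, max_bin=None):
    """Hypothetical pure-Python reference for the proposed
    bincount(x, weights=None, y=None, max_bin=None); not NumPy's API.

    Assumption: if y is given, len(y) determines the bins and max_bin is ignored.
    """
    x = np.asarray(x)
    if y is None:
        if max_bin is None:
            # Only search for max(x) when no output size was requested
            max_bin = int(x.max()) if x.size else -1
        dtype = np.intp if weights is None else np.float64
        y = np.zeros(max_bin + 1, dtype=dtype)
    nbins = len(y)
    for i, n in enumerate(x):
        # Negative or overflowing bin numbers are silently skipped in the same pass
        if 0 <= n < nbins:
            y[n] += 1 if weights is None else weights[i]
    return y
```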
Anyway, that's the general idea. I'd be grateful for any feedback before
I code this up as a patch to _compiled_base.c.
Cheers
Stephen