[Numpy-discussion] calculating weighted majority using two 3D arrays
Timothy Hochberg
tim.hochberg@ieee....
Thu Mar 6 22:06:57 CST 2008
On Thu, Mar 6, 2008 at 11:37 AM, Gregory, Matthew <matt.gregory@oregonstate.edu> wrote:
> Eads, Damian wrote:
> > You may need to be a bit more specific about what you mean by
> > weighted majority. What is the range of values for values
> > and weights, specifically? This sounds a lot like pixel
> > classification where each pixel is classified with a majority
> > vote over its weights and values. Is that what you're trying to do?
> >
> > Many numpy functions (e.g. mean, max, min, sum) have an axis
> > parameter, which specifies the axis along which the statistic
> > is computed. Omitting the axis parameter causes the statistic
> > to be computed over all values in the multidimensional array.
> >
> > Suppose the 'values' array contains floating point numbers in
> > the range -1 to 1 and a larger absolute value gives a larger
> > confidence. Also suppose the weights are floating point
> > numbers between 0 and 1. The weighted majority vote for pixel
> > i,j over 10 real-valued (confidenced) votes, each vote having
> > a separate weight, is computed by
> >
> > w_vote = numpy.sign((values[:,i,j]*weights[:,i,j]).sum())
> >
> > This can be vectorized to give a weighted majority vote for
> > each pixel by doing
> >
> > w_vote = numpy.sign((values*weights).sum(axis=0))
> >
> > The values*weights expression gives a weighted prediction.
> > This also works if the 'values' are just predictions from the
> > set {-1, 1}, i.e. there are ten classifiers, each one predicts
> > either -1 or 1 on each pixel.
>
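To make the quoted suggestion concrete, here is a minimal sketch of the signed vote on a small random stack. The array shapes (10 votes over a 4 x 5 image), the seed, and the pixel indices are assumptions chosen only for illustration; they are not from the original question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 10 votes for each pixel of a 4 x 5 image.
values = rng.choice([-1.0, 1.0], size=(10, 4, 5))
weights = rng.uniform(0.0, 1.0, size=(10, 4, 5))

# Weighted vote for a single pixel (i, j).
i, j = 2, 3
single = np.sign((values[:, i, j] * weights[:, i, j]).sum())

# Same vote for every pixel at once: sum along the vote axis (axis 0).
w_vote = np.sign((values * weights).sum(axis=0))
```

Summing along axis 0 collapses the ten votes and leaves one result per pixel, so w_vote has the shape of a single image layer.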
> Damian, thank you for the helpful response. I should have been a bit
> more explicit about what I meant by weighted majority. In my case, I
> need to find a discrete value (i.e. class) that occurs most often among
> ten observations where weighting is pre-determined by an
> inverse-distance calculation. Ignoring for a moment the
> multidimensionality issue, my values and weights arrays might look like
> this:
>
> values = array([14, 32, 12, 50, 2, 8, 19, 12, 19, 10])
> weights = array([0.5, 0.1, 0.6, 0.1, 0.8, 0.3, 0.8, 0.4, 0.9, 0.2])
>
> My function to calculate the majority looks like this:
>
> import numpy
>
> def weightedMajority(a, b):
>
>     # Put all the samples into a dictionary with weights summed for
>     # duplicate values
>     wDict = {}
>     for i in range(len(a)):
>         (value, weight) = (a[i], b[i])
>
>         if value in wDict:
>             wDict[value] += weight
>         else:
>             wDict[value] = weight
>
>     # Create arrays of the values and weights
>     values = numpy.array(list(wDict.keys()))
>     weights = numpy.array(list(wDict.values()))
>
>     # Find the index of the largest summed weight
>     index = numpy.argmax(weights)
>
>     # Return the majority value
>     return values[index]
>
> In the above example:
>
> >>> maj = weightedMajority(values, weights)
> >>> maj
> 19
>
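For values that are not small non-negative integers, one possible loop-free variant of the quoted function is sketched below. It relies on numpy.unique and numpy.bincount; the function name weighted_majority_unique is hypothetical, and this is a swapped-in technique rather than what the quoted code does.

```python
import numpy as np

def weighted_majority_unique(a, b):
    # classes: sorted distinct values in a
    # inverse: for each sample, the index of its value within classes
    classes, inverse = np.unique(a, return_inverse=True)
    # Sum the weights of all samples that share a class.
    totals = np.bincount(inverse, weights=b)
    # Return the class with the largest summed weight.
    return classes[np.argmax(totals)]

values = np.array([14, 32, 12, 50, 2, 8, 19, 12, 19, 10])
weights = np.array([0.5, 0.1, 0.6, 0.1, 0.8, 0.3, 0.8, 0.4, 0.9, 0.2])
maj = weighted_majority_unique(values, weights)  # 19, as in the quoted example
```

Here the two 19s sum to 1.7 and the two 12s to 1.0, so 19 wins.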
[SNIP]
If your values are integers in a reasonably small range, then you might want
to use an array to hold your weights, as it makes things simpler and likely
faster. For example:

import numpy

def weightedMajority2(a, b):
    wMap = numpy.zeros(256, float)  # assume all values fall in [0, 255]
    for value, weight in zip(a, b):
        wMap[value] += weight
    return numpy.argmax(wMap)

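The same idea might extend to the 3D case in the subject line roughly as follows. This is only a sketch under stated assumptions: both arrays have shape (n_obs, n_rows, n_cols), the values are integers in [0, n_classes), and the names weighted_majority_1d, weighted_majority_3d and n_classes are hypothetical. The weight map needs n_classes * n_rows * n_cols floats, so it only suits a small class range.

```python
import numpy as np

def weighted_majority_1d(a, b):
    # 1-D case without a Python loop; assumes non-negative integer values.
    return np.bincount(a, weights=b).argmax()

def weighted_majority_3d(values, weights, n_classes=256):
    n_obs, n_rows, n_cols = values.shape
    # One accumulated weight per (class, row, col).
    w_map = np.zeros((n_classes, n_rows, n_cols), dtype=float)
    rows, cols = np.indices((n_rows, n_cols))
    for k in range(n_obs):
        # Within one layer every (row, col) pair occurs exactly once,
        # so the fancy-indexed += has no duplicate indices to lose.
        w_map[values[k], rows, cols] += weights[k]
    # Class with the largest summed weight at each pixel.
    return np.argmax(w_map, axis=0)

# Usage: repeat the 1-D example from the quoted message at every pixel.
vals = np.array([14, 32, 12, 50, 2, 8, 19, 12, 19, 10])
wts = np.array([0.5, 0.1, 0.6, 0.1, 0.8, 0.3, 0.8, 0.4, 0.9, 0.2])
values3d = np.tile(vals[:, None, None], (1, 2, 3))
weights3d = np.tile(wts[:, None, None], (1, 2, 3))
result = weighted_majority_3d(values3d, weights3d)
```

The loop runs once per observation layer (ten times here) rather than once per pixel, which is where most of the saving would come from.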
Regards,
--
. __
. |-\
.
. tim.hochberg@ieee.org