[Numpy-discussion] calculating weighted majority using two 3D arrays

Gregory, Matthew matt.gregory@oregonstate....
Thu Mar 6 12:37:02 CST 2008

Eads, Damian wrote:
> You may need to be a bit more specific about what you mean by
> weighted majority. What is the range of values for values
> and weights, specifically? This sounds a lot like pixel
> classification where each pixel is classified with a majority 
> vote over its weights and values. Is that what you're trying to do?
> Many numpy functions (e.g. mean, max, min, sum) have an axis 
> parameter, which specifies the axis along which the statistic 
> is computed. Omitting the axis parameter causes the statistic 
> to be computed over all values in the multidimensional array.
> Suppose the 'values' array contains floating point numbers in
> the range -1 to 1 and a larger absolute value gives a larger
> confidence. Also suppose the weights are floating point
> numbers between 0 and 1. The weighted majority vote for pixel 
> i,j over 10 real-valued (confidenced) votes, each vote having 
> a separate weight, is computed by
>    w_vote = numpy.sign((values[:,i,j]*weights[:,i,j]).sum())
> This can be vectorized to give a weighted majority vote for 
> each pixel by doing
>    w_vote = numpy.sign((values*weights).sum(axis=0))
> The values*weights expression gives a weighted prediction. 
> This also works if the 'values' are just predictions from the 
> set {-1, 1}, i.e. there are ten classifiers, and each one predicts
> either -1 or 1 on each pixel.

Damian, thank you for the helpful response.  I should have been a bit
more explicit about what I meant by weighted majority.  In my case, I
need to find a discrete value (i.e. class) that occurs most often among
ten observations where weighting is pre-determined by an
inverse-distance calculation.  Ignoring for a moment the
multidimensionality issue, my values and weights arrays might look like

values = numpy.array([14, 32, 12, 50, 2, 8, 19, 12, 19, 10])
weights = numpy.array([0.5, 0.1, 0.6, 0.1, 0.8, 0.3, 0.8, 0.4, 0.9, 0.2])

My function to calculate the majority looks like this:

import numpy

def weightedMajority(a, b):

	# Put all the samples into a dictionary with weights summed for
	# duplicate values
	wDict = {}
	for i in range(len(a)):
		(value, weight) = (a[i], b[i])

		if value in wDict:
			wDict[value] += weight
		else:
			wDict[value] = weight

	# Create arrays of the values and weights
	values = numpy.array(list(wDict.keys()))
	weights = numpy.array(list(wDict.values()))

	# Find the index of the largest summed weight
	index = numpy.argmax(weights)

	# Return the majority value
	return values[index]

In the above example:

>>> maj = weightedMajority(values, weights)
>>> print(maj)
19
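
The same one-dimensional calculation can probably be done without the
Python loop. The following is only a sketch: the name
weighted_majority_1d is made up for illustration, and it relies on
numpy.unique (with return_inverse=True) to map each observation to its
class and numpy.bincount (with weights=...) to sum the weights per
class. Ties go to the smallest class value, which may differ from the
dictionary version.

```python
import numpy

def weighted_majority_1d(values, weights):
    # Hypothetical helper, not part of numpy.
    # classes: sorted distinct values; inverse: index of each
    # observation's class within `classes`
    classes, inverse = numpy.unique(values, return_inverse=True)

    # Sum the weights of all observations that share a class
    totals = numpy.bincount(inverse, weights=weights)

    # Return the class with the largest summed weight
    return classes[numpy.argmax(totals)]

values = numpy.array([14, 32, 12, 50, 2, 8, 19, 12, 19, 10])
weights = numpy.array([0.5, 0.1, 0.6, 0.1, 0.8, 0.3, 0.8, 0.4, 0.9, 0.2])
print(weighted_majority_1d(values, weights))  # prints 19
```

Here 19 wins because its two observations sum to 0.8 + 0.9 = 1.7, ahead
of 12 at 0.6 + 0.4 = 1.0.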

Correct me if I'm wrong, but I don't think that your example will work
when I am looking to return a discrete value from the values set, but
you may see something that I'm doing that is truly inefficient!
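
For the two 3D arrays in the subject line, one possible approach is to
loop over the distinct classes instead of over the pixels. This is a
rough sketch under several assumptions: both arrays have shape
(observations, rows, cols), the weights are positive, and the number of
distinct classes is small enough that one pass per class is acceptable.
The name weighted_majority_3d is hypothetical.

```python
import numpy

def weighted_majority_3d(values, weights):
    # Hypothetical helper; assumes shape (observations, rows, cols)
    # for both arrays and positive weights.
    classes = numpy.unique(values)
    best_score = numpy.full(values.shape[1:], -numpy.inf)
    best_class = numpy.empty(values.shape[1:], dtype=values.dtype)

    for c in classes:
        # Per-pixel sum of the weights of observations equal to class c
        score = numpy.where(values == c, weights, 0.0).sum(axis=0)

        # Keep c wherever it beats the best class seen so far
        # (strict comparison, so ties go to the smaller class value)
        better = score > best_score
        best_score[better] = score[better]
        best_class[better] = c

    return best_class

values = numpy.array([14, 32, 12, 50, 2, 8, 19, 12, 19, 10])
weights = numpy.array([0.5, 0.1, 0.6, 0.1, 0.8, 0.3, 0.8, 0.4, 0.9, 0.2])

# Build a small 2x2 test image in which every pixel repeats the
# one-dimensional example, then boost class 2 at pixel (1, 1)
values3d = numpy.tile(values[:, None, None], (1, 2, 2))
weights3d = numpy.tile(weights[:, None, None], (1, 2, 2))
weights3d[4, 1, 1] = 5.0

result = weighted_majority_3d(values3d, weights3d)
print(result)  # 19 at every pixel except (1, 1), which is 2
```

The cost grows with the number of distinct classes rather than the
number of pixels, so this is only likely to help when there are far
fewer classes than pixels.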

thanks, matt
