[Numpy-discussion] what is the best way to do a statistical mode operation?
Warren Weckesser
warren.weckesser@enthought....
Sun Oct 3 12:24:57 CDT 2010
On Sun, Oct 3, 2010 at 7:41 AM, Gordon Wrigley <gordon@tolomea.com> wrote:
> I have an array of uint8's that has a shape of X*Y*Z*8, I would like to
> calculate modes along the 8 axis so that I end up with an array that has the
> shape X*Y*Z and is full of modes.
> I'm having problems finding a good way of doing this. My attempts at
> solving this using bincount or histogram produce an intermediate array that
> is 32x the size of my input data and somewhat larger than I have the memory
> to deal with.
>
If you were using bincount or histogram, you must have been explicitly
looping over the other three dimensions. In that case, why should there be
such a large intermediate array?
> Can anyone suggest a good way to produce modes over sets of 8 bytes?
>
>
For what it's worth, here's a mode calculation that uses bincount on the
ranks of the data rather than the data itself. Since it uses bincount, I
don't think it can be vectorized efficiently, so you are still stuck
explicitly looping over the other three dimensions.
-----
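import numpy as np


def mode(x):
    # Sort so that equal values form contiguous runs.
    y = np.sort(x)
    # 1 marks the first element of each run of equal values.
    starts = np.concatenate(([1], np.diff(y).astype(bool).astype(int)))
    # 1-based rank of each element among the distinct values.
    starts_sum = starts.cumsum()
    # counts[k] is the number of elements with rank k; counts[0] is always 0.
    counts = np.bincount(starts_sum)
    # Rank of the most frequent value; ties go to the smallest value.
    arg_mode_freq = counts.argmax()
    # counts_sum[k-1] is the index in y where the run of rank k begins.
    counts_sum = counts.cumsum()
    mode = y[counts_sum[arg_mode_freq-1]]
    return mode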
-----
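For illustration, the explicit loop over the other three dimensions might look like the sketch below. It assumes the input is a NumPy array with the 8 values on the last axis. `mode_last_axis` is a hypothetical helper name, the sample data is made up, and `mode` is repeated in compact form only so that the snippet runs on its own.

```python
import numpy as np


def mode(x):
    # Compact copy of the rank-based mode above, so this sketch is self-contained.
    y = np.sort(x)
    starts = np.concatenate(([1], np.diff(y).astype(bool).astype(int)))
    counts = np.bincount(starts.cumsum())
    return y[counts.cumsum()[counts.argmax() - 1]]


def mode_last_axis(a):
    # Hypothetical helper: apply mode() to each vector along the last axis.
    # Flatten the leading axes (a view when `a` is contiguous).
    rows = a.reshape(-1, a.shape[-1])
    # One output element per vector, same dtype as the input.
    out = np.empty(rows.shape[0], dtype=a.dtype)
    for i, row in enumerate(rows):
        out[i] = mode(row)
    return out.reshape(a.shape[:-1])


# Made-up sample with shape (1, 2, 8).
a = np.array([[[1, 1, 2, 3, 3, 3, 4, 5],
               [7, 7, 7, 7, 0, 0, 0, 1]]], dtype=np.uint8)
result = mode_last_axis(a)
```

Written this way, the only large allocation is the output of X*Y*Z bytes; the temporaries inside `mode` are sized for a single 8-element vector. The Python-level loop will likely be slow for big arrays, so treat this as a memory-friendly baseline rather than a fast solution.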
Warren