[Numpy-discussion] what is the best way to do a statistical mode operation?
Sun Oct 3 12:24:57 CDT 2010
On Sun, Oct 3, 2010 at 7:41 AM, Gordon Wrigley <firstname.lastname@example.org> wrote:
> I have an array of uint8's that has a shape of X*Y*Z*8, I would like to
> calculate modes along the 8 axis so that I end up with an array that has the
> shape X*Y*Z and is full of modes.
> I'm having problems finding a good way of doing this. My attempts at
> solving this using bincount or histogram produce an intermediate array that
> is 32x the size of my input data and somewhat larger than I have the memory
> to deal with.
If you were using bincount or histogram, you must have been explicitly
looping over the other three dimensions. In that case, why should there be
such a large intermediate array?
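To make the point about the explicit loop concrete, here is a minimal sketch, assuming the input is a uint8 array whose last axis holds the 8 values. The function name mode_by_looping is made up for illustration. Each iteration allocates only a single 256-entry count vector, so no large intermediate array is needed.

```python
import numpy as np

def mode_by_looping(a):
    """Mode along the last axis, one np.bincount call per group of values.

    Hypothetical helper: assumes `a` has an unsigned 8-bit dtype.
    Ties resolve to the smallest value, because argmax returns the first maximum.
    """
    flat = a.reshape(-1, a.shape[-1])          # one row per (x, y, z) cell
    out = np.empty(flat.shape[0], dtype=a.dtype)
    for i, row in enumerate(flat):
        # The only temporary here is one 256-entry count vector.
        out[i] = np.bincount(row, minlength=256).argmax()
    return out.reshape(a.shape[:-1])
```

This is slow in pure Python for large X*Y*Z, but its extra memory use is essentially just the output array.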
> Can anyone suggest a good way to produce modes over sets of 8 bytes?
For what it's worth, here's a mode calculation that uses bincount on the
ranks of the data rather than the data itself. Since it uses bincount, I
don't think it can be vectorized efficiently, so you are still stuck
explicitly looping over the other three dimensions.
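import numpy as np

# x is assumed to be a non-empty 1-D array (e.g. one group of 8 uint8 values)
y = np.sort(x)
# 1 where a new distinct value begins in the sorted data, 0 elsewhere
starts = np.concatenate(([0], np.diff(y).astype(bool).astype(int)))
# rank (0-based group index) of each sorted element
starts_sum = starts.cumsum()
# number of occurrences of each distinct value
counts = np.bincount(starts_sum)
arg_mode_freq = counts.argmax()
# counts_sum[k] is one past the last index of group k in y
counts_sum = counts.cumsum()
mode = y[counts_sum[arg_mode_freq] - 1]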
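For comparison, here is a sketch of a different technique that avoids bincount altogether: sort along the last axis, then walk that short axis once while tracking run lengths for all X*Y*Z cells at the same time. This is not the method above, and the name mode_last_axis is made up for illustration. It assumes the array has at least two dimensions and a non-empty last axis.

```python
import numpy as np

def mode_last_axis(a):
    """Mode along the last axis using a sort plus run-length tracking.

    Hypothetical helper: assumes a.ndim >= 2 and a.shape[-1] >= 1.
    Ties resolve to the smallest value.
    """
    y = np.sort(a, axis=-1)                    # sorted copy, same size as the input
    n = y.shape[-1]
    count_dtype = np.min_scalar_type(n)        # uint8 is enough when n == 8
    best = y[..., 0].copy()                    # current best candidate per cell
    best_count = np.ones(y.shape[:-1], dtype=count_dtype)
    run = np.ones(y.shape[:-1], dtype=count_dtype)
    for i in range(1, n):                      # loops over the short axis only
        same = y[..., i] == y[..., i - 1]
        run += 1
        run[~same] = 1                         # a new value starts a new run
        better = run > best_count              # strict, so the earlier value wins ties
        np.copyto(best, y[..., i], where=better)
        np.copyto(best_count, run, where=better)
    return best
```

The Python loop runs only as many times as the last axis is long (8 here), and apart from the sorted copy every working array has shape X*Y*Z.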