[Numpy-discussion] what is the best way to do a statistical mode operation?

Warren Weckesser warren.weckesser@enthought....
Sun Oct 3 12:24:57 CDT 2010

On Sun, Oct 3, 2010 at 7:41 AM, Gordon Wrigley <gordon@tolomea.com> wrote:

> I have an array of uint8's that has a shape of X*Y*Z*8, I would like to
> calculate modes along the 8 axis so that I end up with an array that has the
> shape X*Y*Z and is full of modes.
> I'm having problems finding a good way of doing this. My attempts at
> solving this using bincount or histogram produce an intermediate array that
> is 32x the size of my input data and somewhat larger than I have the memory
> to deal with.

If you were using bincount or histogram, you must have been explicitly
looping over the other three dimensions.  In that case, why should there be
such a large intermediate array?
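
As a rough sketch of what I mean by looping (the name modes_by_bincount is made up for illustration, and minlength=256 assumes the input is uint8), each 8-element row only ever needs one 256-entry count array at a time, so nothing close to 32x the input should be allocated:

```python
import numpy as np

def modes_by_bincount(a):
    # Hypothetical helper: a is assumed to be a uint8 array whose last
    # axis holds the values to take the mode of, e.g. shape (X, Y, Z, 8).
    flat = a.reshape(-1, a.shape[-1])          # one row per (x, y, z) cell
    out = np.empty(flat.shape[0], dtype=a.dtype)
    for i, row in enumerate(flat):
        # Only a single 256-entry count array exists at any moment.
        # argmax returns the smallest value when several values tie.
        out[i] = np.bincount(row, minlength=256).argmax()
    return out.reshape(a.shape[:-1])
```

The Python-level loop will be slow for a large array, but its memory use beyond the input is just the output array plus one small count array.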

> Can anyone suggest a good way to produce modes over sets of 8 bytes?

For what it's worth, here's a mode calculation that uses bincount on the
ranks of the data rather than the data itself.  Since it uses bincount, I
don't think it can be vectorized efficiently, so you are still stuck
explicitly looping over the other three dimensions.

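import numpy as np

def mode(x):
    """Return the most frequent value in the 1-D array x (smallest one on ties)."""
    y = np.sort(x)
    # Flag the first element of each run of equal values in the sorted data.
    starts = np.concatenate(([1], np.diff(y).astype(bool).astype(int)))
    # Rank of each element: 1 for the smallest distinct value, 2 for the next, ...
    starts_sum = starts.cumsum()
    # counts[k] is the number of elements with rank k (counts[0] is always 0).
    counts = np.bincount(starts_sum)
    arg_mode_freq = counts.argmax()
    # counts_sum[k-1] is the index in y of the first element with rank k.
    counts_sum = counts.cumsum()
    return y[counts_sum[arg_mode_freq-1]]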
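
To see why the final indexing step picks out the mode, it may help to trace the intermediate arrays for one small input. The sample values below are made up for illustration; the steps are the same ones used in the function above:

```python
import numpy as np

# Made-up sample of 8 bytes; 3 occurs four times, so it is the mode.
x = np.array([3, 1, 3, 2, 3, 1, 2, 3], dtype=np.uint8)

y = np.sort(x)                      # [1 1 2 2 3 3 3 3]
# 1 marks the first element of each run of equal values.
starts = np.concatenate(([1], np.diff(y).astype(bool).astype(int)))
                                    # [1 0 1 0 1 0 0 0]
ranks = starts.cumsum()             # [1 1 2 2 3 3 3 3]
counts = np.bincount(ranks)         # [0 2 2 4]  (index 0 is unused)
k = counts.argmax()                 # 3, the rank of the most frequent value
counts_sum = counts.cumsum()        # [0 2 4 8]
# counts_sum[k-1] is how many elements rank below k, which is also the
# index in y where the run of the rank-k value begins.
result = y[counts_sum[k - 1]]       # y[4] == 3
print(result)
```

Because the counts are taken over ranks rather than raw values, the bincount array has at most len(x) + 1 entries regardless of how large the values themselves are.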
