# [Numpy-discussion] Optimizing mean(axis=0) on a 3D array

Tim Hochberg tim.hochberg at ieee.org
Sat Aug 26 12:59:38 CDT 2006

```Martin Spacek wrote:
> Hello,
>
> I'm a bit ignorant of optimization in numpy.
>
> I have a movie with 65535 32x32 frames stored in a 3D array of uint8
> with shape (65535, 32, 32). I load it from an open file f like this:
>
>  >>> import numpy as np
>  >>> data = np.fromfile(f, np.uint8, count=65535*32*32)
>  >>> data = data.reshape(65535, 32, 32)
>
> I'm picking several thousand frames more or less randomly from
> throughout the movie and finding the mean frame over those frames:
>
>  >>> meanframe = data[frameis].mean(axis=0)
>
> frameis is a 1D array of frame indices with no repeated values in it. If
> it has say 4000 indices in it, then the above line takes about 0.5 sec
> to complete on my system. I'm doing this for a large number of different
> frameis, some of which can have many more indices in them. All this
> takes many minutes to complete, so I'm looking for ways to speed it up.
>
> If I divide it into 2 steps:
>
>  >>> temp = data[frameis]
>  >>> meanframe = temp.mean(axis=0)
>
> and time it, I find the first step takes about 0.2 sec, and the second
> takes about 0.3 sec. So it's not just the mean() step, but also the
> indexing step that's taking some time.
>
> If I flatten with ravel:
>
>  >>> temp = data[frameis].ravel()
>  >>> meanframe = temp.mean(axis=0)
>
> then the first step still takes about 0.2 sec, but the mean() step drops
> to about 0.1 sec. But of course, this is taking a flat average across
> all pixels in the movie, which isn't what I want to do.
>
> I have a feeling that the culprit is the non-contiguity of the movie
> frames being averaged, but I don't know how to proceed.
>
> Any ideas? Could reshaping the data somehow speed things up? Would
> weave.blitz or weave.inline or pyrex help?
>
> I'm running numpy 0.9.8
>
> Thanks,
>
> Martin
>
Martin,

Here's an approach (mean_accumulate) that avoids making any copies of
the data. It runs almost 4x as fast as your approach (called baseline
here) on my box. Perhaps this will be useful:

import numpy as np

frames = 65535
samples = 4000

data = (256 * np.random.random((frames, 32, 32))).astype(np.uint8)
indices = np.arange(frames)
np.random.shuffle(indices)
indices = indices[:samples]

def mean_baseline(data, indices):
    return data[indices].mean(axis=0)

def mean_accumulate(data, indices):
    # Accumulate one frame at a time into a float buffer; no big copy.
    result = np.zeros([32, 32], float)
    for i in indices:
        result += data[i]
    result /= len(indices)
    return result

if __name__ == "__main__":
    import timeit
    print mean_baseline(data, indices)[0, :8]
    print timeit.Timer("s.mean_baseline(s.data, s.indices)",
                       "import scratch as s").timeit(10)
    print mean_accumulate(data, indices)[0, :8]
    print timeit.Timer("s.mean_accumulate(s.data, s.indices)",
                       "import scratch as s").timeit(10)

This gives:

[ 126.947    127.39175  128.03725  129.83425  127.98925  126.866
128.5352 127.6205 ]
3.95907664242
[ 126.947    127.39175  128.03725  129.83425  127.98925  126.866
128.53525 127.6205 ]
1.06913644053
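For anyone reading this on a current numpy/Python 3 setup, here is a small self-contained sketch of the same comparison (variable names and the seeded generator are mine, not from the original script) confirming that the accumulation loop computes the same mean frame as fancy indexing:

```python
import numpy as np

rng = np.random.default_rng(0)
frames, samples = 65535, 4000

data = rng.integers(0, 256, size=(frames, 32, 32), dtype=np.uint8)
indices = rng.permutation(frames)[:samples]

# Fancy indexing copies the selected frames, then averages the copy.
baseline = data[indices].mean(axis=0)

# Accumulating frame by frame avoids the large intermediate copy.
acc = np.zeros((32, 32), dtype=np.float64)
for i in indices:
    acc += data[i]
acc /= len(indices)

print(np.allclose(baseline, acc))  # → True
```

The two paths differ only in summation order, so their results agree to floating-point tolerance.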

I also wondered whether sorting the indices would help, since it should
improve locality of reference, but when I measured it, it appeared not
to help at all.
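For anyone who wants to repeat that measurement, a rough sketch (timings will vary by machine and numpy version; names are mine) might look like:

```python
import timeit
import numpy as np

rng = np.random.default_rng(1)
data = rng.integers(0, 256, size=(65535, 32, 32), dtype=np.uint8)
indices = rng.permutation(65535)[:4000]
sorted_indices = np.sort(indices)

def accumulate(idx):
    result = np.zeros((32, 32), dtype=np.float64)
    for i in idx:
        result += data[i]
    return result / len(idx)

# Same set of frames either way, so the means must match.
assert np.allclose(accumulate(indices), accumulate(sorted_indices))

t_rand = timeit.timeit(lambda: accumulate(indices), number=5)
t_sort = timeit.timeit(lambda: accumulate(sorted_indices), number=5)
print("shuffled: %.3fs  sorted: %.3fs" % (t_rand, t_sort))
```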

regards,

-tim

```