[Numpy-discussion] Optimizing mean(axis=0) on a 3D array

Travis Oliphant oliphant.travis at ieee.org
Sat Aug 26 05:26:32 CDT 2006

Martin Spacek wrote:
> Hello,
> I'm a bit ignorant of optimization in numpy.
> I have a movie with 65535 32x32 frames stored in a 3D array of uint8 
> with shape (65535, 32, 32). I load it from an open file f like this:
>  >>> import numpy as np
>  >>> data = np.fromfile(f, np.uint8, count=65535*32*32)
>  >>> data = data.reshape(65535, 32, 32)
> I'm picking several thousand frames more or less randomly from 
> throughout the movie and finding the mean frame over those frames:
>  >>> meanframe = data[frameis].mean(axis=0)
> frameis is a 1D array of frame indices with no repeated values in it. If 
> it has say 4000 indices in it, then the above line takes about 0.5 sec 
> to complete on my system. I'm doing this for a large number of different 
> frameis, some of which can have many more indices in them. All this 
> takes many minutes to complete, so I'm looking for ways to speed it up.
> If I divide it into 2 steps:
>  >>> temp = data[frameis]
>  >>> meanframe = temp.mean(axis=0)
> and time it, I find the first step takes about 0.2 sec, and the second 
> takes about 0.3 sec. So it's not just the mean() step, but also the 
> indexing step that's taking some time.

If frameis is 1-D, then you should be able to use

temp = data.take(frameis, axis=0)

for the first step. This can be quite a bit faster than data[frameis]
(and is a big reason why take is still around). There are several
reasons for this, one of which is that with indexing, index checking is
done over the entire list of indices.
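
As a minimal sketch of the suggestion above, the following compares the two ways of selecting frames and checks that they give the same result. The array here is random stand-in data, much smaller than the movie in the original post, and the sizes (2000 frames, 400 indices) are assumptions chosen only to keep the example quick. It makes no claim about timings; those are best measured on your own system, for example with the timeit module.

```python
import numpy as np

# Hypothetical stand-in for the movie (the real one has 65535 frames)
rng = np.random.default_rng(0)
nframes = 2000
data = rng.integers(0, 256, size=(nframes, 32, 32), dtype=np.uint8)

# 1-D array of frame indices with no repeated values
frameis = rng.choice(nframes, size=400, replace=False)

# Original approach: index with an integer array
temp_index = data[frameis]

# Suggested approach: take along the first axis
temp_take = data.take(frameis, axis=0)

# Both select the same frames in the same order
assert np.array_equal(temp_index, temp_take)

# Mean frame over the selected frames, shape (32, 32)
meanframe = temp_take.mean(axis=0)
```

Since both calls return the same array, swapping one for the other should not change meanframe; only the time spent in the selection step differs.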
