[Numpy-discussion] Optimizing mean(axis=0) on a 3D array

Tim Hochberg tim.hochberg at ieee.org
Sat Aug 26 12:59:38 CDT 2006


Martin Spacek wrote:
> Hello,
>
> I'm a bit ignorant of optimization in numpy.
>
> I have a movie with 65535 32x32 frames stored in a 3D array of uint8 
> with shape (65535, 32, 32). I load it from an open file f like this:
>
>  >>> import numpy as np
>  >>> data = np.fromfile(f, np.uint8, count=65535*32*32)
>  >>> data = data.reshape(65535, 32, 32)
>
> I'm picking several thousand frames more or less randomly from 
> throughout the movie and finding the mean frame over those frames:
>
>  >>> meanframe = data[frameis].mean(axis=0)
>
> frameis is a 1D array of frame indices with no repeated values in it. If 
> it has say 4000 indices in it, then the above line takes about 0.5 sec 
> to complete on my system. I'm doing this for a large number of different 
> frameis, some of which can have many more indices in them. All this 
> takes many minutes to complete, so I'm looking for ways to speed it up.
>
> If I divide it into 2 steps:
>
>  >>> temp = data[frameis]
>  >>> meanframe = temp.mean(axis=0)
>
> and time it, I find the first step takes about 0.2 sec, and the second 
> takes about 0.3 sec. So it's not just the mean() step, but also the 
> indexing step that's taking some time.
>
> If I flatten with ravel:
>
>  >>> temp = data[frameis].ravel()
>  >>> meanframe = temp.mean(axis=0)
>
> then the first step still takes about 0.2 sec, but the mean() step drops 
> to about 0.1 sec. But of course, this is taking a flat average across 
> all pixels in the movie, which isn't what I want to do.
>
> I have a feeling that the culprit is the non-contiguity of the movie 
> frames being averaged, but I don't know how to proceed.
>
> Any ideas? Could reshaping the data somehow speed things up? Would 
> weave.blitz or weave.inline or pyrex help?
>
> I'm running numpy 0.9.8
>
> Thanks,
>
> Martin
>   
Martin,

Here's an approach (mean_accumulate) that avoids making any copies of 
the data. It runs almost 4x as fast as your approach (called baseline 
here) on my box. Perhaps this will be useful:

    import random
    import numpy as np

    frames = 65535
    samples = 4000

    data = (256 * np.random.random((frames, 32, 32))).astype(np.uint8)
    indices = np.arange(frames)
    random.shuffle(indices)
    indices = indices[:samples]

    def mean_baseline(data, indices):
        return data[indices].mean(axis=0)
       
    def mean_accumulate(data, indices):
        # accumulate the selected frames one at a time into a float buffer,
        # which avoids the large temporary copy made by data[indices]
        result = np.zeros([32, 32], float)
        for i in indices:
            result += data[i]
        result /= len(indices)
        return result
       
    if __name__ == "__main__":
        import timeit
        # timeit imports this module by name, so save it as scratch.py
        print mean_baseline(data, indices)[0, :8]
        print timeit.Timer("s.mean_baseline(s.data, s.indices)",
                           "import scratch as s").timeit(10)
        print mean_accumulate(data, indices)[0, :8]
        print timeit.Timer("s.mean_accumulate(s.data, s.indices)",
                           "import scratch as s").timeit(10)


This gives:

    [ 126.947    127.39175  128.03725  129.83425  127.98925  126.866   
    128.5352 127.6205 ]
    3.95907664242
    [ 126.947    127.39175  128.03725  129.83425  127.98925  126.866   
    128.53525 127.6205 ]
    1.06913644053

I also wondered whether sorting the indices would help by improving 
locality of reference, but when I measured it, it didn't appear to help 
at all.
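
For reference, the sorted variant amounts to running the same accumulation 
loop over a sorted copy of the index array; roughly (mean_accumulate_sorted 
is just a sketch of what I timed, not part of the numbers above):

    def mean_accumulate_sorted(data, indices):
        # same accumulation loop as mean_accumulate, but visit the frames
        # in increasing index order in the hope of better locality
        result = np.zeros([32, 32], float)
        for i in np.sort(indices):
            result += data[i]
        result /= len(indices)
        return result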

regards,

-tim




