[Numpy-discussion] Ufunc memory access optimization

Pauli Virtanen pav@iki...
Tue Jun 15 06:37:30 CDT 2010


Fri, 2010-06-11 at 10:52 +0200, Hans Meine wrote:
> On Friday 11 June 2010 10:38:28 Pauli Virtanen wrote:
[clip]
> > I think there was some code ready to implement this shuffling. I'll try
> > to dig it out and implement the shuffling.
> 
> That would be great!
> 
> Ullrich Köthe has implemented this for our VIGRA/numpy bindings:
>   http://tinyurl.com/fast-ufunc
> At the bottom you can see that he basically wraps all numpy.ufuncs he can find 
> in the numpy top-level namespace automatically.

Ok, here's the branch:

        http://github.com/pv/numpy-work/compare/master...feature;ufunc-memory-access-speedup

Some samples (the times in parentheses are the times without the
optimization):

        x = np.zeros([100,100])
        %timeit x + x
        10000 loops, best of 3: 106 us (99.1 us) per loop
        %timeit x.T + x.T
        10000 loops, best of 3: 114 us (164 us) per loop
        %timeit x.T + x
        10000 loops, best of 3: 176 us (171 us) per loop
        
        x = np.zeros([100,5,5])
        %timeit x.T + x.T
        10000 loops, best of 3: 47.7 us (61 us) per loop
        
        x = np.zeros([100,5,100]).transpose(2,0,1)
        %timeit np.cos(x)
        100 loops, best of 3: 3.77 ms (9 ms) per loop

As expected, some improvement can be seen. There also appears to be
an additional 5 us of overhead (~ 700 inner loop operations, it seems)
coming from somewhere; perhaps this can still be reduced.
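
For readers who want to see the idea behind the shuffling, here is a
minimal user-level sketch. It is not the code in the branch: the helper
name apply_in_memory_order is hypothetical, and it assumes a unary ufunc
and a single input array. It permutes the axes so that strides decrease
from the first axis to the last, applies the ufunc, and then restores the
original axis order on the result.

```python
import numpy as np

def apply_in_memory_order(ufunc, x):
    """Hypothetical helper: apply a unary ufunc with axes sorted by stride."""
    # Sort axes by decreasing absolute stride, so that the last axis
    # (the one the inner loop runs over) has the smallest stride.
    order = tuple(sorted(range(x.ndim), key=lambda i: -abs(x.strides[i])))
    y = ufunc(x.transpose(order))
    # Inverse permutation: puts the axes of the result back where
    # the caller expects them.
    inverse = tuple(order.index(i) for i in range(x.ndim))
    return y.transpose(inverse)

# Same array as in the last sample above: strides are (8, 4000, 800),
# so a naive C-order loop would walk the 800-byte stride innermost.
x = np.zeros([100, 5, 100]).transpose(2, 0, 1)
result = apply_in_memory_order(np.cos, x)
```

The result has the same shape and values as np.cos(x); only the order in
which memory is visited differs. Whether this gives any speedup depends on
the NumPy version in use, so no timings are claimed for the sketch.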

-- 
Pauli Virtanen


