[SciPy-User] [ANN] Bottleneck 0.2

Keith Goodman kwgoodman@gmail....
Tue Dec 28 11:15:18 CST 2010


On Tue, Dec 28, 2010 at 8:57 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Tue, Dec 28, 2010 at 5:42 AM, Dag Sverre Seljebotn
> <dagss@student.matnat.uio.no> wrote:
>> On 12/27/2010 09:04 PM, Keith Goodman wrote:
>>> Bottleneck is a collection of fast NumPy array functions written in Cython.
>>>
>>> The second release of Bottleneck is faster, contains more functions,
>>> and supports more dtypes.
>>>
>>
>> Another special case for you if you want: It seems that you could add
>> the case of "mode='c'" to the array declarations, in the case that the
>> operation goes along the last axis and arr.flags.c_contiguous == True.
>
> Wow! That works great for large input arrays:
>
>>> a = np.random.rand(1000,1000)
>>> timeit bn.func.nanmean_2d_float64_axis1(a)
> 1000 loops, best of 3: 1.52 ms per loop
>>> timeit a.flags.c_contiguous == True; bn.func.nanmean_2d_float64_ccontiguous_axis1(a)
> 1000 loops, best of 3: 1.18 ms per loop
>
> And for medium arrays:
>
>>> a = np.random.rand(100,100)
>>> timeit bn.func.nanmean_2d_float64_axis1(a)
> 100000 loops, best of 3: 16.3 us per loop
>>> timeit a.flags.c_contiguous == True; bn.func.nanmean_2d_float64_ccontiguous_axis1(a)
> 100000 loops, best of 3: 13.3 us per loop
>
> But the overhead of checking for C contiguity slows things down for
> small arrays:
>
>>> a = np.random.rand(10,10)
>>> timeit bn.func.nanmean_2d_float64_axis1(a)
> 1000000 loops, best of 3: 1.28 us per loop
>>> timeit a.flags.c_contiguous == True; bn.func.nanmean_2d_float64_ccontiguous_axis1(a)
> 1000000 loops, best of 3: 1.55 us per loop
>>> timeit a.flags.c_contiguous == True
> 1000000 loops, best of 3: 201 ns per loop
>>> timeit a.flags.c_contiguous
> 10000000 loops, best of 3: 158 ns per loop
>
> Plus I'd have to check if the axis is the last one.
>
> That's a big speed-up for hand-coded functions and large input
> arrays, but I'm not sure how to take advantage of it in the
> general-purpose functions. One option is to provide the low-level
> functions (like nanmean_2d_float64_ccontiguous_axis1) but not use
> them in the high-level function nanmean.
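
One way the high-level dispatch could look (a hedged sketch in plain
Python/NumPy; `_nanmean_ccontiguous_axis1` is a hypothetical stand-in
for the Cython specialization, implemented here with np.nanmean):

```python
import numpy as np

def _nanmean_ccontiguous_axis1(arr):
    # Hypothetical stand-in for the Cython fast path
    # (e.g. nanmean_2d_float64_ccontiguous_axis1).
    return np.nanmean(arr, axis=1)

def nanmean(arr, axis=None):
    # Sketch of the dispatch: take the mode='c' path only when the
    # input is 2d, C-contiguous, and the reduction runs along the
    # last axis; otherwise fall back to the generic strided version.
    arr = np.asarray(arr)
    if arr.ndim == 2 and axis == 1 and arr.flags.c_contiguous:
        return _nanmean_ccontiguous_axis1(arr)
    return np.nanmean(arr, axis=axis)
```

Per the timings above, the flag check itself costs a couple hundred
nanoseconds, which is why a guard like this only pays off for larger
arrays.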
>
> I tried using mode='c' when initializing the output array, but I did
> not see any speed difference, perhaps because the output array is
> only the square root of the input array's size. So I tried it with a
> non-reducing function, move_nanmean, but again saw no speed
> difference. No idea why.

Oh, I also don't see a speed difference when I use mode='c' on the
input array to move_nanmean. Could it be that, at each step along the
last axis, the function switches between indexing into the input array
and indexing into the output array, so that contiguous memory doesn't
help?
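
One rough way to check whether a given kernel benefits from a
contiguous last axis at all (a sketch using plain NumPy, not
move_nanmean itself): run it on the same data in C order and in
Fortran order, so only the memory layout differs.

```python
import numpy as np

# Same values, two memory layouts: if the two timings match, the
# kernel's runtime is not dominated by contiguous access along the
# last axis.
a_c = np.random.rand(1000, 1000)   # C-contiguous
a_f = np.asfortranarray(a_c)       # Fortran order: strided along axis 1

assert a_c.flags.c_contiguous
assert not a_f.flags.c_contiguous

# Then, e.g. in IPython:
#   %timeit np.nanmean(a_c, axis=1)
#   %timeit np.nanmean(a_f, axis=1)
```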

