[SciPy-User] [ANN] Bottleneck 0.2

Sebastian Haase seb.haase@gmail....
Tue Dec 28 16:40:16 CST 2010


Congratulations! What do you mean by "templated functions" -- do you
have a way of doing Cython template functions now?

- Sebastian


On Tue, Dec 28, 2010 at 6:15 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Tue, Dec 28, 2010 at 8:57 AM, Keith Goodman <kwgoodman@gmail.com> wrote:
>> On Tue, Dec 28, 2010 at 5:42 AM, Dag Sverre Seljebotn
>> <dagss@student.matnat.uio.no> wrote:
>>> On 12/27/2010 09:04 PM, Keith Goodman wrote:
>>>> Bottleneck is a collection of fast NumPy array functions written in Cython.
>>>>
>>>> The second release of Bottleneck is faster, contains more functions,
>>>> and supports more dtypes.
>>>>
>>>
>>> Another special case for you, if you want: it seems that you could add
>>> "mode='c'" to the array declarations for the case where the operation
>>> goes along the last axis and arr.flags.c_contiguous == True.
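>>
>> The only change needed is in the buffer declaration. Roughly like this
>> -- a simplified sketch of a nanmean along axis 1, not the exact code
>> that Bottleneck generates:
>>
>> import numpy as np
>> cimport numpy as np
>> cimport cython
>> from libc.math cimport NAN
>>
>> @cython.boundscheck(False)
>> @cython.wraparound(False)
>> def nanmean_2d_float64_ccontiguous_axis1(
>>         np.ndarray[np.float64_t, ndim=2, mode='c'] a):
>>     # Sketch only: mode='c' lets Cython assume the array is C
>>     # contiguous, which makes indexing along the last axis cheaper.
>>     cdef Py_ssize_t i, j, count
>>     cdef np.float64_t asum, ai
>>     cdef Py_ssize_t n0 = a.shape[0], n1 = a.shape[1]
>>     cdef np.ndarray[np.float64_t, ndim=1] y = np.empty(n0, dtype=np.float64)
>>     for i in range(n0):
>>         asum = 0
>>         count = 0
>>         for j in range(n1):
>>             ai = a[i, j]
>>             if ai == ai:        # False only for NaN
>>                 asum += ai
>>                 count += 1
>>         y[i] = asum / count if count > 0 else NAN
>>     return y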
>>
>> Wow! That works great for large input arrays:
>>
>>>> a = np.random.rand(1000,1000)
>>>> timeit bn.func.nanmean_2d_float64_axis1(a)
>> 1000 loops, best of 3: 1.52 ms per loop
>>>> timeit a.flags.c_contiguous == True; bn.func.nanmean_2d_float64_ccontiguous_axis1(a)
>> 1000 loops, best of 3: 1.18 ms per loop
>>
>> And for medium arrays:
>>
>>>> a = np.random.rand(100,100)
>>>> timeit bn.func.nanmean_2d_float64_axis1(a)
>> 100000 loops, best of 3: 16.3 us per loop
>>>> timeit a.flags.c_contiguous == True; bn.func.nanmean_2d_float64_ccontiguous_axis1(a)
>> 100000 loops, best of 3: 13.3 us per loop
>>
>> But the overhead of checking for C contiguity slows things down for
>> small arrays:
>>
>>>> a = np.random.rand(10,10)
>>>> timeit bn.func.nanmean_2d_float64_axis1(a)
>> 1000000 loops, best of 3: 1.28 us per loop
>>>> timeit a.flags.c_contiguous == True; bn.func.nanmean_2d_float64_ccontiguous_axis1(a)
>> 1000000 loops, best of 3: 1.55 us per loop
>>>> timeit a.flags.c_contiguous == True
>> 1000000 loops, best of 3: 201 ns per loop
>>>> timeit a.flags.c_contiguous
>> 10000000 loops, best of 3: 158 ns per loop
>>
>> Plus I'd have to check if the axis is the last one.
>>
>> That's a big speedup for hand-coded functions and large input arrays.
>> But I'm not sure how to take advantage of it in the general-purpose
>> functions. One option is to provide the low-level functions (like
>> nanmean_2d_float64_ccontiguous_axis1) but not use them in the
>> high-level function nanmean.
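>>
>> If the high-level nanmean did use them, the dispatch would have to look
>> something like this (a rough sketch with made-up lookup tables, not the
>> function-selection code that is actually in Bottleneck):
>>
>> import bottleneck as bn
>>
>> # Hypothetical tables keyed by (ndim, dtype, axis).
>> strided_funcs = {
>>     (2, 'float64', 1): bn.func.nanmean_2d_float64_axis1,
>> }
>> ccontiguous_funcs = {
>>     (2, 'float64', 1): bn.func.nanmean_2d_float64_ccontiguous_axis1,
>> }
>>
>> def nanmean(arr, axis=None):
>>     # Sketch: ignores axis=None, negative axes, and non-array input.
>>     key = (arr.ndim, str(arr.dtype), axis)
>>     if axis == arr.ndim - 1 and arr.flags.c_contiguous:
>>         func = ccontiguous_funcs.get(key) or strided_funcs.get(key)
>>     else:
>>         func = strided_funcs.get(key)
>>     if func is None:
>>         raise TypeError("no nanmean for %s" % str(key))
>>     return func(arr)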
>>
>> I tried using mode='c' when initializing the output array, but I did
>> not see any speed difference, perhaps because the output array is much
>> smaller than the input (its size is the square root of the input
>> array's size). So I tried it with a non-reducing function,
>> move_nanmean, but I didn't see any speed difference there either. No
>> idea why.
>
> Oh, I don't see a speed difference when I use mode='c' on the input
> array to move_nanmean either. Could it be because the function is
> constantly switching, at each step along the last axis, between
> indexing into the input array and indexing into the output array, so
> that contiguous memory doesn't help in that case?
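>
> The inner loop of a moving-window function looks roughly like this
> (schematic pure-Python version, not the actual Cython in Bottleneck),
> so every step reads from one buffer and writes to the other:
>
> import numpy as np
>
> def move_nanmean_sketch(a, window):
>     n0, n1 = a.shape
>     y = np.empty((n0, n1))
>     for i in range(n0):
>         asum = 0.0
>         count = 0
>         for j in range(n1):
>             ai = a[i, j]                   # read from the input array
>             if ai == ai:                   # False only for NaN
>                 asum += ai
>                 count += 1
>             if j >= window:
>                 aold = a[i, j - window]    # drop the trailing element
>                 if aold == aold:
>                     asum -= aold
>                     count -= 1
>             if j >= window - 1 and count > 0:
>                 y[i, j] = asum / count     # write to the output array
>             else:
>                 y[i, j] = np.nan           # window not full, or all NaN
>     return y
>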
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>

