[Numpy-discussion] Slow divide of int64?

Matthew Brett matthew.brett@gmail....
Thu Aug 16 16:45:32 CDT 2012


Hi,

On Mon, Aug 13, 2012 at 9:49 PM, Charles R Harris
<charlesr.harris@gmail.com> wrote:
>
>
> On Mon, Aug 13, 2012 at 10:32 PM, Charles R Harris
> <charlesr.harris@gmail.com> wrote:
>>
>>
>>
>> On Sat, Aug 11, 2012 at 6:36 PM, Matthew Brett <matthew.brett@gmail.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> A friend of mine just pointed out that dividing by int64 is
>>> considerably slower than multiplying in numpy:
>>>
>>> <script>
>>> from timeit import timeit
>>>
>>> import numpy as np
>>> import numpy.random as npr
>>>
>>> sz = (1024,)
>>> a32 = npr.randint(1, 5001, sz).astype(np.int32)
>>> b32 = npr.randint(1, 5001, sz).astype(np.int32)
>>> a64 = a32.astype(np.int64)
>>> b64 = b32.astype(np.int64)
>>>
>>> print 'Mul32', timeit('d = a32 * b32', 'from __main__ import a32, b32')
>>> print 'Div32', timeit('d = a32 / b32', 'from __main__ import a32, b32')
>>> print 'Mul64', timeit('d = a64 * b64', 'from __main__ import a64, b64')
>>> print 'Div64', timeit('d = a64 / b64', 'from __main__ import a64, b64')
>>> </script>
>>>
>>> gives (64 bit Debian Intel system, numpy trunk):
>>>
>>> Mul32 2.71295905113
>>> Div32 6.61985301971
>>> Mul64 2.78101611137
>>> Div64 22.8217148781
>>>
>>> with similar values for numpy 1.5.1.
>>>
>>> Crude testing with Matlab and Octave suggests they do not seem to have
>>> this same difference:
>>>
>>> >> divtest
>>> Mul32 4.300662
>>> Div32 5.638622
>>> Mul64 7.894490
>>> Div64 18.121182
>>>
>>> octave:2> divtest
>>> Mul32 3.960577
>>> Div32 6.553704
>>> Mul64 7.268324
>>> Div64 13.670760
>>>
>>> (files attached)
>>>
>>> Is there something specific about division in numpy that would cause
>>> this slowdown?
>>>
>>
>> Numpy is doing an integer divide unless you are using Python 3.x. The
>> np.true_divide ufunc will speed things up a bit. I'm not sure what
>> Matlab/Octave are doing for division in this case.
>>
>
> For int64:
>
> In [23]: timeit multiply(a, b)
> 100000 loops, best of 3: 3.31 us per loop
>
> In [24]: timeit true_divide(a, b)
> 100000 loops, best of 3: 9.35 us per loop

Thanks for looking into this.  It does look like int64 division is
particularly slow on the systems I'm testing on.  Here's a Cython
C-pointer version compared to the numpy version:

Numpy versions as above:

Mul32 3.15036797523
Div32 6.68296504021
Mul64 4.50731801987
Div64 22.9649209976

Cython versions using pointers into contiguous arrays:

Mul32-cy 1.21214485168
Div32-cy 6.75360918045
Mul64-cy 3.98143696785
Div64-cy 31.3645660877

# Timing using double
Multf-cy 4.11406683922
Divf-cy 12.603869915

(code attached).

Matlab certainly returns integers from its int64 division, so I'm not
sure why it does not have such an extreme slowdown for int64 division.
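
For anyone who wants to rerun the numpy side of this on Python 3, here
is a minimal sketch.  It is not the attached script: the helper name
`time_ops` and the seeded generator are my own assumptions, it calls
the ufuncs through a lambda (which adds a little call overhead), and it
defaults to 10000 loops rather than the 1000000 behind the figures
above, so the absolute numbers are not comparable.  It times
np.floor_divide (integer result, what `/` did on Python 2) separately
from np.true_divide (float64 result):

```python
from timeit import timeit

import numpy as np

# Same array setup as the original script, but with a seeded generator.
rng = np.random.default_rng(0)
sz = (1024,)
a32 = rng.integers(1, 5001, sz).astype(np.int32)
b32 = rng.integers(1, 5001, sz).astype(np.int32)
a64 = a32.astype(np.int64)
b64 = b32.astype(np.int64)


def time_ops(a, b, number=10000):
    """Return total seconds for `number` calls of each ufunc on (a, b)."""
    ops = {
        "Mul": np.multiply,
        "FloorDiv": np.floor_divide,  # integer result, like Python 2 `/`
        "TrueDiv": np.true_divide,    # float64 result
    }
    return {name: timeit(lambda: op(a, b), number=number)
            for name, op in ops.items()}


if __name__ == "__main__":
    for bits, (a, b) in (("32", (a32, b32)), ("64", (a64, b64))):
        for name, secs in time_ops(a, b).items():
            print(f"{name}{bits} {secs:.4f}")
```

Keeping the two kinds of division apart should make it easier to see
whether the slowdown comes from the integer divide itself.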

Cheers,

Matthew

