[Numpy-discussion] Making cdecimal.Decimal a native numpy type

Dr.Leo fhaxbox66@googlemail....
Sun Jul 22 07:54:36 CDT 2012


I am a seasoned numpy/pandas user mainly interested in financial 
applications. These and other applications would greatly benefit from a 
decimal data type with flexible rounding rules, precision etc.

Yes, there is cdecimal, the decimal module from the Python stdlib 
rewritten in C (http://www.bytereef.org/mpdecimal/index.html), which has 
become part of the stdlib as of Python 3.3.
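To make "flexible rounding rules, precision etc." concrete, here is a small sketch using the stdlib decimal module, which on Python 3.3+ is the C implementation. The price and precision values are illustrative choices for this sketch, not taken from any real application.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP, localcontext

# A binary float cannot store 2.675 exactly (it is really 2.67499999...),
# so rounding it to cents goes down.
float_cents = round(2.675, 2)  # 2.67

# Decimal keeps the value exact, and the rounding rule is chosen explicitly.
price = Decimal("2.665")
cent = Decimal("0.01")
half_up = price.quantize(cent, rounding=ROUND_HALF_UP)      # Decimal('2.67')
half_even = price.quantize(cent, rounding=ROUND_HALF_EVEN)  # Decimal('2.66')

# The working precision (significant digits) is set per context; here it is
# changed only inside the with-block.
with localcontext() as ctx:
    ctx.prec = 6
    seventh = Decimal(1) / Decimal(7)  # Decimal('0.142857')

print(float_cents, half_up, half_even, seventh)
```

Banker's rounding (ROUND_HALF_EVEN) versus commercial rounding (ROUND_HALF_UP) is exactly the kind of per-application rule a float64 array cannot express.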

However, it appears that cdecimal cannot be used meaningfully with numpy 
(see the benchmark below). Squaring an ndarray of n=10000 elements is 
roughly 1,400 times faster with float64 (14.7 us per loop) than with a 
dtype=object ndarray holding cdecimal.Decimal values (21.2 ms per loop), 
and even simple operations such as var() fail outright.

I am not familiar enough with ufuncs etc. to judge whether some of these 
problems can be avoided with a few lines of Python code. However, my 
impression is that ultimately we would all benefit from cdecimal.Decimal 
becoming a native numpy type. Put bluntly, cdecimal is a great tool, but 
it is not yet where we most need it.
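As an example of what such a few-line workaround might look like, the var() call that fails in the benchmark below could be replaced by a hand-written population variance that stays in Decimal arithmetic throughout. This is a minimal sketch, assuming only that object-dtype subtraction, multiplication and sum() operate elementwise on the Decimal objects; decimal_var is a hypothetical helper name, and because it still loops over Python objects it does nothing about the speed gap.

```python
from decimal import Decimal

import numpy as np


def decimal_var(values: np.ndarray) -> Decimal:
    """Population variance (ddof=0, the default of ndarray.var) of an
    object array of Decimals, using Decimal arithmetic only (hypothetical helper)."""
    n = len(values)
    mean = values.sum() / n        # object-dtype sum returns a Decimal; Decimal / int is exact-context division
    dev = values - mean            # elementwise Decimal subtraction, result is still dtype=object
    return (dev * dev).sum() / n   # mean of the squared deviations


prices = np.array([Decimal("1.5"), Decimal("2.5"), Decimal("3.5")], dtype=object)
print(decimal_var(prices))  # 0.6666666666666666666666666667
```

The result is rounded to the precision of the active decimal context (28 significant digits by default), not to float64.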

The author of cdecimal, Stefan Krah, would probably bring much of the 
skill set needed to take such a project forward successfully. He also 
happens to have written the new memoryview implementation in Python 
3.3. And from recent correspondence I understand he might be willing to 
get involved in an effort to marry numpy and cdecimal.

The main question is whether such a project would fit into what the core 
developers see as the future of numpy.



And here is the benchmark:

In [1]: from numpy import *

In [2]: from cdecimal import Decimal

In [3]: r=random.rand(10000)

In [4]: d=ndarray(10000, dtype=Decimal)

In [5]: d.dtype
Out[5]: dtype('object')

In [6]: r.dtype
Out[6]: dtype('float64')

In [7]: for i in range(10000): d[i] = Decimal(r[i])

In [8]: %timeit r**2
100000 loops, best of 3: 14.7 us per loop

In [9]: %timeit d**2
10 loops, best of 3: 21.2 ms per loop

In [10]: r.var()
Out[10]: 0.082478142261349557

In [11]: d.var()
TypeError                                 Traceback (most recent call last)
<ipython-input-11-bf09d28e33ab> in <module>()
----> 1 d.var()
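The session above relies on IPython's %timeit magic and a star import. A self-contained version of the same squaring comparison that should run as a plain script might look like the sketch below. It assumes NumPy plus either the cdecimal package or Python 3.3+; it builds the object array with np.array instead of the low-level ndarray constructor, and it deliberately does not call var(). Timings depend on the machine, so no numbers are quoted here.

```python
import timeit

import numpy as np

try:
    from cdecimal import Decimal  # separate package on Python < 3.3
except ImportError:
    from decimal import Decimal  # C implementation (libmpdec) on Python >= 3.3

N = 10000
r = np.random.rand(N)  # float64 array
# One Decimal per element; tolist() yields plain Python floats,
# which Decimal converts exactly.
d = np.array([Decimal(x) for x in r.tolist()], dtype=object)

# Average seconds per evaluation of the same expression on each array.
loops = 20
t_float = timeit.timeit(lambda: r**2, number=loops) / loops
t_dec = timeit.timeit(lambda: d**2, number=loops) / loops

print("float64: {:.1f} us per loop".format(t_float * 1e6))
print("Decimal: {:.1f} ms per loop".format(t_dec * 1e3))
print("ratio:   {:.0f}x".format(t_dec / t_float))
```

The object-dtype version calls Decimal's Python-level arithmetic once per element, which is the overhead a native numpy decimal type would avoid.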
