[Numpy-discussion] On Numexpr and uint64 type

Timothy Hochberg tim.hochberg@ieee....
Mon Mar 10 15:12:54 CDT 2008


On Mon, Mar 10, 2008 at 11:50 AM, Francesc Altet <faltet@carabos.com> wrote:

> On Monday 10 March 2008, Charles R Harris wrote:
> > On Mon, Mar 10, 2008 at 11:08 AM, Francesc Altet <faltet@carabos.com> wrote:
> > > Hi,
> > >
> > > In order to allow in-kernel queries in PyTables (www.pytables.org)
> > > to work with unsigned 64-bit integers, we would like to see uint64
> > > support in Numexpr (http://code.google.com/p/numexpr/).
> > >
> > > To do this, we first have to decide how uint64 interacts with other
> > > types.  For example, what should be the outcome of:
> > >
> > > numpy.array([1], 'int64') / numpy.array([2], 'uint64')
> > >
> > > Basically, there are a couple of possibilities:
> > >
> > > 1) To follow the behaviour of NumPy and upcast both operands to
> > > float64 and do the operation.  That is:
> > >
> > > In [21]: numpy.array([1], 'int64') / numpy.array([2], 'uint64')
> > > Out[21]: array([ 0.5])
> > >
> > > 2) Implement support for uint64 as a non-upcastable type, so that
> > > one cannot merge uint64 operands with other types.  That is:
> > >
> > > In [21]: numpy.array([1], 'int64') / numpy.array([2], 'uint64')
> > > Out[21]: TypeError: unsupported operand type(s) for /: 'int64'
> > > and 'uint64'
> > >
> > > Solution 1) is appealing because it is how NumPy works, but I don't
> > > personally like the upcasting to float64.  First of all, because
> > > you transparently convert numbers, potentially losing the least
> > > significant digits.  Second, because an operation between integers
> > > gives a float as a result, and this is different from typical
> > > programming languages.
> >
> > I don't like the up(down)casting either. I suspect the original
> > justification was preserving precision, but it doesn't do that.
> > Addition of signed and unsigned numbers is the same in modular
> > arithmetic, so simply treating everything as uint64 would, IMHO, be
> > the best option there and for multiplication. Not everything has a
> > modular inverse, but truncation is the C solution in that case. The
> > question seems to be whether to return a signed or unsigned integer.
> > Hmm. I would go for unsigned, which could be converted to signed by
> > casting. The sign of the remainder might be a problem, though, which
> > would give unusual truncation behavior.
>
> Mmm, yes.  We've already considered converting all operands to uint64
> first, with a uint64 as the outcome as well, but realized that we
> could have some difficulties when doing boolean comparisons in Numexpr.
> For example, if a is an int64 and b is a uint64, and we want to
> compute "a + b", we could have:
>
> In [44]: a = numpy.array([-4], 'int64')
>
> In [45]: b = numpy.array([2], 'uint64')
>
> In [46]: c = a.astype('uint64') + b.astype('uint64')
>
> In [47]: c
> Out[47]: array([18446744073709551614], dtype=uint64)
>
> In [48]: c.astype('int64')
> Out[48]: array([-2], dtype=int64)   # in case we want signed integers
>
> The difficulty that we observed is that the expression 'a + b < 0' (i.e.
> checking the sign of the result) could surprise the inexperienced user
> (this would be evaluated as false because the outcome of a + b is
> unsigned).  Having said that, this approach is completely consistent
> and, if properly documented, could be a nice way to implement uint64
> for the Numexpr case.
>
> Do D. Cooke or T. Hochberg have something to say in that regard?


Without a compelling use case, we should try to avoid subtly different
semantics for numexpr and numpy. I'm fine with option #2 since that will
generally result in an unsubtle difference (aka, an exception), but casting
everything to uint64 seems questionable.
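
To make the "subtly different semantics" concern concrete, here is a minimal
sketch using plain NumPy only (it says nothing about how Numexpr would
implement either behaviour). It compares NumPy's default int64/uint64
promotion with the cast-everything-to-uint64 approach on the 'a + b < 0'
test from the quoted message. The variable names are illustrative.

```python
import numpy as np

a = np.array([-4], 'int64')
b = np.array([2], 'uint64')

# NumPy's default: int64 mixed with uint64 is promoted to float64.
default_sum = a + b
default_is_negative = bool((default_sum < 0)[0])

# Cast-everything-to-uint64: -4 wraps around modulo 2**64 before the add.
wrapped_sum = a.astype('uint64') + b
wrapped_is_negative = bool((wrapped_sum < 0)[0])

print(default_sum.dtype, default_is_negative)   # float64 True
print(wrapped_sum.dtype, wrapped_is_negative)   # uint64 False
```

The same expression silently gives opposite answers under the two rules,
whereas option #2 would stop with an exception.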

Another option, which sounds good to me, at least at first glance, is to
implement #2, but expose casting operators from uint64->int64 and
vice-versa. I would spell them as int64 and uint64 since that already works
in numpy. Then one could efficiently perform mixed operations if needed, for
example "a + uint64(b)", but not have the potential pitfalls of automatic
casting.
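
A rough sketch of how that combination might feel from the user's side,
emulated in plain NumPy. The helper strict_add is hypothetical (it is not
part of Numexpr or NumPy) and only stands in for option #2's refusal to mix
the two types; astype is used here in place of the proposed
int64(...)/uint64(...) spelling.

```python
import numpy as np

def strict_add(x, y):
    """Hypothetical stand-in for option #2: refuse implicit int64/uint64 mixing."""
    if {x.dtype, y.dtype} == {np.dtype('int64'), np.dtype('uint64')}:
        raise TypeError("unsupported operand type(s) for +: '%s' and '%s'"
                        % (x.dtype, y.dtype))
    return x + y

a = np.array([-4], 'int64')
b = np.array([2], 'uint64')

# Mixing the two types without a cast is rejected outright.
try:
    strict_add(a, b)
    mixed_rejected = False
except TypeError:
    mixed_rejected = True

# With an explicit cast, the user chooses the semantics of the result.
signed_sum = strict_add(a, b.astype('int64'))      # stays int64
unsigned_sum = strict_add(a.astype('uint64'), b)   # wraps modulo 2**64
```

Here the choice between a signed and an unsigned result is visible in the
expression itself rather than hidden in a promotion rule.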

That's my rapidly depreciating $.02 anyway.

-- 
. __
. |-\
.
. tim.hochberg@ieee.org