[SciPy-dev] Question about 64-bit integers being cast to double precision

Fernando Perez Fernando.Perez at colorado.edu
Wed Oct 12 17:33:11 CDT 2005

```
Travis Oliphant wrote:

>>With all that, my vote on Travis's specific question:  if conversion of
>>an N-bit integer in scipy_core is required, it gets converted to an
>>N-bit float.  The only case in which precision will be lost is if the
>>integer is large enough to require more than (N-e) bits for its
>>representation, where e is the number of bits in the exponent of the
>>floating point representation.
>>
>
>
> Yes, it is only for large integers that problems arise.   I like this
> scheme and it would be very easy to implement, and it would provide a
> consistent interface.
>
> The only problem is that it would mean that on current 32-bit systems
>
> sqrt(2)  would cast 2 to a "single-precision" float and return a
> single-precision result.
>
> If that is not a problem, then great...
>
> Otherwise, a more complicated (and less consistent) rule like
>
> integer    float
> =================
> 8-bit      32-bit
> 16-bit     32-bit
> 32-bit     64-bit
> 64-bit     64-bit
>
> would be needed (this is also not too hard to do).

Here's a different way to think about this issue: instead of thinking in terms
of bit-width, let's look at it in terms of exact vs inexact numbers.  Integers
are exact, and their bit size only affects the range of values that can be
represented.

If we look at it this way, then it seems to me justifiable to suggest that
sqrt(2) should upcast to the highest-precision floating point format available.
Obviously this can have an enormous memory impact if we're talking about a
big array of numbers instead of sqrt(2), so I'm not 100% sure it's the right
solution.  However, I think that the rule 'if you apply "floating point"
operations to integer inputs, the system will upcast the integers to give you
as much precision as possible' is a reasonable one.  Users needing tight
memory control could always first convert their small integers to the smallest
existing floats, and then operate on that.

Just my 1e-2

Cheers,

f

```
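To make the trade-off discussed above concrete, here is a minimal sketch in plain Python. It is not scipy_core code: the helper names (`to_float32`, `exact_in_float32`, `exact_in_float64`) are hypothetical, single precision is only emulated through the standard `struct` module, and Python's built-in `float` is assumed to be an IEEE 754 double (true on all common platforms). It shows where integers stop surviving an N-bit to N-bit conversion, and what sqrt(2) loses if the 2 is cast to single precision.

```python
import math
import struct

def to_float32(x):
    """Round x to the nearest IEEE 754 single-precision value (emulated)."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]

def exact_in_float32(n):
    """True if integer n survives a round trip through single precision."""
    return int(to_float32(n)) == n

def exact_in_float64(n):
    """True if integer n survives a round trip through double precision."""
    return int(float(n)) == n

# Single precision carries 24 significand bits, double precision carries 53.
print(exact_in_float32(2**24))      # True
print(exact_in_float32(2**24 + 1))  # False: needs 25 bits
print(exact_in_float64(2**24 + 1))  # True
print(exact_in_float64(2**53 + 1))  # False: needs 54 bits

# sqrt(2) evaluated in double vs. rounded to single precision.
double_result = math.sqrt(2.0)
single_result = to_float32(double_result)
print(abs(double_result - single_result))  # small but nonzero

# Memory cost per element of always upcasting to the widest float.
print(struct.calcsize("<f"), struct.calcsize("<d"))  # 4 8
```

The last line is the other side of the argument: upcasting every integer array to double precision doubles the per-element storage relative to single precision, which is why users with tight memory budgets might prefer to convert explicitly first.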