[Numpy-discussion] numpy type mismatch
Charles R Harris
charlesr.harris@gmail....
Fri Jun 10 22:56:04 CDT 2011
On Fri, Jun 10, 2011 at 9:10 PM, Benjamin Root <ben.root@ou.edu> wrote:
>
>
> On Fri, Jun 10, 2011 at 9:29 PM, Olivier Delalleau <shish@keba.be> wrote:
>
>>
>> 2011/6/10 Olivier Delalleau <shish@keba.be>
>>
>>> 2011/6/10 Charles R Harris <charlesr.harris@gmail.com>
>>>
>>>>
>>>>
>>>> On Fri, Jun 10, 2011 at 5:19 PM, Olivier Delalleau <shish@keba.be> wrote:
>>>>
>>>>> 2011/6/10 Charles R Harris <charlesr.harris@gmail.com>
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 10, 2011 at 3:43 PM, Benjamin Root <ben.root@ou.edu> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 10, 2011 at 3:24 PM, Charles R Harris <
>>>>>>> charlesr.harris@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jun 10, 2011 at 2:17 PM, Benjamin Root <ben.root@ou.edu> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jun 10, 2011 at 3:02 PM, Charles R Harris <
>>>>>>>>> charlesr.harris@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 10, 2011 at 1:50 PM, Benjamin Root <ben.root@ou.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> Came across an odd error while using numpy master. Note, my
>>>>>>>>>>> system is 32-bits.
>>>>>>>>>>>
>>>>>>>>>>> >>> import numpy as np
>>>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.int32)) == np.int32
>>>>>>>>>>> False
>>>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.int64)) == np.int64
>>>>>>>>>>> True
>>>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.float32)) == np.float32
>>>>>>>>>>> True
>>>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.float64)) == np.float64
>>>>>>>>>>> True
>>>>>>>>>>>
>>>>>>>>>>> So, only the summation performed with a np.int32 accumulator
>>>>>>>>>>> results in a type that doesn't match the expected type. Now, for even more
>>>>>>>>>>> strangeness:
>>>>>>>>>>>
>>>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.int32))
>>>>>>>>>>> <type 'numpy.int32'>
>>>>>>>>>>> >>> hex(id(type(np.sum([1, 2, 3], dtype=np.int32))))
>>>>>>>>>>> '0x9599a0'
>>>>>>>>>>> >>> hex(id(np.int32))
>>>>>>>>>>> '0x959a80'
>>>>>>>>>>>
>>>>>>>>>>> So, the type from the sum() reports itself as a numpy int, but
>>>>>>>>>>> its memory address is different from the memory address for np.int32.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> One of them is probably a long, print out the typecode,
>>>>>>>>>> dtype.char.
>>>>>>>>>>
>>>>>>>>>> Chuck
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Good intuition, but odd result...
>>>>>>>>>
>>>>>>>>> >>> import numpy as np
>>>>>>>>> >>> a = np.sum([1, 2, 3], dtype=np.int32)
>>>>>>>>> >>> b = np.int32(6)
>>>>>>>>> >>> type(a)
>>>>>>>>> <type 'numpy.int32'>
>>>>>>>>> >>> type(b)
>>>>>>>>> <type 'numpy.int32'>
>>>>>>>>> >>> a.dtype.char
>>>>>>>>> 'i'
>>>>>>>>> >>> b.dtype.char
>>>>>>>>> 'l'
>>>>>>>>>
>>>>>>>>> So, the standard np.int32 is getting listed as a long somehow? To
>>>>>>>>> further investigate:
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yes, long shifts around from int32 to int64 depending on the OS. For
>>>>>>>> instance, in 64 bit Windows it's 32 bits while in 64 bit Linux it's 64 bits.
>>>>>>>> On 32 bit systems it is 32 bits.
>>>>>>>>
>>>>>>>> Chuck
>>>>>>>>
>>>>>>>>
>>>>>>> Right, that makes sense. But, the question is why does sum() put out
>>>>>>> a result dtype that is not identical to the dtype that I requested, or even
>>>>>>> the dtype of the input array? Could this be an indication of a bug
>>>>>>> somewhere? Even if the bug is harmless (it was only noticed within the test
>>>>>>> suite of larry), is this unexpected?
>>>>>>>
>>>>>>>
>>>>>> I expect sum is using a ufunc and it acts differently on account of
>>>>>> the cleanup of the ufunc casting rules. And yes, a long *is* int32 on your
>>>>>> machine. On mine:
>>>>>>
>>>>>> In [4]: dtype('q') # long long
>>>>>> Out[4]: dtype('int64')
>>>>>>
>>>>>> In [5]: dtype('l') # long
>>>>>> Out[5]: dtype('int64')
>>>>>>
>>>>>> The mapping from C types to numpy width types isn't 1-1. Personally, I
>>>>>> think we should drop long ;) But it used to be the standard Python type in
>>>>>> the C API. Mark has also pointed out the problems/confusion this ambiguity
>>>>>> causes and someday we should probably think it out and fix it. But I don't
>>>>>> think it is the most pressing problem.
>>>>>>
>>>>>> Chuck
>>>>>>
>>>>>>
>>>>> But isn't it a bug if numpy.dtype('i') != numpy.dtype('l') on a 32 bit
>>>>> computer where both are int32?
>>>>>
>>>>>
>>>> Maybe yes, maybe no ;) They have different descriptors, so from numpy's
>>>> perspective they are different, but at the hardware/precision level they are
>>>> the same. It's more of a decision as to what != means in this case. Since
>>>> numpy started as Numeric with only the c types the current behavior is
>>>> consistent, but that doesn't mean it shouldn't change at some point.
>>>>
>>>> Chuck
>>>>
>>>
>>> Well apparently it was actually changed recently, since in Numpy 1.5.1 on
>>> a Windows 32 bit machine, they are considered equal with '=='.
>>> Personally I think if the string representation of two dtypes is "int32",
>>> then they should be ==, otherwise it wouldn't make much sense given that you
>>> can directly test the equality of a dtype with a string like "int32" (like
>>> dtype('i') == "int32" and dtype('l') == "int32").
>>>
>>
>> I also just checked on a fresh install of numpy 1.6.0 on python 3.2, and
>> both types are equal as well.
>>
>
> Are you talking about the release of 1.6, or the continued development
> branch? This is happening to me on the master branch, but I have not tried
> earlier versions. Again, I think this bolsters the evidence that this is
> from a (very) recent change.
>
>
>> I've been playing quite a bit with numpy dtypes and this is the first time
>> I've heard of two dtypes representing the exact same kind of data not
>> comparing equal, so I'm still inclined to believe it should be considered a bug.
>>
>>
> Quite honestly, I really don't care that the dtypes aren't equal. I
> usually work at a purely python level and performing actions based on types
> is generally bad practice anyway. Anytime that I (rarely) check types, I
> would use isinstance() against one of the core numerical types rather than a
> numpy type. The fact that I even found this issue was completely by
> accident while investigating a test failure in larry.
>
> What concerns me more is that the type coming out of the ufunc is not the
> same type that went in, or even the type requested through the dtype
> argument. I think *that* should be the main concern here, and should
> probably be tested for in the unit tests.
>
>
To be a bit more explicit:

In [3]: np.sum([1, 2, 3], dtype='q').dtype.char
Out[3]: 'l'

In [4]: np.sum([1, 2, 3], dtype='l').dtype.char
Out[4]: 'l'

Note that there were previous oddities, for instance the returned type for a
+ b would not necessarily be the same as for b + a, even though the
precisions would be the same.

Chuck
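
For anyone wanting to reproduce the comparison, here is a minimal sketch that
prints the char code, name and width for the C-type codes discussed in this
thread, and then checks a sum() result against the requested dtype. The helper
names describe() and same_layout() are hypothetical and not part of numpy. The
printed values depend on the platform and the numpy version, so no expected
output is shown.

```python
import numpy as np


def describe(code):
    """Return (char, name, itemsize) for a dtype specifier."""
    dt = np.dtype(code)
    return dt.char, dt.name, dt.itemsize


def same_layout(a, b):
    """Hypothetical helper: compare kind, width and byte order only.

    This ignores which C type the descriptor was built from, so it treats
    'i' and 'l' as the same whenever both are 32-bit signed integers.
    """
    a, b = np.dtype(a), np.dtype(b)
    return (a.kind, a.itemsize, a.byteorder) == (b.kind, b.itemsize, b.byteorder)


# 'i' is C int, 'l' is C long, 'q' is C long long.
for code in ("i", "l", "q"):
    print(code, describe(code))

# Descriptor equality versus layout equality for int and long.
print(np.dtype("i") == np.dtype("l"), same_layout("i", "l"))

# Compare what sum() returns against what was requested.
result = np.sum([1, 2, 3], dtype=np.int32)
print(result.dtype.char, np.dtype(np.int32).char)
print(type(result) == np.int32, result.dtype == np.dtype(np.int32))

# A check that does not depend on the exact scalar type object.
print(isinstance(result, np.integer), same_layout(result.dtype, np.int32))
```

Comparing kind and itemsize sidesteps the int/long ambiguity, at the cost of
no longer distinguishing the underlying C types. Whether that is the right
trade-off depends on what the calling code needs.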