[Numpy-discussion] NaN as dictionary key?

Bruce Southey bsouthey@gmail....
Thu Apr 23 09:38:10 CDT 2009


josef.pktd@gmail.com wrote:
> 2009/4/20 Wes McKinney <wesmckinn@gmail.com>:
>   
>> I assume that, because NaN != NaN, even though both have the same hash value
>> (hash(NaN) == -32768), that Python treats any NaN double as a distinct key
>> in a dictionary.
>>
>> In [76]: a = np.repeat(nan, 10)
>>
>> In [77]: d = {}
>>
>> In [78]: for i, v in enumerate(a):
>>    ....:     d[v] = i
>>    ....:
>>    ....:
>>
>> In [79]: d
>> Out[79]:
>> {nan: 0,
>>  nan: 1,
>>  nan: 6,
>>  nan: 4,
>>  nan: 3,
>>  nan: 9,
>>  nan: 7,
>>  nan: 2,
>>  nan: 8,
>>  nan: 5}
>>
>> I'm not sure if this ever worked in a past version of NumPy, however, I have
>> code which does a "group by value" and occasionally in the real world those
>> values are NaN. Any ideas or a way around this problem?
>>     
>
> For non hashable keys, I convert them to string, eg with repr or str
> or some other string representation for floating point.
>
> I use it for example to feed it to unique1d.
>
> Josef
>
>
>
> >>> a
> array([ NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN])
> >>> np.unique1d(a)
> array([ NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN])
>
> using type string is not good with nan (automatic conversion of nans in casting)
>
> >>> np.unique1d(a.astype(str))
> array(['1'],
>       dtype='|S1')
> >>> a.astype(str)
> array(['1', '1', '1', '1', '1', '1', '1', '1', '1', '1'],
>       dtype='|S1')
>
> >>> np.unique1d([repr(ii) for ii in a])
> array(['nan'],
>       dtype='|S3')
>
>
> but nans don't round trip (at least not on Windows); is this intended?
>
>
> >>> np.unique1d(np.arange(10).astype(str)).astype(float)
> array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
> >>> np.all(np.array([repr(ii) for ii in np.pi*np.arange(10)]).astype(float) == np.pi*np.arange(10))
> True
>
> >>> np.unique1d([repr(ii) for ii in a]).astype(float)
> Traceback (most recent call last):
>   File "<pyshell#120>", line 1, in <module>
> ValueError: invalid literal for float(): nan
Hi,
Perhaps you want something that uses isfinite and friends, such as:

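import numpy as np
a = np.array([1, 2, 3, np.inf, np.nan, 10])
e = {}
for i, v in enumerate(a):
    if np.isfinite(v):
        # finite values can be used as keys directly
        e[v] = i
    else:
        # inf and nan are keyed by their string representation
        e[repr(v)] = i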

You should probably call isfinite once on the whole array, outside of the loop.
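
A minimal sketch of that variant, assuming the same example array as above. It calls np.isfinite once on the whole array and then only consults the resulting boolean array inside the loop. Converting each element with float() before taking repr() is my own assumption rather than part of the original snippet; it keeps the string keys as plain 'nan' and 'inf' regardless of how a given NumPy version prints its scalar types.

```python
import numpy as np

a = np.array([1, 2, 3, np.inf, np.nan, 10])

# One vectorized call instead of one isfinite call per element.
finite = np.isfinite(a)

e = {}
for i, (v, ok) in enumerate(zip(a, finite)):
    # Finite values are used as keys directly; inf and nan are keyed
    # by the repr of the equivalent Python float ('inf', 'nan').
    key = float(v) if ok else repr(float(v))
    e[key] = i
```

With this, all NaN entries in an array collapse onto the single key 'nan', which is what a "group by value" would need.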

If you really do not care about NaN and infinity, then you could use a 
masked array where NaN and infinity are masked.
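
As a rough sketch of the masked-array route, np.ma.masked_invalid masks both NaN and infinity in one step. The dictionary built from the unmasked entries below is only an illustration that reuses the example array from above, not a recommendation for any particular grouping scheme.

```python
import numpy as np

a = np.array([1, 2, 3, np.inf, np.nan, 10])

# masked_invalid masks every NaN and +/-inf entry.
masked = np.ma.masked_invalid(a)

# Indices of the entries that are not masked.
valid_idx = np.flatnonzero(~np.ma.getmaskarray(masked))

# Map each remaining value to its position in the original array.
e = {float(a[i]): int(i) for i in valid_idx}
```

Here the NaN and infinity entries are simply left out of the mapping.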

Bruce


