[Numpy-discussion] Re: scipy.stats.itemfreq: overflow with add.reduce

Tim Churches tchur at optushome.com.au
Wed Dec 21 12:39:03 CST 2005


Hans Georg Krauthaeuser wrote:
> Hans Georg Krauthaeuser schrieb:
> 
>>Hi All,
>>
>>I was playing with scipy.stats.itemfreq when I observed the following 
>>overflow:
>>
>>In [119]:for i in [254,255,256,257,258]:
>>   .....:    l=[0]*i
>>   .....:    print i, stats.itemfreq(l), l.count(0)
>>   .....:
>>254 [ [  0 254]] 254
>>255 [ [  0 255]] 255
>>256 [ [0 0]] 256
>>257 [ [0 1]] 257
>>258 [ [0 2]] 258
>>
>>itemfreq is pretty small (in stats.py):
>>
>>----------------------------------------------------------------------
>>def itemfreq(a):
>>    """
>>Returns a 2D array of item frequencies.  Column 1 contains item values,
>>column 2 contains their respective counts.  Assumes a 1D array is passed.
>>
>>Returns: a 2D frequency table (col [0:n-1]=scores, col n=frequencies)
>>"""
>>    scores = _support.unique(a)
>>    scores = sort(scores)
>>    freq = zeros(len(scores))
>>    for i in range(len(scores)):
>>        freq[i] = add.reduce(equal(a,scores[i]))
>>    return array(_support.abut(scores, freq))
>>----------------------------------------------------------------------
>>
>>It seems that add.reduce is the source for the overflow:
>>
>>In [116]:from scipy import *
>>
>>In [117]:for i in [254,255,256,257,258]:
>>   .....:    l=[0]*i
>>   .....:    print i, add.reduce(equal(l,0))
>>   .....:
>>254 254
>>255 255
>>256 0
>>257 1
>>258 2
>>
>>Is there any possibility to avoid the overflow?

Apropos the preceding, herewith a thread on the Numpy list from more
than a few months ago. The take-home message is that for integer arrays,
add.reduce is very fast at producing results which fall into two
categories: a) correct or b) incorrect due to overflow. Unfortunately
there is no equally quick method of determining into which of these two
categories any specific result returned by add.reduce falls.
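
A minimal sketch of the wraparound described above, assuming a current
NumPy imported as np rather than the 2005 scipy namespace. The helper
name count_matches and the explicit cast to uint8 are illustrative
assumptions, not part of scipy.stats; recent NumPy releases widen small
integer accumulators by default, so the narrow accumulator is forced
here to reproduce the 8-bit behaviour reported in the quoted session.

```python
import numpy as np

sizes = [254, 255, 256, 257, 258]


def count_matches(values, target, acc_dtype):
    """Count entries equal to target, summing in the given accumulator dtype.

    Hypothetical helper for illustration only.
    """
    # np.equal returns a boolean mask; cast it to 8-bit integers to mimic
    # the narrow element type seen in the original report.
    mask = np.equal(values, target).astype(np.uint8)
    # The dtype argument selects the type used to accumulate the sum.
    return int(np.add.reduce(mask, dtype=acc_dtype))


# Accumulating in uint8 wraps silently modulo 256.
wrapped = [count_matches([0] * n, 0, np.uint8) for n in sizes]

# Accumulating in a 64-bit integer leaves room for any realistic count.
wide = [count_matches([0] * n, 0, np.int64) for n in sizes]

for n, narrow_count, wide_count in zip(sizes, wrapped, wide):
    print(n, narrow_count, wide_count)
```

Requesting a wide accumulator avoids the overflow for counts like
these, at the cost of doing the arithmetic in a larger type. It does not
detect overflow in general; it only moves the limit much further out.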

Tim C


