[SciPy-User] scipy.stats.kendalltau bug?
Sun Jul 29 05:30:51 CDT 2012
On Sun, Jul 29, 2012 at 10:42 AM, Jeffrey <email@example.com> wrote:
> On 07/29/2012 03:47 PM, Nathaniel Smith wrote:
>> On Sun, Jul 29, 2012 at 8:27 AM, Jeffrey <firstname.lastname@example.org> wrote:
>>> Thanks eat. I found the reason: numpy.sqrt cannot handle numbers that are
>>> too large. When calculating kendalltau with n = len(x), the total number
>>> of pairs is 'tot' below:
>>>     tot = n * (n-1) // 2
>>> and when calculating tau, the denominator is as below:
>>>     np.sqrt( (tot-u) * (tot-v) )
>>> u and v stand for ties in x and y[perm], and are zero if the two arrays
>>> are sampled from continuous distributions. Hence (tot-u)*(tot-v) may be
>>> out of range for the C-written ufunc 'np.sqrt', and an error is then raised.
>>> What about using math.sqrt here, or multiplying two np.sqrt calls in the
>>> denominator? Big data sets are common these days.
>> It seems like the bug is that np.sqrt is raising an AttributeError on
>> valid input... can you give an example of a value that np.sqrt fails
>> on? Like
> Assume the input arrays x and y have length n=100000, which is common,
> and assume there are no ties in either x or y, hence u=0, v=0 and t=0
> in the scipy.stats.kendalltau subroutine. The denominator of the
> expression for calculating tau is then:
> np.sqrt( (tot-u) * (tot-v) )
> Here, tot = n * (n-1) // 2 = 4999950000, and (tot-u) * (tot-v) = tot*tot
> = 24999500002500000000L. This long int raises an error when np.sqrt is
> applied. I think a type conversion, like 'float()', should be done before
> np.sqrt, or it could be written as np.sqrt(tot-u) * np.sqrt(tot-v) to avoid long
> Thanks a lot : )
Thanks, that clarifies things: https://github.com/numpy/numpy/issues/368
For now, yeah, some sort of workaround makes sense, though... in
addition to the ones you mention, I noticed that this also seems to
You should submit a pull request :-).
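
For reference, here is a minimal sketch of the workarounds discussed above,
using the n=100000, no-ties case from the example. The variable names
(prod, denom_float, denom_split, denom_math) are only illustrative and are
not taken from the scipy source; this is not the actual patch.

```python
import math
import numpy as np

n = 100000
tot = n * (n - 1) // 2      # total number of pairs
u = v = 0                   # assumption: no ties in x or y

# The product is an arbitrary-precision Python int that no longer fits
# in any 64-bit integer type, so np.sqrt cannot treat it as a native
# integer.
prod = (tot - u) * (tot - v)
too_big = prod > np.iinfo(np.uint64).max

# Workaround 1: convert to float before calling np.sqrt.
denom_float = np.sqrt(float(prod))

# Workaround 2: take the square root of each factor separately;
# each factor fits comfortably in a 64-bit integer.
denom_split = np.sqrt(tot - u) * np.sqrt(tot - v)

# Workaround 3: math.sqrt converts the Python int to a float itself.
denom_math = math.sqrt(prod)
```

All three give the same denominator up to floating-point rounding. The
split form also keeps each intermediate value small, which matters if the
inputs are ever NumPy integers rather than Python ints.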