[SciPy-User] scipy.stats.kendalltau bug?

Jeffrey zfyuan@mail.ustc.edu...
Sun Jul 29 02:27:30 CDT 2012


On 07/29/2012 01:06 AM, eat wrote:
> Hi,
>
> On Sat, Jul 28, 2012 at 7:48 PM, Jeffrey <zfyuan@mail.ustc.edu.cn> wrote:
>
>     On 07/29/2012 12:23 AM, Jeffrey wrote:
>     > Dear all,
>     >
>     >     The statements below always raise an exception like the one
>     >     shown, which seems anomalous. Is this a bug?
>     >
>     >     >>> u1=numpy.random.rand(100000)
>     >     >>> u2=numpy.random.rand(100000)
>     >     >>> scipy.stats.kendalltau(u1,u2)
>     >
>     > ---------------------------------------------------------------------------
>     > AttributeError                            Traceback (most recent call last)
>     > /home/zfyuan/phd/paper1/pyvine_lap/<ipython-input-28-98f367090ed1> in <module>()
>     > ----> 1 sp.stats.kendalltau(u1,u2)
>     >
>     > /usr/lib64/python2.7/site-packages/scipy/stats/stats.pyc in kendalltau(x, y, initial_lexsort)
>     >    2673
>     >    2674     tau = ((tot - (v + u - t)) - 2.0 * exchanges) / \
>     > -> 2675                     np.sqrt((tot - u) * (tot - v))
>     >    2676
>     >    2677     # what follows reproduces the ending of Gary Strangman's original
>     >
>     > AttributeError: sqrt
>     >
>
>     Sorry, I didn't describe this bug in detail. What I mean is that
>     when the two arrays are longer, for example of length 100000, the
>     error is more likely to occur.
>
>     My scipy version is 0.9.0 and numpy is 1.6.2.
>
>     Thanks a lot for your answers.
>
> I can confirm this:
> In []: os.sys.version
> Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
> In []: np.version.version
> Out[]: '1.6.0'
> In []: sp.version.version
> Out[]: '0.9.0'
>
> In []: stats.kendalltau(rand(77929), rand(77929))
> Out[]: (0.0060807135427758865, 0.010891543687108114)
> In []: stats.kendalltau(rand(77939), rand(77939))
> ------------------------------------------------------------
> Traceback (most recent call last):
>   File "<ipython console>", line 1, in <module>
>   File "C:\Python27\lib\site-packages\scipy\stats\stats.py", line 2675, in kendalltau
>     np.sqrt((tot - u) * (tot - v))
> AttributeError: sqrt
>
> There really seems to be an odd problem above a certain array length.
>
>
> My 2 cents,
> -eat
>
>
>     --
>
>     Jeffrey
>
>
>     _______________________________________________
>     SciPy-User mailing list
>     SciPy-User@scipy.org
>     http://mail.scipy.org/mailman/listinfo/scipy-user
>

Thanks, eat. I found the reason: np.sqrt cannot handle such a large
number. When calculating kendalltau with n = len(x), the total number
of pairs is 'tot':

     tot=(n-1)*n//2

When calculating tau, the denominator is:

     np.sqrt((tot-u)*(tot-v))

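u and v count the tied pairs in x[] and y[perm[]]; both are zero when
the two arrays are sampled from a continuous distribution. In that case
(tot-u)*(tot-v) equals tot**2, which exceeds the signed 64-bit integer
limit (2**63 - 1) for n between 77929 and 77939, exactly where eat's
examples switch from working to failing. The product is then no longer a
machine integer that the C-implemented ufunc np.sqrt can process, so
np.sqrt treats it as a generic Python object and looks for a .sqrt()
method on it, which Python integers lack; hence 'AttributeError: sqrt'.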
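As a quick check of this explanation, the sketch below computes in pure Python the smallest n for which the untied product tot*tot (u = v = 0) exceeds 2**63 - 1. The helper names pair_count and first_overflowing_n are hypothetical, and the sketch assumes the failure is triggered exactly when the product leaves the signed 64-bit range. If that assumption holds, the threshold should fall between eat's working size (77929) and failing size (77939).

```python
# Sketch: find the sample size at which the kendalltau denominator product
# (tot - u) * (tot - v), with no ties (u = v = 0), leaves the int64 range.
# Helper names are hypothetical; this is plain Python integer arithmetic.
INT64_MAX = 2**63 - 1


def pair_count(n):
    """Total number of pairs among n observations, as in tot = (n-1)*n//2."""
    return (n - 1) * n // 2


def first_overflowing_n(limit=INT64_MAX):
    """Smallest n whose untied product tot * tot exceeds `limit`."""
    n = 2
    while pair_count(n) * pair_count(n) <= limit:
        n += 1
    return n


if __name__ == "__main__":
    print("first overflowing n:", first_overflowing_n())
    # eat's sizes: 77929 worked, 77939 raised AttributeError
    for n in (77929, 77939):
        tot = pair_count(n)
        print(n, "overflows int64:", tot * tot > INT64_MAX)
```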

What about using math.sqrt here, or multiplying two np.sqrt calls in the
denominator? Large data sets are common these days.
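A minimal sketch of the two proposed fixes, assuming tot, u and v are plain Python integers as in the formulas above (the function names are hypothetical, not part of scipy). math.sqrt accepts an arbitrarily large Python integer by converting it to a float first, while taking the square root of each factor separately keeps every intermediate value well below the 64-bit limit. Only the standard library is used here; applying np.sqrt to each factor would follow the same factored pattern.

```python
import math


def tau_denominator_math(tot, u, v):
    """Option 1: math.sqrt converts the exact integer product to float."""
    return math.sqrt((tot - u) * (tot - v))


def tau_denominator_factored(tot, u, v):
    """Option 2: square root of each factor, then multiply.

    Each factor is at most tot (about 5e9 for n = 100000), so no
    intermediate value approaches the 64-bit integer limit.
    """
    return math.sqrt(tot - u) * math.sqrt(tot - v)


if __name__ == "__main__":
    n = 100000                   # array length from the original report
    tot = (n - 1) * n // 2       # total number of pairs
    u = v = 0                    # no ties for continuous samples
    print(tau_denominator_math(tot, u, v))
    print(tau_denominator_factored(tot, u, v))
```

Both versions agree to within floating-point rounding; the factored form has the advantage of never building the huge intermediate product at all.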

Thanks a lot!


-- 
Jeffrey


