[Scipy-tickets] [SciPy] #1871: Large ties count in data leads to int32 overflow in wilcoxon test

SciPy Trac scipy-tickets@scipy....
Fri Mar 22 10:55:04 CDT 2013


#1871: Large ties count in data leads to int32 overflow in wilcoxon test
-------------------------+--------------------------------------------------
 Reporter:  simonb       |       Owner:  rgommers   
     Type:  defect       |      Status:  new        
 Priority:  normal       |   Milestone:  Unscheduled
Component:  scipy.stats  |     Version:  0.9.0      
 Keywords:  wilcoxon     |  
-------------------------+--------------------------------------------------
 import numpy
 import scipy.stats
 numpy.seterr(all='raise')  # turn NumPy overflow warnings into exceptions
 # The call below raises:
 #   FloatingPointError: overflow encountered in int_scalars
 scipy.stats.wilcoxon([0.1] * 46341)  # 46341^2 > 2^31-1

 Analysis:

 scipy.stats.wilcoxon() calls find_repeats() to count occurrences of
 identical values.  find_repeats() in turn calls futil.dfreps(), which
 lives in a compiled extension module, scipy/stats/futilmodule.c.
 futilmodule.c uses and returns 32-bit C ints, so the counts returned by
 find_repeats() have dtype numpy.int32.

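 When calculating "0.5*si*(si*si-1.0)", 'si' is a count from
 find_repeats() and therefore has int32 type.  The sub-expression si*si is
 evaluated in int32 before the subtraction of 1.0 promotes it to float, so
 the calculation overflows when si*si > 2^31-1, that is, when si >= 46341.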
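
 The overflow can be seen in isolation, without going through wilcoxon().
 The sketch below is only an illustration: the 'counts' array is made up
 to stand in for the int32 counts that find_repeats() returns, and the
 cast to float64 is one possible workaround, not necessarily the fix that
 should go into scipy.stats.  Array arithmetic is used here because it
 wraps around silently, so the wrapped value can be inspected.

```python
import numpy as np

# Hypothetical tie counts, stored as int32 to mimic the dtype that
# find_repeats() is reported to return in this ticket.
counts = np.array([46340, 46341], dtype=np.int32)

# Squaring in int32: 46340**2 still fits, 46341**2 exceeds 2**31 - 1
# and wraps around to a negative number.
squared_int32 = counts * counts

# Possible workaround (an assumption, not the official patch):
# convert to float64 before doing any arithmetic.
si = counts.astype(np.float64)
tie_terms = 0.5 * si * (si * si - 1.0)

print(squared_int32)  # second entry is negative because of wraparound
print(tie_terms)      # both entries are positive
```

 With the counts converted to float64 first, every intermediate product
 is computed in floating point, so the int32 limit no longer applies.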

-- 
Ticket URL: <http://projects.scipy.org/scipy/ticket/1871>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.

