[Scipy-tickets] [SciPy] #1489: fisher_exact throws ValueErrors when row or col is 0, 0

SciPy Trac scipy-tickets@scipy....
Sun Aug 7 16:37:44 CDT 2011


#1489: fisher_exact throws ValueErrors when row or col is 0,0
--------------------------+-------------------------------------------------
 Reporter:  aihardin      |       Owner:  somebody   
     Type:  defect        |      Status:  new        
 Priority:  normal        |   Milestone:  Unscheduled
Component:  scipy.stats   |     Version:  0.9.0      
 Keywords:  fisher_exact  |  
--------------------------+-------------------------------------------------

Comment(by josefpktd):

 I don't think changing the numbers to make them non-zero is a good idea; it
 would return an answer for a different table.

 R reports p-value = 1 and odds ratio = 0 when a 2x2 table has a row or
 column of zeros:
 {{{
 > fisher.test(x, alternative = "tw")

         Fisher's Exact Test for Count Data

 data:  x
 p-value = 1
 alternative hypothesis: true odds ratio is not equal to 1
 95 percent confidence interval:
    0 Inf
 sample estimates:
 odds ratio
          0
 }}}

 This doesn't make much sense to me: there seems to be a conflict between a
 p-value of 1 and a confidence interval that says the odds ratio could be
 anything.

 I still don't have enough intuition about the Fisher exact test for this.
 I would raise a ValueError.

 However, this is a bit similar to the discussion on what to return in the
 t-test when the variances are zero. There I decided to return a partially
 arbitrary value to avoid nans (0/0 = ?). But here I don't know what the
 p-value should be.

 The problem I have with this is that Fisher's exact test is conditional on
 the marginals, which in this case means conditioning on an empty set. In my
 interpretation, that would mean that we don't have any information in our
 sample and we should raise a ValueError.
 If this were an unconditional test, then zero rows or columns would be a
 strong indication of independence (we always get the same value,
 independent of what the other variable is) and the p-value should be large
 (or 1).
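
 A minimal sketch of the conditioning argument, not taken from SciPy's
 implementation: with the margins fixed, the top-left cell of a 2x2 table
 follows a hypergeometric distribution under the null hypothesis. The helper
 name conditional_support is hypothetical and the example tables are made
 up. It lists every top-left value the margins allow, with its null
 probability.

```python
from math import comb


def conditional_support(table):
    """Map each feasible top-left cell value to its null probability,
    given the row and column totals of a 2x2 table.

    Hypothetical helper for illustration only; assumes a non-empty table.
    """
    (a, b), (c, d) = table
    total = a + b + c + d
    row1 = a + b          # first row total
    col1 = a + c          # first column total
    lo = max(0, row1 + col1 - total)
    hi = min(row1, col1)
    # Hypergeometric probability of k in the top-left cell.
    return {
        k: comb(col1, k) * comb(total - col1, row1 - k) / comb(total, row1)
        for k in range(lo, hi + 1)
    }


zero_row = conditional_support([[0, 0], [5, 3]])
regular = conditional_support([[3, 1], [1, 3]])
print(zero_row)        # a single feasible table
print(len(regular))    # several feasible tables
```

 With a zero row the margins admit exactly one table, so the observed table
 has conditional probability 1. That is one way to read R's p-value of 1,
 and also why the sample cannot discriminate between hypotheses.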

 I vote for ValueError, but if users of fisher_exact have a strong opinion
 about a default p-value for the zero row or column case, returning that
 default would also be fine with me.
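
 As a sketch of what the ValueError option could look like on the caller's
 side (this is not fisher_exact's actual behavior, and check_margins is a
 hypothetical name), a table can be validated before the test is run:

```python
def check_margins(table):
    """Raise ValueError if any row or column of the table sums to zero.

    Hypothetical pre-check for illustration; expects a nested sequence of
    non-negative counts, e.g. [[a, b], [c, d]].
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    if 0 in row_sums or 0 in col_sums:
        raise ValueError(
            "table has a row or column of zeros; the test is conditional "
            "on the margins and the sample carries no information"
        )


check_margins([[3, 1], [1, 3]])      # passes silently

try:
    check_margins([[0, 0], [5, 3]])  # zero row
except ValueError as exc:
    print("rejected:", exc)
```

 Users who prefer a default p-value instead could catch the exception and
 substitute their own value.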

-- 
Ticket URL: <http://projects.scipy.org/scipy/ticket/1489#comment:4>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.

