[SciPy-User] Off by one bug in Scipy.stats.hypergeom

Jacob Biesinger jake.biesinger@gmail....
Sat Jul 31 19:48:28 CDT 2010


Perhaps I'm using the module incorrectly, but it looks like the x parameter
in scipy.stats.hypergeom is off by one.  Specifically, I think it's

From the Wikipedia article (I know they could be wrong; just hear me out on
this),

Hypergeometric distribution

       Models drawing objects from a bin.
       M is total number of objects, n is total number of Type I objects.
       RV counts number of Type I objects in N drawn without replacement

So, translating Wikipedia's example:
Pr(x=4; M=50, n=5, N=10) = (choose(5,4) * choose(50-5, 10-4)) / choose(50,10) = 0.003964583
Pr(x=5; M=50, n=5, N=10) = (choose(5,5) * choose(50-5, 10-5)) / choose(50,10) = 0.0001189375

Which you can check with the Python code:
from scipy.special import comb as chse   # "combination" => choose (scipy.comb in older releases)

float(chse(5,4, exact=1) * chse(50-5,10-4, exact=1)) / chse(50,10, exact=1)  # example one
0.0039645830580150654155  # okay!

float(chse(5,5, exact=1) * chse(50-5,10-5, exact=1)) / chse(50,10, exact=1)  # example two
0.00011893749174045196247  # okay!

Try example one with scipy.stats.hypergeom:
# scipy.stats.hypergeom.sf(x, M, n, N)
scipy.stats.hypergeom.sf(4,50,5,10)
0.00011893749169422652     # correct value for x=5, not x=4

scipy.stats.hypergeom.sf(5,50,5,10)
-4.6185277824406512e-14    # wrong

It seems that changing the loc value from 0 (the default) to 1 fixes the
problem:
scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
0.0040835205497095073    # close enough

scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
0.00011893749169422652   # okay!
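
For comparison, here is a minimal sketch that needs only the standard library
(math.comb, Python 3.8+) rather than scipy. The helper names hypergeom_pmf and
hypergeom_tail are made up for this illustration and are not scipy functions.
It prints the point probability Pr(X = x) next to the upper tail Pr(X > x) for
the same M, n, N, so the values returned by sf above can be matched against
each:

```python
from math import comb  # exact integer binomial coefficients (Python 3.8+)


def hypergeom_pmf(x, M, n, N):
    """Pr(X = x): exactly x Type I objects among N drawn from M, n of them Type I.

    Hypothetical helper for illustration; assumes 0 <= x <= N.
    """
    return comb(n, x) * comb(M - n, N - x) / comb(M, N)


def hypergeom_tail(x, M, n, N):
    """Pr(X > x), summed directly from the point probabilities above."""
    # X can be at most min(n, N), so the sum stops there.
    return sum(hypergeom_pmf(k, M, n, N) for k in range(x + 1, min(n, N) + 1))


if __name__ == "__main__":
    M, n, N = 50, 5, 10
    for x in (3, 4, 5):
        print(x, hypergeom_pmf(x, M, n, N), hypergeom_tail(x, M, n, N))
```

With these parameters the tail Pr(X > 4) equals Pr(X = 5), because X cannot
exceed n = 5, and Pr(X > 3) is Pr(X = 4) + Pr(X = 5). scipy documents sf as the
survival function, 1 - cdf, which may account for the apparent shift.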

Am I using the package wrong?
Jake Biesinger
Graduate Student
Xie Lab, UC Irvine
(949) 231-7587