# [SciPy-User] Off by one bug in Scipy.stats.hypergeom

Jacob Biesinger jake.biesinger@gmail....
Sat Jul 31 19:48:28 CDT 2010

```Hi!

Perhaps I'm using the module incorrectly, but it looks like the x parameter
in scipy.stats.hypergeom is off by one.  Specifically, I think it's
one-too-high.

>From the wikipedia article
http://en.wikipedia.org/wiki/Hypergeometric_distribution#Application_and_example
(I
know they could be wrong-- just hear me out on this),

scipy.stats.hypergeom?
Hypergeometric distribution

Models drawing objects from a bin.
M is total number of objects, n is total number of Type I objects.
RV counts number of Type I objects in N drawn without replacement
from
population.
So translating wikipedia's example...
Pr(x=4; M=50, n=5, N=10)  = (choose(5,4) * choose(50-5, 10-4)) /
choose(50,10) = .003964583
Pr(x=5; M=50, n=5, N=10)  = (choose(5,5) * choose(50-5, 10-5)) /
choose(50,10) = .0001189375

Which you can check with the python code:
from scipy import comb as chse   # "combination" => choose

float((chse(5,4, exact=1) * chse(50-5,10-4, exact=1))) / chse(50,10,exact=1)
# example one
0.0039645830580150654155  # okay!

float((chse(5,5, exact=1) * chse(50-5,10-5, exact=1))) / chse(50,10,exact=1)
# example two
0.00011893749174045196247  # okay!

Try example one with scipy.stats.hypergeom:
# scipy.stats.hypergeom.sf(x, M, n, N)
scipy.stats.hypergeom.sf(4,50,5,10)
0.00011893749169422652     # correct value for x=5, not x=4
scipy.stats.hypergeom.sf(5,50,5,10)
-4.6185277824406512e-14    # wrong

It seems that changing the loc value from =0 (default) to =1 fixes the
issue...
scipy.stats.hypergeom.sf(4,50,5,10, loc=1)
0.0040835205497095073    # close enough

scipy.stats.hypergeom.sf(5,50,5,10, loc=1)
0.00011893749169422652   # okay!

Am I using the package wrong?
--
Jake Biesinger