[Numpy-discussion] Alternative to R phyper

josef.pktd@gmai... josef.pktd@gmai...
Mon Apr 30 08:03:41 CDT 2012


On Mon, Apr 30, 2012 at 7:27 AM, Robert Kern <robert.kern@gmail.com> wrote:
> On Mon, Apr 30, 2012 at 12:09, Bruno Santos <bacmsantos@gmail.com> wrote:
>> Hello everyone,
>>
>> I have a bit of code where I am using rpy2 to import R phyper so I can
>> perform an hypergeometric test. Unfortunately our cluster does not have a
>> functional installation of rpy2 working. So I am wondering if I could
>> translate to scipy which would make the code completly independent of R. The
>> python code I have is as following:
>>
>>
>> def lphyper(self,q,m,n,k):
>>         """
>>         self.phyper(self,q,m,n,k)
>>         Calculate p-value using R function phyper from rpy2 low-level
>>         interface.
>>         "R Documentation
>>         phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
>>         q: vector of quantiles representing the number of white balls
>>             drawn without replacement from an urn which contains both
>>             black and white balls.
>>         m: the number of white balls in the urn.
>>         n: the number of black balls in the urn.
>>         k: the number of balls drawn from the urn.
>>         log.p: logical; if TRUE, probabilities p are given as log(p).
>>         lower.tail: logical; if TRUE (default), probabilities are P[X <= x],
>>             otherwise, P[X > x].
>>         """
>>         phyper_q = SexpVector([q,], rinterface.INTSXP)
>>         phyper_m = SexpVector([m,], rinterface.INTSXP)
>>         phyper_n = SexpVector([n,], rinterface.INTSXP)
>>         phyper_k = SexpVector([k,], rinterface.INTSXP)
>>         return phyper(phyper_q,phyper_m,phyper_n,phyper_k,**myparams)[0]
>>
>> I have looked to scipy.stats.hypergeom but it is giving me a different
>> result which is also negative.
>>> 1-phyper(45, 92, 7518, 1329)
>> [1] 6.92113e-13
>>
>> In [24]: stats.hypergeom.sf(45,(92+7518),92,1329)
>> Out[24]: -8.4343643180773142e-12

the corrected version with explicit sf was added in 0.10

>>> from scipy import stats
>>> stats.hypergeom.sf(45,(92+7518),92,1329)
6.9212632647221852e-13

>>> import scipy
>>> scipy.__version__
'0.10.0b2'

Josef

>
> The CDF (CMF? whatever) for stats.hypergeom is not implemented
> explicitly, so it falls back to the default implementation of just
> summing up the PMF. You are falling victim to floating point error
> accumulation such that the sum exceeds 1.0, so the survival function
> 1-CDF ends up negative. I don't think we have a better implementation
> of the CDF anywhere in scipy, sorry.
>
> --
> Robert Kern
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


More information about the NumPy-Discussion mailing list