[Numpy-discussion] Alternative to R phyper
josef.pktd@gmai...
josef.pktd@gmai...
Mon Apr 30 08:03:41 CDT 2012
On Mon, Apr 30, 2012 at 7:27 AM, Robert Kern <robert.kern@gmail.com> wrote:
> On Mon, Apr 30, 2012 at 12:09, Bruno Santos <bacmsantos@gmail.com> wrote:
>> Hello everyone,
>>
>> I have a bit of code where I am using rpy2 to import R phyper so I can
>> perform an hypergeometric test. Unfortunately our cluster does not have a
>> functional installation of rpy2 working. So I am wondering if I could
>> translate to scipy which would make the code completly independent of R. The
>> python code I have is as following:
>>
>>
>> def lphyper(self,q,m,n,k):
>> """
>> self.phyper(self,q,m,n,k)
>> Calculate p-value using R function phyper from rpy2 low-level
>> interface.
>> "R Documentation
>> phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
>> q: vector of quantiles representing the number of white balls
>> drawn without replacement from an urn which contains both
>> black and white balls.
>> m: the number of white balls in the urn.
>> n: the number of black balls in the urn.
>> k: the number of balls drawn from the urn.
>> log.p: logical; if TRUE, probabilities p are given as log(p).
>> lower.tail: logical; if TRUE (default), probabilities are P[X <= x],
>> otherwise, P[X > x].
>> """
>> phyper_q = SexpVector([q,], rinterface.INTSXP)
>> phyper_m = SexpVector([m,], rinterface.INTSXP)
>> phyper_n = SexpVector([n,], rinterface.INTSXP)
>> phyper_k = SexpVector([k,], rinterface.INTSXP)
>> return phyper(phyper_q,phyper_m,phyper_n,phyper_k,**myparams)[0]
>>
>> I have looked to scipy.stats.hypergeom but it is giving me a different
>> result which is also negative.
>>> 1-phyper(45, 92, 7518, 1329)
>> [1] 6.92113e-13
>>
>> In [24]: stats.hypergeom.sf(45,(92+7518),92,1329)
>> Out[24]: -8.4343643180773142e-12
the corrected version with explicit sf was added in 0.10
>>> from scipy import stats
>>> stats.hypergeom.sf(45,(92+7518),92,1329)
6.9212632647221852e-13
>>> import scipy
>>> scipy.__version__
'0.10.0b2'
Josef
>
> The CDF (CMF? whatever) for stats.hypergeom is not implemented
> explicitly, so it falls back to the default implementation of just
> summing up the PMF. You are falling victim to floating point error
> accumulation such that the sum exceeds 1.0, so the survival function
> 1-CDF ends up negative. I don't think we have a better implementation
> of the CDF anywhere in scipy, sorry.
>
> --
> Robert Kern
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
More information about the NumPy-Discussion
mailing list