[Numpy-discussion] Alternative to R phyper
Mon Apr 30 08:03:41 CDT 2012
On Mon, Apr 30, 2012 at 7:27 AM, Robert Kern <email@example.com> wrote:
> On Mon, Apr 30, 2012 at 12:09, Bruno Santos <firstname.lastname@example.org> wrote:
>> Hello everyone,
>> I have a bit of code where I am using rpy2 to import R phyper so I can
>> perform an hypergeometric test. Unfortunately our cluster does not have a
>> functional installation of rpy2 working. So I am wondering if I could
>> translate to scipy which would make the code completly independent of R. The
>> python code I have is as following:
>> def lphyper(self,q,m,n,k):
>> Calculate p-value using R function phyper from rpy2 low-level
>> "R Documentation
>> phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
>> q: vector of quantiles representing the number of white balls
>> drawn without replacement from an urn which contains both
>> black and white balls.
>> m: the number of white balls in the urn.
>> n: the number of black balls in the urn.
>> k: the number of balls drawn from the urn.
>> log.p: logical; if TRUE, probabilities p are given as log(p).
>> lower.tail: logical; if TRUE (default), probabilities are P[X <= x],
>> otherwise, P[X > x].
>> phyper_q = SexpVector([q,], rinterface.INTSXP)
>> phyper_m = SexpVector([m,], rinterface.INTSXP)
>> phyper_n = SexpVector([n,], rinterface.INTSXP)
>> phyper_k = SexpVector([k,], rinterface.INTSXP)
>> return phyper(phyper_q,phyper_m,phyper_n,phyper_k,**myparams)
>> I have looked to scipy.stats.hypergeom but it is giving me a different
>> result which is also negative.
>>> 1-phyper(45, 92, 7518, 1329)
>>  6.92113e-13
>> In : stats.hypergeom.sf(45,(92+7518),92,1329)
>> Out: -8.4343643180773142e-12
the corrected version with explicit sf was added in 0.10
>>> from scipy import stats
>>> import scipy
> The CDF (CMF? whatever) for stats.hypergeom is not implemented
> explicitly, so it falls back to the default implementation of just
> summing up the PMF. You are falling victim to floating point error
> accumulation such that the sum exceeds 1.0, so the survival function
> 1-CDF ends up negative. I don't think we have a better implementation
> of the CDF anywhere in scipy, sorry.
> Robert Kern
> NumPy-Discussion mailing list
More information about the NumPy-Discussion