[SciPy-dev] memoize temporary values and scipy special questions

josef.pktd@gmai... josef.pktd@gmai...
Sat Nov 15 23:06:51 CST 2008


2 observations and 2 questions

observation 1:
gausshyper distribution uses the generic way of generating random variables,
which is pretty slow for larger samples. the _pdf recalculates the same
temporary value each time again.

Cinv is the normalization constant for the pdf, which depends on the
parametersn(a, b, c, z) but not on the sample point x

class gausshyper_gen(rv_continuous):
    def _pdf(self, x, a, b, c, z):
        Cinv = gam(a)*gam(b)/gam(a+b)*special.hyp2f1(c,a,a+b,-z)
        return 1.0/Cinv * x**(a-1.0) * (1.0-x)**(b-1.0) / (1.0+z*x)**c

Note: gam = special.gamma

observation 2: special.hyp2f1 seems to have problems over some parameter range


question 1:
When calculating the pdf for different x but same parameters,
then the normalization constant could be temporarily stored somewhere
as in memoizing or as a lazy attribute. Is there a precedence and
pattern for this in numpy/scipy, or do I try whatever might work in the
individual case, e.g. create a cache for repeated calculation of the same
values?

question 2:
In the above case, the normalization constant could also be
obtained through direct numerical integration (over x in interval [0,1]).
What are the relative merits of using scipy.special versus numerical
integration, in terms of speed, accuracy, and range for which correct
answers are produced?

These are just some ideas I had, while I playing around with gausshyper
to see why it doesn't work very well (it is slow and convergence of fit
method to true parameters is not very good).
I haven't profiled anything and rvs is additionally slow because
the generic way of generating random variables when only
the pdf is given is pretty roundabout: pdf->cdf->ppf->rvs

If somebody can provide some information on this, then I don't have
to figure it out myself.

Thanks,
Josef


More information about the Scipy-dev mailing list