[SciPy-User] the skellam distribution
Mon Nov 9 13:36:29 CST 2009
2009/11/9 Ernest Adrogué <firstname.lastname@example.org>:
> 9/11/09 @ 09:40 (-0500), thus spake email@example.com:
>> Thanks, I think the distribution of the difference of two poisson
>> distributed random variables could be useful.
>> Would you please open an enhancement ticket for this at
> Okay, I will.
>> I had only a brief look at it so far, I had never looked at the
>> Skellam distribution before, and just read a few references.
>> The "if x < 0 .. else ..." will have to be replace with a
>> "numpy.where" assignment, since the methods are supposed to work with
>> arrays of x (as far as I remember)
>> _rvs could be implemented directly instead of generically (I don't
>> find the reference, where I saw it, right now).
> I suppose it could be done, but I can't figure out how.
> As far as I understand, you can't derive the poisson parameters
> from the skellam parameters alone, you need the correlation
> coefficient (between the two poisson rv) too, is that right?
it's can be generated as difference between two independent
poisson, (see R docs). Correlation only matters for the
interpretation, as you explain below and I found in a reference.
>> Documentation will be necessary, a brief description in the
>> (currently) extradocs, and a listing of the properties for the
>> description of the distributions currently in the stats tutorial.
>> I have some background questions, which address the limitation of the
>> implementation (but are not really necessary for inclusion into
>> The description in R mentions several implementation of Skellam. Do
>> you have a rough idea what the range of parameters are for which the
>> implementation using ncx produces good results? Do you know if any
>> other special functions would produce good results over a larger
>> range, e.g. using Bessel function?
> No, sorry, I have no idea. The R paper says that in R the Bessel
> function is more accurate, but I don't think this means that the
> Bessel function implementation is more accurate in general.
> I compared results using the Bessel function in scipy and using
> the ncx2 implementation, and the differences were minimal, although
> I admit my testing wasn't extensive. Another problem is that
> I haven't got a table of values for this particular distribution,
> so in case results differ I can't tell which one is more accurate
I will look at it, but small numerical inaccuracies, 1e-??, can also be
fixed later, if someone finds a better implementation.,
>> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also
>> mentions (but doesn't describe) the case of Skellam distribution with
>> correlated Poisson distributions. Do you know what the difference to
>> your implementation would be?
> As far as I know, it makes no difference if the two Poisson
> variates are correlated or not. The Skellam parameters are
> defined as
> mu1 = lam1 - rho * sqrt(lam1*lam2)
> mu2 = lam2 - rho * sqrt(lam1*lam2)
> where lam1 and lam2 are the Poisson means and rho is the
> correlation coefficient. So, in case there is correlation it
> is implicit in the parameters mu1 and mu2, and it doesn't
> make any difference in terms of calculating the pmf or cdf.
> Again, I am no statician or mathematician.
I agree, that's what I have seen in the references.
>> Tests for a new distribution will be picked up by the generic tests,
>> but it would be useful to have some extra tests for extreme/uncommon
>> parameter ranges. Do you have any comparisons with R, since you
>> already looked it?
> I have this visual test, although it's usefulness is
> questionable. It shows the deviation between observed and
> expected frequencies for a Skellam random variable. Ideally
> errors should be random and centered around 0. To me
> it looks like errors tend to increase around the mean
> and are smaller along the tails, but I don't know if this
> means something or not.
> import numpy
> import scipy.stats.distributions
> poisson = scipy.stats.distributions.poisson
> ncx2 = scipy.stats.distributions.ncx2
> # Skellam distribution
> class skellam_gen(scipy.stats.distributions.rv_discrete):
> def _pmf(self, x, mu1, mu2):
> px = numpy.where(x < 0, ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2,
> ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2)
> return px
> def _cdf(self, x, mu1, mu2):
> x = numpy.floor(x)
> px = numpy.where(x < 0, ncx2.cdf(2*mu2, -2*x, 2*mu1),
> 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2))
> return px
> def _stats(self, mu1, mu2):
> mean = mu1 - mu2
> var = mu1 + mu2
> g1 = (mean) / numpy.sqrt((var)**3)
> g2 = 1 / var
> return mean, var, g1, g2
> skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam',
> shapes="mu1,mu2", extradoc="")
> if __name__ == '__main__':
> lam1 = 3.4
> lam2 = 6.1
> n = 5000
> poisson_var1 = numpy.random.poisson(lam1, n)
> poisson_var2 = numpy.random.poisson(lam2, n)
> skellam_var = poisson_var1-poisson_var2
> low = min(skellam_var)
> high = max(skellam_var)
> obs_freq = numpy.histogram(
> skellam_var, numpy.arange(low, high+2)) / float(n)
> rho = numpy.corrcoef(poisson_var1, poisson_var2)[1,0]
> mu1 = lam1 - rho * numpy.sqrt(lam1*lam2)
> mu2 = lam2 - rho * numpy.sqrt(lam1*lam2)
> exp_freq = skellam.pmf(numpy.arange(low, high+1), mu1, mu2)
> print obs_freq-exp_freq
> # plot
> import matplotlib.pyplot as plt
I'm not sure the example is correct. You are simulating two
independent poisson variables, so the difference skellam_var should be
distributed as skellam with
mu1 = lam1
mu2 = lam2
and theoretical rho should be zero. Or am I missing something?
thanks, the numpy.where looks good, I still have to actually run the examples.
> SciPy-User mailing list
More information about the SciPy-User