[SciPy-User] the skellam distribution
josef.pktd@gmai...
josef.pktd@gmai...
Mon Nov 9 13:36:29 CST 2009
2009/11/9 Ernest Adrogué <eadrogue@gmx.net>:
> 9/11/09 @ 09:40 (-0500), thus spake josef.pktd@gmail.com:
>> Thanks, I think the distribution of the difference of two poisson
>> distributed random variables could be useful.
>>
>> Would you please open an enhancement ticket for this at
>> http://projects.scipy.org/scipy/report/1
>
> Okay, I will.
>
>> I had only a brief look at it so far, I had never looked at the
>> Skellam distribution before, and just read a few references.
>>
>> The "if x < 0 .. else ..." will have to be replace with a
>> "numpy.where" assignment, since the methods are supposed to work with
>> arrays of x (as far as I remember)
>
> Done.
>
>> _rvs could be implemented directly instead of generically (I don't
>> find the reference, where I saw it, right now).
>
> I suppose it could be done, but I can't figure out how.
> As far as I understand, you can't derive the poisson parameters
> from the skellam parameters alone, you need the correlation
> coefficient (between the two poisson rv) too, is that right?
it's can be generated as difference between two independent
poisson, (see R docs). Correlation only matters for the
interpretation, as you explain below and I found in a reference.
>
>> Documentation will be necessary, a brief description in the
>> (currently) extradocs, and a listing of the properties for the
>> description of the distributions currently in the stats tutorial.
>>
>> I have some background questions, which address the limitation of the
>> implementation (but are not really necessary for inclusion into
>> scipy).
>>
>> The description in R mentions several implementation of Skellam. Do
>> you have a rough idea what the range of parameters are for which the
>> implementation using ncx produces good results? Do you know if any
>> other special functions would produce good results over a larger
>> range, e.g. using Bessel function?
>
> No, sorry, I have no idea. The R paper says that in R the Bessel
> function is more accurate, but I don't think this means that the
> Bessel function implementation is more accurate in general.
> I compared results using the Bessel function in scipy and using
> the ncx2 implementation, and the differences were minimal, although
> I admit my testing wasn't extensive. Another problem is that
> I haven't got a table of values for this particular distribution,
> so in case results differ I can't tell which one is more accurate
I will look at it, but small numerical inaccuracies, 1e-??, can also be
fixed later, if someone finds a better implementation.,
>
>> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also
>> mentions (but doesn't describe) the case of Skellam distribution with
>> correlated Poisson distributions. Do you know what the difference to
>> your implementation would be?
>
> As far as I know, it makes no difference if the two Poisson
> variates are correlated or not. The Skellam parameters are
> defined as
>
> mu1 = lam1 - rho * sqrt(lam1*lam2)
> mu2 = lam2 - rho * sqrt(lam1*lam2)
>
> where lam1 and lam2 are the Poisson means and rho is the
> correlation coefficient. So, in case there is correlation it
> is implicit in the parameters mu1 and mu2, and it doesn't
> make any difference in terms of calculating the pmf or cdf.
> Again, I am no statician or mathematician.
I agree, that's what I have seen in the references.
>
>> Tests for a new distribution will be picked up by the generic tests,
>> but it would be useful to have some extra tests for extreme/uncommon
>> parameter ranges. Do you have any comparisons with R, since you
>> already looked it?
>
> I have this visual test, although it's usefulness is
> questionable. It shows the deviation between observed and
> expected frequencies for a Skellam random variable. Ideally
> errors should be random and centered around 0. To me
> it looks like errors tend to increase around the mean
> and are smaller along the tails, but I don't know if this
> means something or not.
>
> import numpy
> import scipy.stats.distributions
>
> poisson = scipy.stats.distributions.poisson
> ncx2 = scipy.stats.distributions.ncx2
>
> # Skellam distribution
>
> class skellam_gen(scipy.stats.distributions.rv_discrete):
> def _pmf(self, x, mu1, mu2):
> px = numpy.where(x < 0, ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2,
> ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2)
> return px
> def _cdf(self, x, mu1, mu2):
> x = numpy.floor(x)
> px = numpy.where(x < 0, ncx2.cdf(2*mu2, -2*x, 2*mu1),
> 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2))
> return px
> def _stats(self, mu1, mu2):
> mean = mu1 - mu2
> var = mu1 + mu2
> g1 = (mean) / numpy.sqrt((var)**3)
> g2 = 1 / var
> return mean, var, g1, g2
> skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam',
> shapes="mu1,mu2", extradoc="")
>
> if __name__ == '__main__':
>
> lam1 = 3.4
> lam2 = 6.1
> n = 5000
>
> poisson_var1 = numpy.random.poisson(lam1, n)
> poisson_var2 = numpy.random.poisson(lam2, n)
> skellam_var = poisson_var1-poisson_var2
>
> low = min(skellam_var)
> high = max(skellam_var)
> obs_freq = numpy.histogram(
> skellam_var, numpy.arange(low, high+2))[0] / float(n)
>
> rho = numpy.corrcoef(poisson_var1, poisson_var2)[1,0]
> mu1 = lam1 - rho * numpy.sqrt(lam1*lam2)
> mu2 = lam2 - rho * numpy.sqrt(lam1*lam2)
> exp_freq = skellam.pmf(numpy.arange(low, high+1), mu1, mu2)
> print obs_freq-exp_freq
>
> # plot
> import matplotlib.pyplot as plt
> plt.figure().add_subplot(1,1,1).plot(obs_freq-exp_freq)
> plt.show()
I'm not sure the example is correct. You are simulating two
independent poisson variables, so the difference skellam_var should be
distributed as skellam with
mu1 = lam1
mu2 = lam2
and theoretical rho should be zero. Or am I missing something?
thanks, the numpy.where looks good, I still have to actually run the examples.
Josef
>
>
> Regards.
> --
> Ernest
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
More information about the SciPy-User
mailing list