[SciPy-User] the skellam distribution
Bruce Southey
bsouthey@gmail....
Mon Nov 9 10:19:41 CST 2009
On 11/09/2009 10:01 AM, Johann Cohen-Tanugi wrote:
> From what I understand of the initial statement from Ernest:
> "
> In case somebody is interested, or you want to include it
> in scipy. I used these specs here from the R package:
> cran.r-project.org/web/packages/skellam/skellam.pdf
> "
>
> he used the spec, as defined in this pdf, and did not look at the code
> itself. If my interpretation of the small preamble above is correct, I
> believe his implementation is not GPL-tainted, right?
>
> Johann
> Bruce Southey wrote:
>
>> On 11/09/2009 08:40 AM, josef.pktd@gmail.com wrote:
>>
>>
>>> 2009/11/8 Ernest Adrogué<eadrogue@gmx.net>:
>>>
>>>
>>>
>>>> Hi,
>>>>
>>>> In case somebody is interested, or you want to include it
>>>> in scipy. I used these specs here from the R package:
>>>> cran.r-project.org/web/packages/skellam/skellam.pdf
>>>>
>>>> Note that I am no statician, somebody who knows what he's
>>>> doing (as opposed to me ;) should verify it's correct.
>>>>
>>>>
>>>> import numpy
>>>> import scipy.stats.distributions
>>>>
>>>> # Skellam distribution
>>>>
>>>> ncx2 = scipy.stats.distributions.ncx2
>>>>
>>>> class skellam_gen(scipy.stats.distributions.rv_discrete):
>>>> def _pmf(self, x, mu1, mu2):
>>>> if x< 0:
>>>> px = ncx2.pdf(2*mu2, 2*(1-x), 2*mu1)*2
>>>> else:
>>>> px = ncx2.pdf(2*mu1, 2*(x+1), 2*mu2)*2
>>>> return px
>>>> def _cdf(self, x, mu1, mu2):
>>>> x = numpy.floor(x)
>>>> if x< 0:
>>>> px = ncx2.cdf(2*mu2, x*(-2), 2*mu1)
>>>> else:
>>>> px = 1-ncx2.cdf(2*mu1, 2*(x+1), 2*mu2)
>>>> return px
>>>> def _stats(self, mu1, mu2):
>>>> mean = mu1 - mu2
>>>> var = mu1 + mu2
>>>> g1 = (mu1 - mu2) / numpy.sqrt((mu1 + mu2)**3)
>>>> g2 = 1 / (mu1 + mu2)
>>>> return mean, var, g1, g2
>>>> skellam = skellam_gen(a=-numpy.inf, name="skellam", longname='A Skellam',
>>>> shapes="mu1,mu2", extradoc="")
>>>>
>>>>
>>>>
>>>>
>>> Thanks, I think the distribution of the difference of two poisson
>>> distributed random variables could be useful.
>>>
>>> Would you please open an enhancement ticket for this at
>>> http://projects.scipy.org/scipy/report/1
>>>
>>> I had only a brief look at it so far, I had never looked at the
>>> Skellam distribution before, and just read a few references.
>>>
>>> The "if x< 0 .. else ..." will have to be replace with a
>>> "numpy.where" assignment, since the methods are supposed to work with
>>> arrays of x (as far as I remember)
>>>
>>> _rvs could be implemented directly instead of generically (I don't
>>> find the reference, where I saw it, right now).
>>>
>>> Documentation will be necessary, a brief description in the
>>> (currently) extradocs, and a listing of the properties for the
>>> description of the distributions currently in the stats tutorial.
>>>
>>> I have some background questions, which address the limitation of the
>>> implementation (but are not really necessary for inclusion into
>>> scipy).
>>>
>>> The description in R mentions several implementation of Skellam. Do
>>> you have a rough idea what the range of parameters are for which the
>>> implementation using ncx produces good results? Do you know if any
>>> other special functions would produce good results over a larger
>>> range, e.g. using Bessel function?
>>>
>>> Wikipedia, http://en.wikipedia.org/wiki/Skellam_distribution , also
>>> mentions (but doesn't describe) the case of Skellam distribution with
>>> correlated Poisson distributions. Do you know what the difference to
>>> your implementation would be?
>>>
>>> Tests for a new distribution will be picked up by the generic tests,
>>> but it would be useful to have some extra tests for extreme/uncommon
>>> parameter ranges. Do you have any comparisons with R, since you
>>> already looked it?
>>>
>>>
>>> Thanks again, I'm always looking out for new useful distributions,
>>> (but I have to find the time to do the testing and actual
>>> implementation).
>>>
>>> Josef
>>>
>>>
>>>
>>>
>>>
>>>> Bye.
>>>>
>>>> --
>>>> Ernest
>>>> _______________________________________________
>>>> SciPy-User mailing list
>>>> SciPy-User@scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>>
>>>>
>>>>
>>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>>
>>>
>>>
>> Generally any R code can not be used in numpy because R is GPL. Usually
>> R code is also licensed under GPL so translation from R to Python/numpy
>> still maintains the original license. So the code not used by numpy
>> unless that code is licensed under a BSD compatible license.
>>
>> You *must* show that you implementation is from a BSD-compatible source
>> not from the R package. I can see that your code is very simple so there
>> should be an viable alternative source.
>>
>> Also, in the _stats function why do you do not re-use the mean and var
>> variables in computing the g1 and g2 variables?
>>
>> What are 'x, mu1, mu2' ?
>> This looks like a scalar implementation so you need to either check that
>> or allow for array-like inputs.
>>
>> Bruce
>>
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>>
>>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
I am not a lawyer! But I do not see that any reference to not seeing the
code. Furthermore, there is insufficient information in the cited
reference for this implementation (but I have not seen the actual code
and would rather not have to see it). But, as Josef pointed out, there
is a Wikipedia source so it should be trivial to show that this code is
independent of the R implementation.
Bruce
More information about the SciPy-User
mailing list