[SciPy-User] adding distributions from hydroclimpy to stats.distributions

Pierre GM pgmdevlist@gmail....
Sun Aug 2 14:52:56 CDT 2009


On Aug 2, 2009, at 9:41 AM, josef.pktd@gmail.com wrote:
>
> I looked briefly at the distributions in hydroclimpy
> http://projects.scipy.org/scikits/browser/trunk/hydroclimpy/scikits/hydroclimpy/stats/extradistributions.py
>
> my first impression:
>
> kappa, glogistic, gennorm and wakeby
> can be added almost without changes to stats distributions, since they
> are already in the standard format
> cosmetic changes: add longname and extradocs (from module docstring)

Agreed. I'm still struggling to define a proper template for
describing the distributions (equations for the pdf/cdf/ppf, usage examples,
plots...), hence the nudge. What are our doc exegetes' recommendations?


> pearson3
> This one overwrites the main public methods, pdf, cdf, ...,
> Can this be rewritten to define only the private, distribution-specific
> methods, _cdf, _pdf, or is there a special reason for the
> public methods?

Depending on the values of its parameters, Pearson III can reduce to a
normal (for example, when the skew is zero). IMHO, overriding .pdf and .cdf
was more efficient than trying to stick to the _pdf/_cdf methods. The same
problem arises whenever a distribution reduces to another one for particular
parameter values.
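For comparison with Josef's suggestion, here is a minimal sketch of the "private hooks only" approach, assuming a standardized Pearson III (mean 0, standard deviation 1) parameterized by its skew, the same parameterization as `scipy.stats.pearson3` in current SciPy releases. The zero-skew special case is handled by a boolean mask inside `_pdf`/`_cdf` instead of by overriding the public methods. `Pearson3Sketch`, `_gamma_params` and the switching threshold are hypothetical names and choices, not hydroclimpy's code.

```python
import numpy as np
from scipy import stats

# Below this |skew| the sketch switches to the normal branch; the exact
# threshold is an arbitrary choice made for this example.
_NORMAL_TOL = 1e-5


def _gamma_params(skew):
    # Shifted-gamma parameters giving mean 0, std 1 and the requested skew.
    beta = 2.0 / skew
    alpha = beta ** 2   # gamma shape, equal to 4 / skew**2
    zeta = -beta        # shift, equal to -alpha / beta
    return alpha, beta, zeta


class Pearson3Sketch(stats.rv_continuous):
    # Hypothetical class: standardized Pearson III defined only through the
    # private hooks; the public pdf/cdf come from rv_continuous unchanged.

    def _argcheck(self, skew):
        # Any finite skew is valid, including zero and negative values.
        return np.isfinite(skew)

    def _pdf(self, x, skew):
        x, skew = np.broadcast_arrays(x, skew)
        out = np.empty(x.shape, dtype=float)
        normal = np.abs(skew) < _NORMAL_TOL
        # Special case: (near-)zero skew, use the standard normal.
        out[normal] = stats.norm.pdf(x[normal])
        # General case: a shifted (and, for skew < 0, reflected) gamma.
        g = ~normal
        alpha, beta, zeta = _gamma_params(skew[g])
        out[g] = np.abs(beta) * stats.gamma.pdf(beta * (x[g] - zeta), alpha)
        return out

    def _cdf(self, x, skew):
        x, skew = np.broadcast_arrays(x, skew)
        out = np.empty(x.shape, dtype=float)
        normal = np.abs(skew) < _NORMAL_TOL
        out[normal] = stats.norm.cdf(x[normal])
        g = ~normal
        alpha, beta, zeta = _gamma_params(skew[g])
        z = beta * (x[g] - zeta)
        # For skew < 0 the gamma is reflected, so its upper tail (sf) is used.
        out[g] = np.where(beta > 0, stats.gamma.cdf(z, alpha),
                          stats.gamma.sf(z, alpha))
        return out


pearson3_sketch = Pearson3Sketch(name="pearson3_sketch")
```

For instance, `pearson3_sketch.pdf(x, 0.0)` goes through the normal branch while `pearson3_sketch.cdf(x, -0.5)` uses the reflected gamma; argument checking and broadcasting still come from `rv_continuous`. Whether the mask costs more than an overridden `pdf` has not been measured here.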



> ztnbinom and logseries look like duplicates of stats.nbinom and  
> stats.logser
> ztnbinom uses a different way to calculate stats

ztnbinom is the zero-truncated negative binomial distribution, a
particular case of the negative binomial whose support is restricted
to integers greater than or equal to 1 (no zero class). Yes, the stats are
slightly different because of the truncation. Similarly, we could define
a zero-inflated Poisson.
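The effect of the truncation on the stats can be checked with a small sketch built on `scipy.stats.nbinom` (pmf(k) = C(k+n-1, n-1) p**n (1-p)**k). The truncated pmf is the untruncated one divided by 1 - pmf(0), and since the zero class contributes nothing to E[X] or E[X**2], both raw moments are rescaled by the same factor. The helper names are hypothetical; hydroclimpy's ztnbinom may compute its stats differently.

```python
import numpy as np
from scipy import stats


def ztnbinom_pmf(k, n, p):
    # Hypothetical helper: zero-truncated negative binomial pmf
    # (support k = 1, 2, ...), built on scipy.stats.nbinom.
    k = np.asarray(k)
    p0 = stats.nbinom.pmf(0, n, p)   # mass removed by the truncation
    pmf = stats.nbinom.pmf(k, n, p) / (1.0 - p0)
    return np.where(k >= 1, pmf, 0.0)


def ztnbinom_mean_var(n, p):
    # E[X | X >= 1] = E[X] / (1 - p0) because the zero class adds nothing
    # to E[X]; the same rescaling holds for E[X**2].
    p0 = stats.nbinom.pmf(0, n, p)
    mean, var = stats.nbinom.stats(n, p, moments="mv")
    ex2 = var + mean ** 2            # raw second moment, untruncated
    zt_mean = mean / (1.0 - p0)
    zt_var = ex2 / (1.0 - p0) - zt_mean ** 2
    return float(zt_mean), float(zt_var)
```

With n = 2 and p = 0.4, the untruncated mean and variance are 3 and 7.5; after truncation the mean rises to about 3.57 and the variance drops to about 6.89.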

I considered developing a generic trunc_dist class from rv_discrete to
handle arbitrary truncation, but realized that the scope was too large
for me to handle, and I already have more than enough on my plate(s) for now.

> logseries adds a fit function
> Is there a difference that I'm missing after my only brief look?

I had overlooked the logser distribution (silly me). I needed the fit
method for my own applications (analyzing the distributions of dry/wet
spells), and I'm about to add fit methods to other discrete
distributions as I need them.
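A fit method for logser can be short, because setting the derivative of the log-likelihood to zero reduces to matching the distribution mean, -p / ((1 - p) log(1 - p)), to the sample mean. Below is a minimal sketch, assuming SciPy's `logser` parameterization (pmf(k) = -p**k / (k log(1 - p)) for k >= 1); `fit_logser` and `logser_mean` are hypothetical helpers, not the hydroclimpy implementation.

```python
import numpy as np
from scipy import optimize, stats


def logser_mean(p):
    # Mean of the logarithmic series distribution, 0 < p < 1.
    return -p / ((1.0 - p) * np.log1p(-p))


def fit_logser(data, eps=1e-12):
    # Hypothetical maximum-likelihood fit of p for scipy.stats.logser:
    # the score equation is equivalent to logser_mean(p) == sample mean.
    data = np.asarray(data)
    if np.any(data < 1):
        raise ValueError("logser support is k >= 1")
    xbar = data.mean()
    if xbar == 1.0:
        return 0.0  # boundary case: every observation equals 1
    # The mean increases from 1 (p -> 0) to infinity (p -> 1), so the root
    # is unique and a bracket just inside (0, 1) is enough.
    return optimize.brentq(lambda p: logser_mean(p) - xbar, eps, 1.0 - eps)
```

For example, `fit_logser(stats.logser.rvs(0.6, size=20000))` should return a value close to 0.6.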



> I don't know anything about L-moments and only briefly looked up the
> definitions. Is there a generic method that works (reasonably well)
> for all distributions?

In practice, L-moments are used mostly with continuous distributions (they
are defined for any distribution with a finite mean). You can find a nice
description of their definition and use here:
http://www.research.ibm.com/people/h/hosking/lmoments.html
In short, they tend to be more robust than the classical moments. The
fact that L-skewness and L-kurtosis lie in the interval [-1, +1]
simplifies comparisons between candidate distributions when trying
to select the most adequate one.
The L-moments of some specific distributions have closed-form expressions
that can be used to estimate the parameters of these distributions (hence
the whole lmoments.py module).
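To make the generic-method question concrete, here is a rough sketch, assuming SciPy (`integrate.quad`, `special.eval_sh_legendre`) rather than hydroclimpy's lmoments.py, whose API is not shown here. Population L-moments of any continuous distribution with a finite mean follow from its quantile function Q as lambda_r = integral over (0, 1) of Q(u) P*_{r-1}(u) du, where P*_k is the shifted Legendre polynomial; sample L-moments come from probability-weighted moments of the sorted data. The Gumbel fit uses the closed forms lambda_1 = loc + euler_gamma * scale and lambda_2 = scale * ln 2. All function names are hypothetical.

```python
import math

import numpy as np
from scipy import integrate, special, stats


def population_lmoments(dist, nmom=4):
    # lambda_r = integral over (0, 1) of Q(u) * P*_{r-1}(u) du, with Q the
    # quantile function (dist.ppf) of a frozen continuous distribution.
    lmom = []
    for r in range(1, nmom + 1):
        val, _ = integrate.quad(
            lambda u: dist.ppf(u) * special.eval_sh_legendre(r - 1, u), 0.0, 1.0
        )
        lmom.append(val)
    return np.array(lmom)


def sample_lmoments(x):
    # Unbiased sample L-moments l1..l4 from the probability-weighted
    # moments b0..b3 of the sorted sample (needs at least 4 observations).
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)  # ranks of the sorted values
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((i - 1) * (i - 2) * (i - 3)
                / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    return np.array([
        b0,
        2 * b1 - b0,
        6 * b2 - 6 * b1 + b0,
        20 * b3 - 30 * b2 + 12 * b1 - b0,
    ])


def lmoment_ratios(lmom):
    # L-skewness t3 = l3 / l2 and L-kurtosis t4 = l4 / l2.
    return lmom[2] / lmom[1], lmom[3] / lmom[1]


def fit_gumbel_lmom(x):
    # Method of L-moments for the Gumbel (scipy.stats.gumbel_r):
    # l2 = scale * ln 2 and l1 = loc + euler_gamma * scale.
    l1, l2 = sample_lmoments(x)[:2]
    scale = l2 / math.log(2.0)
    return l1 - np.euler_gamma * scale, scale
```

For the evenly spaced sample 1, 2, 3, 4, 5, `sample_lmoments` gives l1 = 3, l2 = 1 and l3 = l4 = 0 (up to rounding), so both L-moment ratios vanish.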


> I assume the main work would be to make sure that adding a new method
> would work with all distributions. I would gladly review a patch, but
> I don't have the time to do the integration into stats.distributions
> and the testing myself.

OK, how about we keep them on the back burner for now? Hopefully I'll
have more time soon to polish the docs and add more tests. My
advertising these new distributions was primarily meant to let
other users know that they're already implemented somewhere, and to
illustrate the need for a doc template.


