[SciPy-User] distributions - who got the most ?
Sat Dec 8 04:36:44 CST 2012
On Wed, Dec 5, 2012 at 2:15 AM, <email@example.com> wrote:
> On Tue, Dec 4, 2012 at 4:01 PM, Ralf Gommers <firstname.lastname@example.org> wrote:
> > On Tue, Dec 4, 2012 at 4:30 AM, <email@example.com> wrote:
> >> scipy.stats has more than 90 distributions.
> >> Do we want to increase it by almost a factor of 10? :)
> >> While looking for the cdf of a distribution, I found this :
> >> He collected 870 distributions (under BSD license). Includes generic
> >> random number generation.
> >> Even though there are some variations of distributions counted
> >> separately, given my quick browsing this looks impressive and a good
> >> source for code and references.
> >> Coding style is not great but it's 10 years or so of collecting
> >> distributions.
> > Adding a lot of distributions sounds fine to me. That many distributions
> > would need to go into a separate namespace.
> > Any additions should be complete though (the Matlab code only has
> > and well tested. The Matlab code doesn't look all that useful except for
> > references ("coding style is not great" is really too kind). I also don't
> > trust the BSD license that's put on it, many files have different author
> > names in them with no mention of where they came from.
> The matlab code includes several "special" functions that look mostly
> copied from other authors.
> This would need checking, but I doubt we need many of those since we
> have scipy.special.
> We are missing some special functions for distributions, but I didn't
> check whether he has any of those.
> The pdfs, and the cdfs when available, look like they were implemented
> by the author, at least it looks that way for the small sample that I
> looked at (code quality varies a lot, but many distributions are
> vectorized or can be easily vectorized from his code).
> Given the pdf, the rest could all be derived generically. But it won't
> be efficient.
True, but that doesn't feel quite right. Tickets are being opened regularly
about precision issues due to using the generic methods. Same for speed,
but that's perhaps less critical. Generic methods are often not good enough.
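
To make the precision point concrete, here is a minimal sketch of what a generic method amounts to: the cdf is obtained by numerically integrating the pdf, and the tail is then 1 - cdf. This is not scipy code. The exponential pdf, the Simpson rule, and the names `exp_pdf`, `generic_cdf` and `exact_sf` are assumptions picked for illustration only; scipy.stats has its own generic fallbacks, which this does not reproduce.

```python
import math


def exp_pdf(x, rate=1.0):
    """Exponential density, used here only as a stand-in pdf."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def generic_cdf(pdf, x, lower=0.0, n=2000):
    """Generic cdf: composite Simpson integration of pdf over [lower, x]."""
    if x <= lower:
        return 0.0
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = (x - lower) / n
    total = pdf(lower) + pdf(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * pdf(lower + i * h)
    return total * h / 3.0


def exact_sf(x, rate=1.0):
    """Closed-form survival function of the exponential distribution."""
    return math.exp(-rate * x)


# In the body of the distribution the generic cdf is fine ...
body_error = abs(generic_cdf(exp_pdf, 1.0) - (1.0 - math.exp(-1.0)))

# ... but far in the tail, 1 - cdf is dominated by integration and
# rounding error, while the closed form is still accurate.
x_tail = 40.0
generic_sf = 1.0 - generic_cdf(exp_pdf, x_tail)
tail_rel_error = abs(generic_sf - exact_sf(x_tail)) / exact_sf(x_tail)

print("absolute error at x=1:", body_error)
print("generic sf at x=40:  ", generic_sf)
print("exact sf at x=40:    ", exact_sf(x_tail))
print("relative tail error: ", tail_rel_error)
```

Each call to `generic_cdf` also costs a couple of thousand pdf evaluations, which is the speed side of the same problem. A distribution-specific `sf` avoids both.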
> Also, I just saw that sympy could become useful to derive extra properties
> sympy.stats also works based only on the pdf (from what I have seen).
> I'm a bit skeptical about the number of distributions that are
> actually generally useful and not just used once in a journal article.
> My impression from tracking several statistics journals is that there
> are at least 10 new distributions each year.
> As an example, he has a long list of poisson mixture distributions
> that I never heard of except for negative binomial. They might be
> useful in some cases, but a more general class might cover it better.
> From a brief look at his reference
> I think it might not be necessary to implement all details for 5 or
> more distributions separately.
> According to Google the paper has only 4 citations. see also 1)
> But there are a lot of distributions, or classes/categories of
> distributions that scipy is missing, and are for example available in
> R, but in R they are spread out over many packages.
Keeping a list of those in a ticket could be useful.
> 1) another reference for poisson mixtures (technical, not a quick
> read, but a funny table)
> Karlis, D. and Xekalaki, E. (2005), Mixed Poisson Distributions.
> International Statistical Review, 73: 35–58. doi:
> Table 1
> Some mixed Poisson distributions.
> Mixed Poisson Distribution | Mixing Distribution | A Key Reference
> Negative Binomial | Gamma | Greenwood & Yule (1920)
> Geometric | Exponential | Johnson et al. (1992)
> Poisson-Linear Exponential Family | Linear Exponential Family | Sankaran (1969)
> Poisson–Lindley | Lindley | Sankaran (1970)
> Poisson-Linear Exponential | Linear Exponential | Kling & Goovaerts (1993)
> Poisson-Lognormal | Lognormal | Bulmer (1974)
> Poisson-Confluent Hypergeometric Series | Confluent Hypergeometric Series | Bhattacharya (1966)
> Poisson-Generalized Inverse Gaussian | Generalized Inverse Gaussian | Sichel
> Sichel | Inverse Gaussian | Sichel (1975)
> Poisson-Inverse Gamma | Inverse Gamma | Willmot (1993)
> Poisson-Truncated Normal | Truncated Normal | Patil (1964)
> Generalized Waring | Gamma Product Ratio | Irwin (1975)
> Simple Waring | Exponential Beta | Pielou (1962)
> Yule | Beta with Specific Parameter Values | Simon (1955)
> Poisson-Generalized Pareto | Generalized Pareto | Kempton (1975)
> Poisson-Beta I | Beta Type I | Holla & Bhattacharya (1965)
> Poisson-Beta II | Beta Type II | Gurland (1958)
> Poisson-Truncated Beta II | Truncated Beta Type II | Willmot (1986)
> Poisson-Uniform | Uniform | Bhattacharya (1966)
> Poisson-Truncated Gamma | Truncated Gamma | Willmot (1993)
> Poisson-Generalized Gamma | Generalized Gamma | Albrecht (1984)
> Dellaporte | Shifted Gamma | Ruohonen (1988)
> Poisson-Modified Bessel of the 3rd Kind | Modified Bessel of the 3rd Kind | Ong & Muthaloo (1995)
> Poisson–Pareto | Pareto | Willmot (1993)
> Poisson-Shifted Pareto | Shifted Pareto | Willmot (1993)
> Poisson–Pearson Family | Pearson’s Family of Distributions | Albrecht (1982)
> Poisson-Log-Student | Log-Student | Gaver & O’Muircheartaigh (1987)
> Poisson-Power Function | Power Function Distribution | Rai (1971)
> Poisson–Lomax | Lomax | Al-Awadhi & Ghitany (2001)
> Poisson-Power Variance | Power Variance Family | Hougaard et al. (1997)
> Neyman | Poisson | Douglas (1980)
> Other | Discrete Distributions | Johnson et al. (1992)
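
The idea that a more general class might cover many of these rows can be sketched as follows: a mixed Poisson pmf is the Poisson pmf integrated against a mixing density, so one generic routine takes the mixing pdf as an argument. The sketch below checks the first row of the table (gamma mixing gives the negative binomial). The function names, the integration cutoff `upper` and the Simpson rule are assumptions made for illustration, not an existing scipy API, and the cutoff would have to be chosen per mixing distribution.

```python
import math


def simpson(f, a, b, n=6000):
    """Composite Simpson rule for the integral of f over [a, b]."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def poisson_pmf(k, lam):
    """Poisson pmf, computed on the log scale."""
    if lam <= 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def gamma_pdf(lam, shape, scale):
    """Gamma density with the given shape and scale."""
    if lam <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(lam) - lam / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def mixed_poisson_pmf(k, mixing_pdf, upper=60.0):
    """Generic mixed Poisson pmf: integrate Poisson(k; lam) * mixing_pdf(lam).

    `upper` truncates the integral; it is an assumption that the mixing
    density has negligible mass beyond it.
    """
    return simpson(lambda lam: poisson_pmf(k, lam) * mixing_pdf(lam), 0.0, upper)


def negative_binomial_pmf(k, r, p):
    """Negative binomial pmf with size r and success probability p."""
    return math.exp(math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
                    + r * math.log(p) + k * math.log(1.0 - p))


# Gamma(shape=r, scale=theta) mixing corresponds to p = 1 / (1 + theta).
shape, scale = 3.0, 2.0
p = 1.0 / (1.0 + scale)

max_diff = max(
    abs(mixed_poisson_pmf(k, lambda lam: gamma_pdf(lam, shape, scale))
        - negative_binomial_pmf(k, shape, p))
    for k in range(11)
)
print("largest difference for k = 0..10:", max_diff)
```

Swapping `gamma_pdf` for another mixing density gives another row of the table, at the cost of one numerical integral per pmf evaluation. That is the usual trade-off between a general class and a hand-coded distribution.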
> > Ralf
> > _______________________________________________
> > SciPy-User mailing list
> > SciPy-User@scipy.org
> > http://mail.scipy.org/mailman/listinfo/scipy-user