[SciPy-dev] Updating and improving the statistical capabilities in Scipy
Thu Feb 26 16:02:44 CST 2009
I apologize in advance if this is the wrong approach.
All this talk has inspired me to do something about developing the
statistical capabilities of Scipy.
We need to develop a strategy to improve the statistical functions
within Scipy. A central requirement is a sort of code review to ensure
that the existing functions have adequate documentation (see also the
Scipy documentation Marathon
http://www.scipy.org/Developer_Zone/DocMarathon2008) and have
appropriate tests for functionality and accuracy.
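As a rough illustration of what "tests for functionality and accuracy"
might look like, here is a minimal sketch in plain Python. The function
`normal_cdf` is a hypothetical stand-in for a function under review, not
an existing Scipy function; the accuracy test compares against a
standard reference value with an explicit tolerance, and the
functionality test checks a known property (symmetry of the normal CDF).

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function.

    Hypothetical stand-in for a Scipy function under review.
    """
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def test_normal_cdf_accuracy() -> None:
    # Accuracy: compare against reference values with an explicit tolerance.
    assert normal_cdf(0.0) == 0.5
    assert math.isclose(normal_cdf(1.96), 0.9750021048517795, rel_tol=1e-12)


def test_normal_cdf_symmetry() -> None:
    # Functionality: the CDF must satisfy Phi(-x) = 1 - Phi(x).
    for x in (0.1, 1.0, 2.5):
        assert math.isclose(normal_cdf(-x), 1.0 - normal_cdf(x), rel_tol=1e-12)
```

Stating the tolerance explicitly in each accuracy test makes it clear
how much numerical error a reviewer is willing to accept.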
I strongly believe that, simply because of the scope involved, we must
take a careful, gradual divide-and-conquer approach to succeed.
I would be extremely interested to hear what people would like to see,
so that we can develop specific goals and action plans to update and
improve these functions.
At a minimum, I consider the following to be the major areas currently
present in Scipy (that I am aware of):
1) Statistical distributions: Josef has vastly improved this.
2) Uni- and multi-variate kernel density estimation - currently Gaussian
3) Basic statistical functions - available for standard and masked
arrays, but not consistently.
4) Model fitting aspects that integrate different code within Scipy
(including Jonathan Taylor's model class - which is really impressive -
and the Cookbook ols) to provide important functionality including
general linear models, generalized linear models, and generalized
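Item 2 refers to Gaussian kernel density estimation. As a rough
illustration of the technique (not Scipy's own implementation, which is
`scipy.stats.gaussian_kde`), here is a minimal 1-D sketch in plain
Python. It assumes Scott's rule for the bandwidth, h = s * n^(-1/5) with
s the sample standard deviation; Scott's rule is also the default in
`scipy.stats.gaussian_kde`. The name `gaussian_kde_1d` is hypothetical.

```python
import math
import statistics
from typing import Callable, Sequence


def gaussian_kde_1d(data: Sequence[float]) -> Callable[[float], float]:
    """Return a 1-D Gaussian kernel density estimate for `data`.

    Hypothetical helper. Bandwidth follows Scott's rule in one dimension:
    h = s * n**(-1/5), where s is the sample standard deviation (n - 1
    denominator).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(data)
    if s == 0:
        raise ValueError("data must not be constant")
    h = s * n ** (-1 / 5)
    # Normalising constant so the estimate integrates to 1.
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        # Sum a Gaussian bump of width h centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density
```

Returning a closure fixes the data and bandwidth once; a multivariate
version would replace s with the sample covariance matrix.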
I would suggest that we develop some type of PEP structure as a
starting point for discussion, and use separate threads to address the
different areas and future directions. To that end, I have put
together something to address the basic statistical functions in a