[SciPy-dev] Updating and improving the statistical capabilities in Scipy

Bruce Southey bsouthey@gmail....
Thu Feb 26 16:02:44 CST 2009

I apologize in advance if this is the wrong approach.

All this talk has inspired me to do something about developing the 
statistics of Scipy.

We need to develop a strategy to improve the statistical functions
within Scipy. A central requirement is a form of code review to ensure
that the existing functions have adequate documentation (see also the
Scipy Documentation Marathon,
http://www.scipy.org/Developer_Zone/DocMarathon2008) and have
appropriate tests for functionality and accuracy.

I strongly believe that, given the scope involved, we can only succeed
with a slow and careful divide-and-conquer approach. I would be
extremely interested in what people would like to see so that we can
develop specific goals and action plans to update and improve these
functions.

At the least, I consider the following to be the major areas currently
present in Scipy that I am aware of:
1) Statistical distributions - Josef has vastly improved these.
2) Uni- and multi-variate kernel density estimation - currently only
Gaussian kernels are available.
3) Basic statistical functions - available for standard and masked
arrays, but they are inconsistent.
4) Model fitting - integrating the different code within Scipy
(including Jonathan Taylor's model class, which is really impressive,
and the Cookbook ols) to provide important functionality including
general linear models, generalized linear models, and generalized
additive models.

I would suggest that we develop some type of PEP structure as a
starting point for discussion, and use separate threads to address
different areas and future directions. Accordingly, I have put
together something to address the basic statistical functions in a
separate thread.

