[SciPy-dev] State of stats modules?

Gary Strangman strang at nmr.mgh.harvard.edu
Mon Nov 5 12:15:09 CST 2001

> Most of the stats module was written by Gary Strangman
> (http://www.nmr.mgh.harvard.edu/Neural_Systems_Group/gary/).  As far as I
> know, it's the most full featured stats module around, and has very good
> doc-strings.  Gary developed it for his own research, and, as such, it is
> somewhat specialized to his field.  Still, it is very usable, and, at least
> for the functions I have used, reliable.

Specialized it is, and I have variable confidence in the various functions
(some are much more used--read: better tested--than others).

> Gary's work was/is an excellent starting point for SciPy's statistics
> capabilities.  Most of the work needed is actually trimming out extra
> functionality not needed or duplicated, adding unit test functions, and
> assuring that functions behave similarly in calling convention to other
> Numeric/SciPy functions.  The new_stats.py module is the beginnings of this
> effort, but it hasn't had any attention in a while.  There are also the
> beginnings of some unit testing in the stats/tests directory.  Hopefully a
> full complement of unit tests will develop so there are fewer questions
> about result validity.

This would be outstanding ... particularly the unit testing. I've done
some, but way too little.

> aanova and collapse:
> I haven't used these, and don't know much about them.  I'll forward this to
> Gary and see if he has any comments.

aanova() was a simple analysis of variance function, commonly
used in behavioral-type research but broadly applicable. It was written
when I was learning about ANOVAs in grad school, and hence is poorly
written, poorly tested, and non-optimized. (It worked for the stuff I
needed, when I needed it, but I have since pulled the function from more
recent versions of my module out of my own concerns about its adequacy and
hence its utility.)
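For readers unfamiliar with the technique, here is a minimal sketch of a plain one-way (between-subjects) analysis of variance in pure Python. It is NOT aanova() and does not reproduce its interface or any of the more complex designs it may have handled; the function name one_way_anova is hypothetical. It partitions the total sum of squares into between-group and within-group parts and returns the F ratio with its degrees of freedom.

```python
def one_way_anova(*groups):
    """Hypothetical one-way ANOVA sketch (not aanova()).

    Each argument is a sequence of observations for one group.
    Returns (F, df_between, df_within).
    """
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group SS: group size times squared deviation of group mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group SS: squared deviations of observations from their group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

For example, one_way_anova([1, 2, 3], [4, 5, 6]) gives F = 13.5 with 1 and 4 degrees of freedom. Converting F to a p-value needs the F distribution, which this sketch deliberately leaves out.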

collapse() is a generic function to collapse over rows of a data file. It
finds unique combinations of values in the columns specified by keepcols
and for each such unique combination it calculates a collapse-function
(mean, sterr, user-defined) for each column specified in collapsecols. 
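The grouping idea behind collapse() can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not pstat.collapse() itself: the name collapse_rows, its argument order, and the output layout (unique keep-values followed by one collapsed value per collapse column, in first-seen order) are all hypothetical.

```python
from statistics import mean


def collapse_rows(rows, keepcols, collapsecols, cfcn=mean):
    """Hypothetical sketch of collapsing over rows (not pstat.collapse).

    rows         -- list of lists (one inner list per data row)
    keepcols     -- column indices whose unique value combinations define groups
    collapsecols -- column indices to summarize within each group
    cfcn         -- function applied to each group's column values (default: mean)
    """
    groups = {}
    for row in rows:
        key = tuple(row[c] for c in keepcols)
        # dicts preserve insertion order, so groups come out in first-seen order
        groups.setdefault(key, []).append(row)

    result = []
    for key, members in groups.items():
        collapsed = [cfcn([r[c] for r in members]) for c in collapsecols]
        result.append(list(key) + collapsed)
    return result


data = [["a", 1, 2.0],
        ["a", 1, 4.0],
        ["b", 2, 6.0]]
print(collapse_rows(data, keepcols=[0], collapsecols=[2]))  # [['a', 3.0], ['b', 6.0]]
```

Passing a different cfcn (for example max, or a user-written standard-error function) plays the role of the "user-defined" collapse function mentioned above.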

> The stats module deserves some attention, but isn't receiving any right now.
> Any takers?

More recent versions of pstat.py and stats.py (at least more recent than
the defs that were quoted) can be found on my web site


but sadly those are modified only very slowly and irregularly at best.
"Takers" are welcome. :-)


