[Scipy-tickets] [SciPy] #604: Statistics functions with new options

SciPy scipy-tickets@scipy....
Fri Feb 15 16:12:13 CST 2008


#604: Statistics functions with new options
-------------------------+--------------------------------------------------
 Reporter:  rspringuel   |       Owner:  somebody
     Type:  enhancement  |      Status:  new     
 Priority:  normal       |   Milestone:  0.7     
Component:  scipy.stats  |     Version:          
 Severity:  normal       |    Keywords:          
-------------------------+--------------------------------------------------
 I created a personal statistics package because the functions already in
 scipy didn't satisfy my needs.  In some cases this was because scipy's
 version didn't do what I wanted it to do; in other cases it was because
 scipy didn't have that function (or at least I couldn't find it if
 it did).  In those cases where scipy already had a version of the
 function, I tried to make my version capable of doing all the things that
 the scipy version could do, but since I code exclusively in Python (I
 don't know any other programming languages) there were some things that I
 couldn't implement (I couldn't modify the ndarray class, for instance, to
 make these methods of that class in the cases where numpy had that
 feature).

 The features that I added because I needed them are as follows:
 Weights: of the statistical functions available, only average had the
 ability to incorporate weights.  All of my descriptive statistics
 functions can incorporate weights.  For statistics where weights are not
 traditionally defined, I started from the premise that when all weights
 are whole numbers, they should correspond to the frequency of observation
 of that particular data point.  E.g. unweighted statistics of the set
 [1,2,3,4,4,5,5,5] should give the same result as statistics of the set
 [1,2,3,4,5] with weights [1,1,1,2,3].  I've also included the ability to
 use ErrorVal, a package I found online, as the source for the weights,
 mostly because I liked the idea behind ErrorVal when I found it.  Since
 writing that portion of the code I've found that I don't actually use
 ErrorVal all that much, so I wouldn't be averse to editing that part of
 the code out if ErrorVal cannot be referenced for license reasons.
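
 The frequency-weight premise above can be sketched as follows.  This is
 only an illustration, not the code from my package: the function names
 weighted_mean and weighted_var and the ddof parameter are assumptions made
 up for this example.  It relies only on numpy.average, which already
 accepts weights.

```python
import numpy as np

def weighted_mean(x, weights=None):
    """Mean of x, optionally weighted (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    return float(np.average(x, weights=weights))

def weighted_var(x, weights=None, ddof=0):
    """Variance of x with weights read as observation frequencies.

    Hypothetical helper: with whole-number weights this matches the
    variance of the data set in which each point is repeated
    `weight` times.
    """
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    m = np.average(x, weights=w)
    return float(np.sum(w * (x - m) ** 2) / (np.sum(w) - ddof))

# The example from the text: expanded data vs. values with frequencies.
expanded = [1, 2, 3, 4, 4, 5, 5, 5]
values = [1, 2, 3, 4, 5]
freqs = [1, 1, 1, 2, 3]

same_mean = np.isclose(weighted_mean(values, freqs), np.mean(expanded))
same_var = np.isclose(weighted_var(values, freqs, ddof=1),
                      np.var(expanded, ddof=1))
```

 Both same_mean and same_var should come out True if the premise holds.
 Non-integer weights would need a separate decision about what ddof means,
 which this sketch does not address.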

 NaN handling: I needed the ability to specify how nan values in an array
 are handled separately from inf values.  This is currently handled by
 having separate functions for the nan-friendly versions, some of which
 are buried in subpackages of stats.  I thought that this made them more
 difficult to use, so I created my functions with a boolean flag that sets
 how nan values are treated.
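
 A minimal sketch of the flag idea, assuming a hypothetical function
 flagged_mean with a hypothetical ignore_nan keyword (neither name comes
 from my package or from scipy).  Only nan values are dropped; inf values
 are left in place, so the two cases stay separate.

```python
import numpy as np

def flagged_mean(x, ignore_nan=False):
    """Mean of x; when ignore_nan is True, nan entries are skipped.

    Hypothetical helper for illustration. inf entries are never
    removed, so they still propagate into the result.
    """
    x = np.asarray(x, dtype=float).ravel()
    if ignore_nan:
        x = x[~np.isnan(x)]      # drop nan only, keep +/-inf
    if x.size == 0:
        return float("nan")      # nothing left to average
    return float(x.sum() / x.size)

data = [1.0, float("nan"), 3.0]
strict = flagged_mean(data)                    # nan propagates
relaxed = flagged_mean(data, ignore_nan=True)  # mean of [1.0, 3.0]
with_inf = flagged_mean([1.0, float("nan"), float("inf")], ignore_nan=True)
```

 One function with a flag keeps the call site the same whichever policy is
 wanted, which is the usability point made above.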


 In addition to the descriptive statistics, I also filled out my package
 with some Goodness of Fit, Parameter, and Analysis of Residuals
 statistics.  These help with the post hoc analysis of a fit to data
 obtained by optimization.
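
 As an illustration of the kind of post hoc analysis meant here, the sketch
 below computes residuals, the coefficient of determination, and a reduced
 chi-squared for a fitted model.  The choice of these particular statistics,
 the function name fit_summary, and its parameters are assumptions made for
 this example; they are not a list of what my package contains.

```python
import numpy as np

def fit_summary(observed, predicted, n_params, sigma=1.0):
    """Residuals, R^2 and reduced chi-squared for a fit (hypothetical helper).

    sigma is an assumed common measurement uncertainty; n_params is the
    number of fitted parameters, used for the degrees of freedom.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = observed - predicted
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    dof = observed.size - n_params
    reduced_chi2 = np.sum((residuals / sigma) ** 2) / dof
    return residuals, float(r_squared), float(reduced_chi2)

# Made-up data, fitted with a straight line purely for demonstration.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
coeffs = np.polyfit(x, y, 1)           # [slope, intercept]
model = np.polyval(coeffs, x)
residuals, r_squared, reduced_chi2 = fit_summary(y, model, n_params=2)
```

 The data here are invented; any optimizer's output could be passed in as
 the predicted values in place of the polyfit line.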

-- 
Ticket URL: <http://scipy.org/scipy/scipy/ticket/604>
SciPy <http://www.scipy.org/>
SciPy is open-source software for mathematics, science, and engineering.

