[Scipy-tickets] [SciPy] #604: Statistics functions with new options
SciPy
scipy-tickets@scipy....
Fri Feb 15 16:12:13 CST 2008
#604: Statistics functions with new options
-------------------------+--------------------------------------------------
 Reporter:  rspringuel   |       Owner:  somebody
     Type:  enhancement  |      Status:  new
 Priority:  normal       |   Milestone:  0.7
Component:  scipy.stats  |     Version:
 Severity:  normal       |    Keywords:
-------------------------+--------------------------------------------------
I created a personal statistics package because the functions already in
scipy didn't satisfy my needs. In some cases scipy's version didn't do
what I wanted; in others scipy didn't have the function at all (or at
least I couldn't find it). Where scipy already had a version of a
function, I tried to make my version capable of everything the scipy
version could do. However, since I code exclusively in Python (I don't
know any other programming languages), there were some things that I
couldn't implement. For instance, I couldn't modify the ndarray class to
add these functions as methods in the cases where numpy provides them
that way.
The features that I added because I needed them are as follows:
weights : Of the statistical functions available, only average had the
ability to incorporate weights. All of my descriptive statistics
functions can incorporate weights. Where weights are not traditionally
defined, I started from the premise that when all weights are whole
numbers they should correspond to the frequency of observation of each
data point. E.g. unweighted statistics of the set [1,2,3,4,4,5,5,5]
should give the same result as the set [1,2,3,4,5] with weights
[1,1,1,2,3]. I've also included the ability to use ErrorVal, a package I
found online, as the source for the weights, mostly because I liked the
idea behind ErrorVal when I found it. Since writing that portion of the
code I've found that I don't actually use ErrorVal all that much, so I
wouldn't be averse to editing that part of the code out if ErrorVal
cannot be referenced for license reasons.
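The frequency premise above can be illustrated with a minimal sketch,
assuming NumPy. The names weighted_mean and weighted_var are hypothetical
helpers written for this example; they are not taken from the reporter's
package or from scipy, and the variance shown is the population (ddof=0)
form only.

```python
import numpy as np

def weighted_mean(x, weights=None):
    """Mean of x; whole-number weights act as observation counts."""
    x = np.asarray(x, dtype=float)
    if weights is None:
        return x.mean()
    w = np.asarray(weights, dtype=float)
    return np.sum(w * x) / np.sum(w)

def weighted_var(x, weights=None):
    """Population variance (ddof=0) of x with frequency weights."""
    x = np.asarray(x, dtype=float)
    if weights is None:
        return x.var()
    w = np.asarray(weights, dtype=float)
    m = weighted_mean(x, w)
    return np.sum(w * (x - m) ** 2) / np.sum(w)

# The two data sets from the ticket text.
expanded = [1, 2, 3, 4, 4, 5, 5, 5]
values = [1, 2, 3, 4, 5]
counts = [1, 1, 1, 2, 3]

# Both pairs should agree if weights behave as frequencies.
print(weighted_mean(expanded), weighted_mean(values, counts))
print(weighted_var(expanded), weighted_var(values, counts))
```

Under this premise the weighted result on the compact set matches the
unweighted result on the expanded set. An unbiased (ddof=1) variance
would need an extra convention for non-integer weights, which this sketch
does not attempt.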
NaN handling : I needed the ability to specify how NaN values in an array
are handled, separately from inf values. scipy currently handles this
with separate functions for the NaN-friendly versions, some of which are
buried in subpackages of stats. I thought that this made them more
difficult to use, so I gave my functions a boolean flag that sets how NaN
values are treated.
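A minimal sketch of the flag-based approach, assuming NumPy. The function
name nan_aware_mean and the flag name ignore_nan are hypothetical; the
ticket does not give the actual names or signatures used in the
reporter's package.

```python
import numpy as np

def nan_aware_mean(x, ignore_nan=False):
    """Mean of x. If ignore_nan is True, NaN entries are dropped first.

    inf entries are never dropped, so NaN and inf are handled separately.
    """
    x = np.asarray(x, dtype=float)
    if ignore_nan:
        x = x[~np.isnan(x)]  # remove NaN only; inf values stay in place
    return x.mean()

data = [1.0, np.nan, 3.0]
print(nan_aware_mean(data))                   # NaN propagates by default
print(nan_aware_mean(data, ignore_nan=True))  # mean of the non-NaN values
```

One function with a flag keeps the NaN-friendly behaviour next to the
default one, rather than in a separately named function.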
In addition to the descriptive statistics, I also filled out my package
with some Goodness of Fit, Parameter, and Analysis of Residuals
statistics. These help with the post hoc analysis of an optimization
solution fitted to data.
--
Ticket URL: <http://scipy.org/scipy/scipy/ticket/604>
SciPy <http://www.scipy.org/>
SciPy is open-source software for mathematics, science, and engineering.