There was a discussion in ticket 901 about some of the statistical tests
in scipy.stats, so I thought I would post the notes I keep as an
overview of their status. They don't cover all of stats (e.g. no
descriptive statistics).
Josef
Inferential Statistics
======================
tests for location:
-------------------
t-tests and similar
ttest_1samp
ttest_ind
ttest_rel
f_oneway (F-test)
glm
Notes
-----
The t-tests (ttest_1samp, ttest_ind and ttest_rel) have been rewritten
and are well tested.
glm has a very incomplete description and is just a t-test; it needs a
rewrite.
f_oneway: verified against the NIST test set for balanced ANOVA. It is
correct but loses numerical precision on the medium- and
high-difficulty cases. I have a rewrite with higher numerical
precision.
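
To illustrate the precision issue with f_oneway, here is a minimal sketch of a one-way ANOVA that subtracts a common offset before forming the sums of squares. This is only an assumption about how precision can be improved, not the rewrite mentioned above; the function name f_oneway_centered is hypothetical and not part of scipy.stats.

```python
import numpy as np
from scipy import stats


def f_oneway_centered(*groups):
    """One-way ANOVA F-test on data shifted by the overall mean.

    Hypothetical helper for illustration only (not a scipy function).
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    n_total = sum(g.size for g in groups)
    k = len(groups)

    # Subtract a common offset so that large constant shifts in the data
    # do not cancel catastrophically in the sums of squares.
    offset = np.concatenate(groups).mean()
    groups = [g - offset for g in groups]
    grand_mean = np.concatenate(groups).mean()

    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = stats.f.sf(f_stat, df_between, df_within)
    return f_stat, p_value


rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=20)
b = rng.normal(0.3, 1.0, size=20)
c = rng.normal(0.6, 1.0, size=20)

f_centered, p_centered = f_oneway_centered(a, b, c)
f_scipy, p_scipy = stats.f_oneway(a, b, c)
print(f_centered, f_scipy)
print(p_centered, p_scipy)
```

On well-behaved data like this the two results should agree; differences would only be expected for data with a large mean relative to its spread, as in the harder NIST cases.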
rank-based tests (some are equivalent)
mannwhitneyu
ranksums
wilcoxon
kruskal
friedmanchisquare
Notes
-----
For two samples without ties, mannwhitneyu, ranksums and kruskal
are equivalent (i.e. they return the same p-values, but based on
different statistics).
kruskal has correct tie handling and works for more than two
samples.
friedmanchisquare has been verified (tie handling corrected).
mannwhitneyu: corrected, verified.
ranksums: no tie handling.
Monte Carlo p-values need another look; initial experiments didn't
show an improvement.
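
The equivalence for two samples without ties can be checked directly. The sketch below assumes a SciPy version in which mannwhitneyu accepts the `method` keyword; the continuity correction is switched off and the asymptotic method is requested so that all three tests use the same normal approximation. The simulated data and the seed are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Continuous data, so ties have probability zero.
x = rng.normal(0.0, 1.0, size=30)
y = rng.normal(0.5, 1.0, size=40)

z_stat, p_ranksums = stats.ranksums(x, y)
h_stat, p_kruskal = stats.kruskal(x, y)
u_stat, p_mwu = stats.mannwhitneyu(
    x, y,
    use_continuity=False,      # no continuity correction
    alternative="two-sided",
    method="asymptotic",       # normal approximation, not the exact test
)

# Different statistics, same two-sided p-value.
print(z_stat, h_stat, u_stat)
print(p_ranksums, p_kruskal, p_mwu)
```

For two groups the Kruskal-Wallis statistic is the square of the rank-sum z statistic, which is why the chi-square(1) and two-sided normal p-values coincide.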
tests for scale:
----------------
ansari
bartlett
levene
fligner
mood
Notes
-----
I didn't verify any of them against R or Matlab.
Brief Monte Carlo checks show that they work (they reject a false
null and do not reject a true null).
tests for distribution:
-----------------------
general
chisquare
kstest
ks_2samp
anderson
Notes
-----
kstest and ks_2samp were rewritten and verified.
anderson may be fishy, but I didn't look at it very carefully.
chisquare: I use a copy of it in the tests of the discrete
distributions, and it seems to work well.
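
As an illustration of using chisquare on a discrete distribution, here is a minimal goodness-of-fit sketch for a Poisson sample. The distribution, its parameter, the sample size and the cutoff for pooling the upper tail are assumptions chosen for the example; this is not the test code referred to above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu = 3.0
sample = rng.poisson(mu, size=1000)

# Pool all values >= k_max into one tail bin so that expected counts
# are not too small and the probabilities sum to one.
k_max = 8
support = np.arange(k_max)
probs = np.append(stats.poisson.pmf(support, mu),
                  stats.poisson.sf(k_max - 1, mu))

observed = np.bincount(np.minimum(sample, k_max), minlength=k_max + 1)
expected = sample.size * probs

chi2_stat, p_value = stats.chisquare(observed, expected)
print(chi2_stat, p_value)
```

No parameters are estimated from the data here, so the default degrees of freedom (number of bins minus one) apply.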
for normal distribution
skewtest
kurtosistest
normaltest
shapiro
Notes
-----
Not verified, but they look ok in brief Monte Carlo tests and when used
in examples.
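
One quick consistency check among these is that the normaltest statistic is the sum of the squared skewtest and kurtosistest z-scores, referred to a chi-square distribution with two degrees of freedom. A small sketch on simulated data (sample size and seed are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=500)

z_skew, p_skew = stats.skewtest(x)
z_kurt, p_kurt = stats.kurtosistest(x)
k2, p_omnibus = stats.normaltest(x)

# Omnibus statistic rebuilt from the two component tests.
k2_manual = z_skew ** 2 + z_kurt ** 2
p_manual = stats.chi2.sf(k2_manual, 2)
print(k2, k2_manual)
print(p_omnibus, p_manual)
```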
other
binom_test
Notes
-----
no idea
ANOVA F-tests
-------------
f_oneway
(the following do not calculate statistics from data and do not return p-values)
f_value_wilks_lambda
f_value
f_value_multivariate
Notes
-----
f_oneway: see above
others: no idea
Correlation measures including p-values
---------------------------------------
pearsonr
spearmanr
pointbiserialr
kendalltau
Notes
-----
pearsonr is just the standard correlation coefficient; it can be
rewritten (mostly delegated to numpy.corrcoef).
spearmanr needs rewriting; there is no tie handling yet. It can be
reduced to corrcoef on rankdata.
pointbiserialr can be reduced to np.corrcoef; drop it?
kendalltau is verified, but the p-value (variance) does not correct
for ties. An extension in Cython is attached to the ticket (but
without p-values).
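
The reductions to corrcoef mentioned here can be sketched as follows. The simulated data are an arbitrary choice without ties, and the p-value uses the usual t distribution with n - 2 degrees of freedom; treat this as an illustration rather than a proposed implementation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

# pearsonr from np.corrcoef, with a t-based two-sided p-value.
r_np = np.corrcoef(x, y)[0, 1]
t_stat = r_np * np.sqrt((n - 2) / (1.0 - r_np ** 2))
p_np = 2.0 * stats.t.sf(abs(t_stat), n - 2)
res_pearson = stats.pearsonr(x, y)
r_sp, p_sp = res_pearson[0], res_pearson[1]

# spearmanr as corrcoef on the ranks (no ties in continuous data).
rho_np = np.corrcoef(stats.rankdata(x), stats.rankdata(y))[0, 1]
rho_sp = stats.spearmanr(x, y)[0]

# pointbiserialr as corrcoef with a 0/1 variable.
group = (x > 0).astype(float)
r_pb_np = np.corrcoef(group, y)[0, 1]
r_pb_sp = stats.pointbiserialr(group, y)[0]

print(r_np, r_sp)
print(p_np, p_sp)
print(rho_np, rho_sp)
print(r_pb_np, r_pb_sp)
```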
Distributions - diagnostics and graphical analysis
==================================================
Box-Cox transformation: only checked that the functions run.
Plots look ok; converted to matplotlib.
pdfapprox is broken; I have an enhanced rewrite, but no good tests yet.
