[Scipy-svn] r5289 - trunk/scipy/stats
scipy-svn@scip...
Fri Dec 26 16:03:46 CST 2008
Author: josef
Date: 2008-12-26 16:03:43 -0600 (Fri, 26 Dec 2008)
New Revision: 5289
Modified:
trunk/scipy/stats/stats.py
Log:
cleanup docstrings of ttests and kstests
Modified: trunk/scipy/stats/stats.py
===================================================================
--- trunk/scipy/stats/stats.py 2008-12-24 21:44:53 UTC (rev 5288)
+++ trunk/scipy/stats/stats.py 2008-12-26 22:03:43 UTC (rev 5289)
@@ -1874,11 +1874,29 @@
#####################################
def ttest_1samp(a, popmean, axis=0):
 """
Calculates the t-obtained for the independent samples T-test on ONE group
of scores a, given a population mean.
+ """Calculates the T-test for the mean of ONE group of scores `a`.
Returns: t-value, two-tailed prob
+ This is a two-sided test for the null hypothesis that the expected value
+ (mean) of a sample of independent observations is equal to the given
+ population mean, `popmean`.
+
+ Parameters
+ ----------
+ a : array_like
+ sample observation
+ popmean : float or array_like
+ expected value in null hypothesis, if array_like then it must have the
+ same shape as `a` excluding the axis dimension
+ axis : int, optional, (default axis=0)
+ Axis can equal None (ravel array first), or an integer (the axis
+ over which to operate on a).
+
+ Returns
+ -------
+ t : float or array
+ t-statistic
+ prob : float or array
+ two-tailed p-value
"""
@@ -1904,31 +1922,46 @@
def ttest_ind(a, b, axis=0):
 """Calculates the t-obtained T-test on TWO INDEPENDENT samples of scores
 a, and b. From Numerical Recipies, p.483. Axis can equal None (ravel
 array first), or an integer (the axis over which to operate on a and b).
+ """Calculates the T-test for the means of TWO INDEPENDENT samples of scores.
 Returns: t-value, two-tailed p-value

This is a two-sided test for the null hypothesis that 2 independent samples
have identical average (expected) values.
 Description
 -----------
+ Parameters
+ ----------
+ a, b : sequence of ndarrays
+ The arrays must have the same shape, except in the dimension
+ corresponding to `axis` (the first, by default).
+ axis : int, optional
+ Axis can equal None (ravel array first), or an integer (the axis
+ over which to operate on a and b).
+ Returns
+ -------
+ t : float or array
+ t-statistic
+ prob : float or array
+ two-tailed p-value
+
+
+ Notes
+ -----
+
We can use this test, if we observe two independent samples from
the same or different population, e.g. exam scores of boys and
girls or of two ethnic groups. The test measures whether the
average (expected) value differs significantly across samples. If
 we observe a larger p-value, for example >0.5 or 0.1 then we
 cannot reject the null hypothesis of identical average scores. If
 the test statistic is larger (in absolute terms than critical
 value or, equivalently, if the p-value is smaller than the
 threshold, 1%,5% or 10%, then we reject the null hypothesis equal
 averages.
+ we observe a large p-value, for example larger than 0.05 or 0.1,
+ then we cannot reject the null hypothesis of identical average scores.
+ If the p-value is smaller than the threshold, e.g. 1%, 5% or 10%,
+ then we reject the null hypothesis of equal averages.
 see: http://en.wikipedia.org/wiki/T-test#Independent_two-sample_t-test
+ References
+ ----------
+ http://en.wikipedia.org/wiki/T-test#Independent_two-sample_t-test
+
+
Examples
--------
@@ -1941,13 +1974,13 @@
# test with sample with identical means
>>> rvs1 = stats.norm.rvs(loc=5,scale=10,size=500)
>>> rvs2 = stats.norm.rvs(loc=5,scale=10,size=500)
 >>> ttest_ind(rvs1,rvs2)
+ >>> stats.ttest_ind(rvs1,rvs2)
(0.26833823296239279, 0.78849443369564765)
# test with sample with different means
>>> rvs3 = stats.norm.rvs(loc=8,scale=10,size=500)
 >>> ttest_ind(rvs1,rvs3)
+ >>> stats.ttest_ind(rvs1,rvs3)
(-5.0434013458585092, 5.4302979468623391e-007)
"""
@@ -1979,30 +2012,45 @@
def ttest_rel(a,b,axis=0):
 """Calculates the t-obtained T-test on TWO RELATED samples of scores, a
 and b. From Numerical Recipies, p.483. Axis can equal None (ravel array
 first), or an integer (the axis over which to operate on a and b).
+ """Calculates the T-test on TWO RELATED samples of scores, a and b.
 Returns: t-value, two-tailed p-value
+ This is a two-sided test for the null hypothesis that 2 related or
+ repeated samples have identical average (expected) values.
 Description
 -----------
+ Parameters
+ ----------
+ a, b : sequence of ndarrays
+ The arrays must have the same shape.
+ axis : int, optional, (default axis=0)
+ Axis can equal None (ravel array first), or an integer (the axis
+ over which to operate on a and b).
 This is a two-sided test for the null hypothesis that 2 repeated samples
 have identical average values.
+ Returns
+ -------
+ t : float or array
+ t-statistic
+ prob : float or array
+ two-tailed p-value
 Examples for the use are scores of a student in different exams,
 or repeated sampling from the same units. The test measures
 whether the average score differs significantly across samples
 (e.g. exams). If we observe a larger p-value, for example >0.5 or
 0.1 then we cannot reject the null hypothesis of identical average
 scores. If the test statistic is larger (in absolute terms than
 critical value or, equivalently, if the p-value is smaller than
 the threshold, 1%,5% or 10%, then we reject the null hypothesis
 equal averages.
 see: http://en.wikipedia.org/wiki/T-test#Dependent_t-test
+ Notes
+ -----
+ Examples for the use are scores of the same set of students in
+ different exams, or repeated sampling from the same units. The
+ test measures whether the average score differs significantly
+ across samples (e.g. exams). If we observe a large p-value, for
+ example greater than 0.05 or 0.1, then we cannot reject the null
+ hypothesis of identical average scores. If the p-value is smaller
+ than the threshold, e.g. 1%, 5% or 10%, then we reject the null
+ hypothesis of equal averages. Small p-values are associated with
+ large t-statistics.
+
+ References
+ ----------
+
+ http://en.wikipedia.org/wiki/T-test#Dependent_t-test
+
Examples
--------
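The related-samples test reduces to a one-sample test of the paired differences against a mean of zero. The minimal sketch below assumes equal-length 1-D inputs and n - 1 degrees of freedom. The function name is hypothetical. The p-value step is omitted because it is the same Student's t tail probability used for the one-sample test.

```python
import math
import statistics


def ttest_rel_sketch(a, b):
    """Paired t statistic on equal-length sequences; returns (t, df)."""
    if len(a) != len(b):
        raise ValueError("related samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Mean paired difference divided by its standard error.
    t = statistics.mean(diffs) / math.sqrt(statistics.variance(diffs) / n)
    return t, n - 1
```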
@@ -2056,24 +2104,17 @@
def kstest(rvs, cdf, args=(), N=20, alternative = 'two_sided', mode='approx',**kwds):
"""Return the D-value and the p-value for a Kolmogorov-Smirnov test
+ This performs a test of the distribution G(x) of an observed
+ random variable against a given distribution F(x). Under the null
+ hypothesis the two distributions are identical, G(x)=F(x). The
+ alternative hypothesis can be either 'two_sided' (default), 'less'
+ or 'greater'. The KS test is only valid for continuous distributions.
 This performs a test of the distribution of random variables G(x) against
 a given distribution F(x). Under the null hypothesis the two distributions
 are identical, G(x)=F(x). The alternative hypothesis can be either
 'two_sided' (default), 'less' or 'greater'. In the two one-sided test,
 the alternative is that the empirical cumulative distribution function,
 of the random variable is "less" or "greater" then the cumulative
 distribution function of the hypothesis F(x), G(x)<=F(x), resp. G(x)>=F(x).

 If the p-value is greater than the significance level (say 5%), then we
 cannot reject the hypothesis that the data come from the given
 distribution.

Parameters
----------
rvs : string or array or callable
string: name of a distribution in scipy.stats
 array: random variables
+ array: 1-D observations of random variables
callable: function to generate random variables,
requires keyword argument size
cdf : string or callable
@@ -2082,21 +2123,37 @@
or be the same as rvs
callable: function to evaluate cdf
 args : distribution parameters used if rvs or cdf are strings
 N : sample size if rvs is string or callable
+ args : tuple, sequence
+ distribution parameters, used if rvs or cdf are strings
+ N : int
+ sample size if rvs is string or callable
alternative : 'two_sided' (default), 'less' or 'greater'
defines the alternative hypothesis (see explanation)
mode : 'approx' (default) or 'asymp'
 defines distribution used for calculating p-value
+ defines the distribution used for calculating p-value
'approx' : use approximation to exact distribution of test statistic
'asymp' : use asymptotic distribution of test statistic
Returns
-------
 D: test statistic either D, D+ or D-
 p-value
+ D : float
+ KS test statistic, either D, D+ or D-
+ p-value : float
+ one-tailed or two-tailed p-value
+ Notes
+ -----
+
+ In the two one-sided test, the alternative is that the empirical
+ cumulative distribution function of the random variable is "less"
+ or "greater" than the cumulative distribution function F(x) of the
+ hypothesis, G(x)<=F(x), resp. G(x)>=F(x).
+
+ If the p-value is greater than the significance level (say 5%), then we
+ cannot reject the hypothesis that the data come from the given
+ distribution.
+
Examples
--------
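The sketch below makes the D, D+ and D- statistics in the kstest docstring concrete for a 1-D sample and a callable continuous CDF. It computes D+ as the largest amount by which the empirical CDF lies above F(x), and D- as the largest amount by which it lies below. The two-sided p-value uses the asymptotic Kolmogorov distribution, Q(lam) = 2 * sum over k >= 1 of (-1)^(k-1) * exp(-2 k^2 lam^2), at lam = sqrt(n) * D. The assumption is that this roughly matches the 'asymp' mode; the 'approx' mode and the one-sided p-values are not reproduced. The function names are hypothetical.

```python
import math


def kolmogorov_sf(lam, terms=100):
    """Survival function Q(lam) of the limiting Kolmogorov distribution."""
    if lam < 0.2:
        # The series converges slowly here; Q(lam) is within ~1e-12 of 1.
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        term = (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-16:
            break
    return min(max(2.0 * total, 0.0), 1.0)


def ks_1samp_sketch(data, cdf):
    """One-sample KS statistics against a continuous CDF.

    Returns (D, D_plus, D_minus, asymptotic two-sided p-value).
    """
    xs = sorted(data)
    n = len(xs)
    # The empirical CDF jumps from i/n to (i+1)/n at the i-th sorted point.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    d = max(d_plus, d_minus)
    return d, d_plus, d_minus, kolmogorov_sf(math.sqrt(n) * d)
```

For example, with the uniform CDF on [0, 1], the sample [0.1, 0.4, 0.7] gives D+ = 0.3 and D- = 0.1, so D = 0.3.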
@@ -2214,25 +2271,37 @@
def ks_2samp(data1, data2):
""" Computes the Kolmogorov-Smirnof statistic on 2 samples.
 data1, data2: array_like, 1-dim
 samples assumed to be drawn from a continuous distribution,
 sample sizes can be different
+ This is a two-sided test for the null hypothesis that 2 independent samples
+ are drawn from the same continuous distribution.
 Returns: KS D-value, p-value
+ Parameters
+ ----------
+ a, b : sequence of 1-D ndarrays
+ two arrays of sample observations assumed to be drawn from a continuous
+ distribution, sample sizes can be different
 Description:
 -----------
 Tests whether 2 samples are drawn from the same distribution. Note
 that, like the one-sample KS test the distribution is assumed to be
 continuous.
+ Returns
+ -------
+ D : float
+ KS statistic
+ pvalue : float
+ two-tailed p-value
+
+ Notes
+ -----
+
+ This tests whether 2 samples are drawn from the same distribution. Note
+ that, like in the case of the one-sample KS test, the distribution is
+ assumed to be continuous.
+
This is the two-sided test, one-sided tests are not implemented.
The test uses the two-sided asymptotic Kolmogorov-Smirnov distribution.
If the KS statistic is small or the p-value is high, then we cannot
 reject the hypothesis that the two distributions of the two samples
 are the same
+ reject the hypothesis that the distributions of the two samples
+ are the same.
Examples:

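The sketch below illustrates the two-sample statistic. D is the largest absolute gap between the two empirical CDFs. Both are right-continuous step functions, so it is enough to check the gap at the pooled data points. The p-value uses the plain asymptotic scaling lam = sqrt(n1 * n2 / (n1 + n2)) * D in the limiting Kolmogorov distribution. The assumption is that the library may apply a small-sample correction to lam, so treat this p-value as illustrative. The function names are hypothetical.

```python
import bisect
import math


def kolmogorov_sf(lam, terms=100):
    """Survival function Q(lam) of the limiting Kolmogorov distribution."""
    if lam < 0.2:
        return 1.0  # Q(lam) is within ~1e-12 of 1 here
    total = 0.0
    for k in range(1, terms + 1):
        term = (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-16:
            break
    return min(max(2.0 * total, 0.0), 1.0)


def ks_2samp_sketch(data1, data2):
    """Two-sided two-sample KS test; returns (D, asymptotic p-value)."""
    s1, s2 = sorted(data1), sorted(data2)
    n1, n2 = len(s1), len(s2)
    d = 0.0
    # Both empirical CDFs only jump at observed values, so check those.
    for x in s1 + s2:
        f1 = bisect.bisect_right(s1, x) / n1
        f2 = bisect.bisect_right(s2, x) / n2
        d = max(d, abs(f1 - f2))
    en = math.sqrt(n1 * n2 / (n1 + n2))
    return d, kolmogorov_sf(en * d)
```

The sample sizes may differ, as the docstring allows. Two completely separated samples give D = 1.0, and identical samples give D = 0.0 with a p-value of 1.0.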