[SciPy-user] kstest and scipy.stats

Robin robince@gmail....
Thu Nov 20 10:56:45 CST 2008


I am having trouble using kstest from the scipy.stats package, which I
suspect is due to a misunderstanding on my part.

Basically I'm confused by the below:
O is an array of observed (integer) values:
In [344]: O.shape
Out[344]: (1400,)
In [345]: O.max()
Out[345]: 21
In [346]: O.min()
Out[346]: 0

Now I am trying to use kstest to determine how closely some
distributions describe this vector of data. But I keep getting very low
p-values from kstest (always a p of zero - even when plotting the
distributions shows that by eye they are a very good fit).

But the thing that really confuses me is this:
In [337]: kstest(O,
Out[337]: (0.31071428571428572, 0.0)

Prob is a small function of mine that returns a probability vector
from a vector of integers (shown below - I have been using it for ages
and I'm sure there is no mistake there). rv_discrete seems to
construct the right distribution (mean and so on match) - so how come
the p value is 0, when I am comparing to the distribution directly
sampled from the data?
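
A minimal sketch of the comparison described above, using synthetic
integer data as a stand-in for O. The real data and the exact kstest
call are not shown, so the binomial sample, the seed and the names p
and emp are assumptions made purely for illustration. It builds the
empirical probability vector, wraps it in rv_discrete, and passes that
distribution's cdf to kstest.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for O (assumption: the real observations are not shown).
rng = np.random.default_rng(0)
O = rng.binomial(21, 0.3, size=1400)   # 1400 integers in the range 0..21

r = 22                                  # number of possible responses
# Empirical probability vector, equivalent in spirit to prob(O, r)
p = np.bincount(O, minlength=r) / O.size

# Discrete distribution built directly from the sample frequencies
emp = stats.rv_discrete(name="emp", values=(np.arange(r), p))

# One-sample KS test of the data against its own empirical distribution
D, pvalue = stats.kstest(O, emp.cdf)
print(D, pvalue)
print(p.max())   # largest single probability mass, for comparison with D
```

In this sketch the statistic comes out equal to the largest single
probability mass and the p-value is effectively zero, which is the same
pattern as in Out[337].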

Any help gratefully appreciated,


import numpy as np

def prob(x, r):
    """Sample probability of integer sequence using bincount

    x - integer sequence
    r - number of possible responses (max(x)<r)

    Returns full probability vector (float)
    """
    if not np.issubdtype(x.dtype, np.integer):
        raise ValueError("Input must be of integer type")
    P = np.bincount(x).astype(float)
    n = P.size
    if n < r:   # resize if any responses missed
        P = np.concatenate((P, np.zeros(r - n)))  # pad with zeros up to length r
    P /= x.size
    return P
