[SciPy-user] Usage of scipy KS test

Alexander Dietz Alexander.Dietz@astro.cf.ac...
Wed Jan 2 14:24:22 CST 2008


Hi,

On Jan 2, 2008 8:15 PM, Anne Archibald <peridot.faceted@gmail.com> wrote:

> On 02/01/2008, Alexander Dietz <Alexander.Dietz@astro.cf.ac.uk> wrote:
>
> > On Jan 2, 2008 5:08 PM, Anne Archibald <peridot.faceted@gmail.com> wrote:
>
> > > scipy.stats.kstest(x,dict(zip(x,m)).get)
>
> > When I use your suggestion, I get an error:
> >
> >   File "/usr/lib/python2.4/site-packages/scipy/stats/stats.py", line 1716, in kstest
> >     cdfvals = cdf(vals, *args)
> > TypeError: unhashable type
> >
> > I tried with get(), but this also did not work.  Also, in this example I
> > do not see the vector 'm' containing the modeled values. They must enter
> > somehow the expression....
>
> Well, if x is the list of x values (floats) and m is the list of CDF
> values (also floats), then zip(x,m) is the list of pairs (x, CDF(x)).
> If you have arrays, you might need to convert them to lists first
> (x=list(x) for example). dict(zip(x,m)) makes a dictionary out of such
> a list of pairs. dict(zip(x,m)).get is a function that maps xs to ms.
> Unfortunately it only maps a single x to a single m; you need to use
> numpy.vectorize on it:
>
> scipy.stats.kstest(x,numpy.vectorize(dict(zip(x,m)).get))
>
> numpy.vectorize makes it able to map an array of xs to an array of ms.
> That should work. But if you can, you should give kstest your real
> CDF-calculating function (possibly wrapped in numpy.vectorize, if it
> doesn't work on arrays).
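
For reference, a minimal sketch of passing a real CDF-calculating function to
kstest, as suggested above. The exponential model, the scale parameter and the
random sample are assumptions chosen only for illustration; they are not the
model from this thread.

```python
import numpy as np
from scipy import stats

def exponential_cdf(values, scale=1.0):
    # Hypothetical model CDF (an assumption for this sketch, not the
    # thread's model). It already accepts arrays, so numpy.vectorize
    # is not needed here.
    values = np.asarray(values, dtype=float)
    return np.where(values < 0, 0.0, 1.0 - np.exp(-values / scale))

# Made-up sample drawn from the same model, with a fixed seed
rng = np.random.RandomState(0)
sample = rng.exponential(scale=1.0, size=200)

# kstest evaluates the callable at the sample points and returns
# the D statistic together with a p-value
d_stat, p_value = stats.kstest(sample, exponential_cdf)
print(d_stat, p_value)
```

If the CDF function only handles scalars, wrapping it in numpy.vectorize
first should give the same result, at the cost of speed.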


Sorry, I mixed up two vectors. In the expression above you use the vectors x
and m, but not y. See the following concrete example, which defines three
vectors and a plot:


import numpy
from pylab import clf, plot, savefig

# x values of the data points
x = numpy.asarray([0.089, 0.11, 0.161, 0.226, 0.257, 0.287, 0.31,
                   0.41, 0.438, 0.45, 0.547, 0.827, 1.13, 1.8])
# observed cumulative counts
y = numpy.asarray([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
# modeled cumulative values
m = numpy.asarray([0.91405923, 1.36472838, 2.94870517, 4.59609492,
                   5.37847868, 6.11545809, 6.57990978, 8.56403531,
                   9.0550575, 9.20841591, 10.50502489, 12.50640372,
                   13.29624546, 13.64958435])

clf()
plot(x, y)
plot(x, m)
savefig('test.png')

My question: With what probability do the two lines match, i.e. what is the
probability that both curves are (not) from the same distribution?
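
To make the question concrete, here is a rough sketch of how the KS distance D
between the two curves could be computed by hand from the vectors above. It
assumes that y counts the sorted data points and that the model curve m is
normalized to the same total of n = 14 events, so that m/n can be read as a
model CDF. That normalization is an assumption, not something established in
this thread.

```python
import numpy as np

x = np.asarray([0.089, 0.11, 0.161, 0.226, 0.257, 0.287, 0.31,
                0.41, 0.438, 0.45, 0.547, 0.827, 1.13, 1.8])
y = np.arange(1, 15)
m = np.asarray([0.91405923, 1.36472838, 2.94870517, 4.59609492,
                5.37847868, 6.11545809, 6.57990978, 8.56403531,
                9.0550575, 9.20841591, 10.50502489, 12.50640372,
                13.29624546, 13.64958435])

n = float(len(x))
model_cdf = m / n          # ASSUMPTION: model normalized to n events
ecdf_upper = y / n         # empirical CDF just after each data point
ecdf_lower = (y - 1) / n   # empirical CDF just before each data point

# One-sample KS statistic: largest gap on either side of each step
d_plus = np.max(ecdf_upper - model_cdf)
d_minus = np.max(model_cdf - ecdf_lower)
d_stat = max(d_plus, d_minus)
print(d_stat)
```

Checking both sides of each step matters because the empirical CDF jumps at
every data point, and the largest gap can occur just before a jump.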

Also, your example above still did not work. Here is the error:

  File "/usr/lib/python2.4/site-packages/numpy/lib/function_base.py", line 799, in __init__
    nin, ndefault = _get_nargs(pyfunc)
  File "/usr/lib/python2.4/site-packages/numpy/lib/function_base.py", line 756, in _get_nargs
    raise ValueError, 'failed to determine the number of arguments for %s' % (obj)
ValueError: failed to determine the number of arguments for <built-in method get of dict object at 0xb78703e4>
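
The error suggests that numpy.vectorize cannot inspect the built-in dict.get
method. A possible workaround, sketched here under that assumption, is to wrap
the lookup in a plain Python function before vectorizing it. The names table,
lookup_cdf and the cdf_values numbers are hypothetical and only illustrate the
mechanism.

```python
import numpy as np

x = [0.089, 0.11, 0.161]       # first three x values from the example
cdf_values = [0.1, 0.2, 0.3]   # hypothetical CDF values, illustration only

table = dict(zip(x, cdf_values))

def lookup_cdf(value):
    # Plain Python function, so its argument count can be inspected,
    # unlike the built-in method dict.get
    return table[value]

# otypes fixes the output type so numpy does not have to guess it
vectorized_cdf = np.vectorize(lookup_cdf, otypes=[float])
mapped = vectorized_cdf(np.asarray(x))
print(mapped)
```

Note that such a lookup only knows the tabulated points, so it can only be
evaluated at exactly the x values stored in the table.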


Thanks
  Alex




> > Suppose I calculate the D-value myself. Can I then use stats.ksprob to
> > calculate the probability? Do I have to use sqrt(n)*D as argument?
>
> I'm not sure what ksprob wants. It will really be clearer to use kstest.
>
> I should warn you, if your probability distribution is not continuous
> - like, for example, a Poisson distribution - kstest will not work.
>
> Anne
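
On the sqrt(n)*D question, a small sketch using functions that do take that
argument: scipy.special.kolmogorov evaluates the survival function of
Kolmogorov's limiting distribution of sqrt(n)*D, and scipy.stats.kstwobign
exposes the same distribution. Whether stats.ksprob follows the same convention
is not confirmed here. The D value below is hypothetical.

```python
import numpy as np
from scipy import special, stats

n = 14          # number of data points, as in the example vectors
d_stat = 0.114  # hypothetical D value, for illustration only

# Asymptotic two-sided probability; the argument is sqrt(n) * D
p_asymptotic = special.kolmogorov(np.sqrt(n) * d_stat)

# Same limiting distribution via its survival function, as a cross-check
p_check = stats.kstwobign.sf(np.sqrt(n) * d_stat)
print(p_asymptotic, p_check)
```

This is a large-n limit, so for only 14 points the result should be read as a
rough approximation.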