[SciPy-user] Usage of scipy KS test
Alexander Dietz
Alexander.Dietz@astro.cf.ac...
Wed Jan 2 14:24:22 CST 2008
Hi,
On Jan 2, 2008 8:15 PM, Anne Archibald <peridot.faceted@gmail.com> wrote:
> On 02/01/2008, Alexander Dietz <Alexander.Dietz@astro.cf.ac.uk> wrote:
>
> > On Jan 2, 2008 5:08 PM, Anne Archibald <peridot.faceted@gmail.com> wrote:
>
> > > scipy.stats.kstest(x,dict(zip(x,m)).get)
>
> > When I use your suggestion, I get an error:
> >
> > File
> > "/usr/lib/python2.4/site-packages/scipy/stats/stats.py",
> > line 1716, in kstest
> > cdfvals = cdf(vals, *args)
> > TypeError: unhashable type
> >
> > I tried with get(), but this also did not work. Also, in this example I do
> > not see the vector 'm' containing the modeled values. They must somehow
> > enter the expression....
>
> Well, if x is the list of x values (floats) and m is the list of CDF
> values (also floats), then zip(x,m) is the list of pairs (x, CDF(x)).
> If you have arrays, you might need to convert them to lists first
> (x=list(x) for example). dict(zip(x,m)) makes a dictionary out of such
> a list of pairs. dict(zip(x,m)).get is a function that maps xs to ms.
> Unfortunately it only maps a single x to a single m; you need to use
> numpy.vectorize on it:
>
> scipy.stats.kstest(x,numpy.vectorize(dict(zip(x,m)).get))
>
> numpy.vectorize makes it able to map an array of xs to an array of ms.
> That should work. But if you can, you should give kstest your real
> CDF-calculating function (possibly wrapped in numpy.vectorize, if it
> doesn't work on arrays).
Sorry, I mixed up two vectors. In the expression above you use the vectors x
and m, but not y. See the following concrete example, which defines three
vectors and a plot:
x = numpy.asarray([0.089, 0.11, 0.161, 0.226, 0.257, 0.287, 0.31,
                   0.41, 0.438, 0.45, 0.547, 0.827, 1.13, 1.8])
y = numpy.asarray([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
m = numpy.asarray([0.91405923, 1.36472838, 2.94870517, 4.59609492,
                   5.37847868, 6.11545809, 6.57990978, 8.56403531,
                   9.0550575, 9.20841591, 10.50502489, 12.50640372,
                   13.29624546, 13.64958435])
clf()
plot( x, y)
plot( x, m)
savefig('test.png')
My question: With what probability do the two lines match, i.e. what is the
probability that both curves are (not) from the same distribution?
Also, your example above still did not work. Here is the error:
File "/usr/lib/python2.4/site-packages/numpy/lib/function_base.py", line 799, in __init__
nin, ndefault = _get_nargs(pyfunc)
File "/usr/lib/python2.4/site-packages/numpy/lib/function_base.py", line 756, in _get_nargs
raise ValueError, 'failed to determine the number of arguments for %s' % (obj)
ValueError: failed to determine the number of arguments for <built-in method get of dict object at 0xb78703e4>
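[Editor's sketch, not part of the original message.] The traceback shows numpy.vectorize failing to work out how many arguments the built-in dict.get method takes. One possible workaround is to wrap the lookup in an ordinary Python function, such as a lambda, whose argument count can be inspected. The sketch below assumes Python 3 with a recent NumPy and SciPy. It also assumes that m holds modeled cumulative counts, so that m / len(x) is a CDF with values in [0, 1]; the thread does not state this explicitly. The names lookup, cdf_lookup and cdf_interp are illustrative only.

```python
import numpy as np
from scipy import stats

x = np.array([0.089, 0.11, 0.161, 0.226, 0.257, 0.287, 0.31,
              0.41, 0.438, 0.45, 0.547, 0.827, 1.13, 1.8])
m = np.array([0.91405923, 1.36472838, 2.94870517, 4.59609492,
              5.37847868, 6.11545809, 6.57990978, 8.56403531,
              9.0550575, 9.20841591, 10.50502489, 12.50640372,
              13.29624546, 13.64958435])

# Assumption: m is a modeled cumulative count, so m / n is the model CDF at x.
n = len(x)
model_cdf = m / n

# Variant 1: dictionary lookup wrapped in a lambda and vectorized.
# This only works for values that are exactly equal to one of the keys.
lookup = dict(zip(x.tolist(), model_cdf.tolist()))
cdf_lookup = np.vectorize(lambda v: lookup[v], otypes=[float])

# Variant 2: linear interpolation between the tabulated points.
# It already accepts arrays, so no vectorize wrapper is needed.
def cdf_interp(v):
    return np.interp(v, x, model_cdf)

result_lookup = stats.kstest(x, cdf_lookup)
result_interp = stats.kstest(x, cdf_interp)
print(result_lookup.statistic, result_lookup.pvalue)
print(result_interp.statistic, result_interp.pvalue)
```

Both variants evaluate the model only at the sample points, so they should give the same statistic. The interpolation variant avoids using floats as dictionary keys, which breaks as soon as a value differs from a key by rounding.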
Thanks
Alex
> > Suppose I calculate the D-value myself. Can I then use stats.ksprob to
> > calculate the probability? Do I have to use sqrt(n)*D as argument?
>
> I'm not sure what ksprob wants. It will really be clearer to use kstest.
>
> I should warn you, if your probability distribution is not continuous
> - like, for example, a Poisson distribution - kstest will not work.
>
> Anne
> _______________________________________________
> SciPy-user mailing list
> SciPy-user@scipy.org
> http://projects.scipy.org/mailman/listinfo/scipy-user
>
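[Editor's sketch, not part of the original thread.] On the question of computing D by hand: the two-sided statistic is the largest gap between the empirical step function and the model CDF, checked just before and just after each sample point. The sketch below assumes a recent SciPy and, as an unconfirmed assumption, that m / n is the model CDF at the sorted sample points x. It does not use stats.ksprob, whose expected argument is left open in the thread. Instead it uses scipy.special.kolmogorov, which evaluates the survival function of the limiting distribution of sqrt(n)*D, and scipy.stats.kstwo for the finite-n distribution of D.

```python
import numpy as np
from scipy import special, stats

x = np.array([0.089, 0.11, 0.161, 0.226, 0.257, 0.287, 0.31,
              0.41, 0.438, 0.45, 0.547, 0.827, 1.13, 1.8])
m = np.array([0.91405923, 1.36472838, 2.94870517, 4.59609492,
              5.37847868, 6.11545809, 6.57990978, 8.56403531,
              9.0550575, 9.20841591, 10.50502489, 12.50640372,
              13.29624546, 13.64958435])

n = len(x)
model_cdf = m / n          # assumption: m is a modeled cumulative count
i = np.arange(1, n + 1)    # rank of each (already sorted) sample point

# The empirical CDF jumps from (i-1)/n to i/n at the i-th sorted point,
# so both sides of each jump have to be compared with the model.
d_plus = np.max(i / n - model_cdf)
d_minus = np.max(model_cdf - (i - 1) / n)
d = max(d_plus, d_minus)

# Asymptotic p-value: survival function of the limiting Kolmogorov
# distribution, evaluated at sqrt(n) * D.
p_asymptotic = special.kolmogorov(np.sqrt(n) * d)

# Finite-sample p-value from the distribution of D for this n.
p_finite_n = stats.kstwo.sf(d, n)

print(d, p_asymptotic, p_finite_n)
```

With only 14 points the asymptotic and finite-n values can differ noticeably, so the finite-n value is probably the safer one to quote. Anne's caveat still applies: neither is valid if the model distribution is not continuous.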