[SciPy-Dev] cephes_smirnov never returns on mips/sparc/...

josef.pktd@gmai... josef.pktd@gmai...
Sun Apr 1 10:36:09 CDT 2012


On Sat, Mar 31, 2012 at 12:02 PM,  <josef.pktd@gmail.com> wrote:
> On Sat, Mar 31, 2012 at 11:15 AM, Yaroslav Halchenko
> <lists@onerussian.com> wrote:
>> Probably you are right Josef -- especially since I am only distantly familiar
>> with the KS test -- but let's keep the dialog open a bit longer ;) :
>>
>>> But what's the point in fitting ksone?
>>
>> for me it was just that it has .fit() ;)    You might recall (I believe I
>> appeared on the list long ago with similar whining and that is how we got
>> introduced to each other) our evil/silly function in PyMVPA
>> match_distributions which simply tries to choose the best matching distribution
>> given the data -- that is the reason how ksone got involved
>
> I remember and if I remember correctly, then I recommended using a
> blacklist of distributions to avoid.
>
> The last time I looked at the source of pymvpa, you used all
> distributions in the fit and then reported the best fitting ones. At
> the bottom of this ranking there should be some distributions that
> will (almost) never be a good match because fit doesn't work for them.
> The only time you see how bad they are is in extreme cases like going
> off to neverland.
>
>>
>>> > if starting values are the most sensible -- then yeap -- them ;)
>>> > if I ask to 'fit' something, getting some fit is better than getting no
>>> > fit (as NaNs in output suggest)
>>
>>> getting the starting values back doesn't mean that you have "some" fit.
>>
>>> If my brief playing with it today is correct, then the starting values
>>> don't make sense, for example you have points outside of the support
>>> of the distribution with estimated parameters (if you have negative
>>> values in the sample)
>>
>>> NaN would be better, then at least you know it doesn't make sense.
>>
>> 1. to me the big question became: what ARE the logical values here?
>
> if you look at my second message above, you see some examples, where
> fit returns numbers.
> I didn't check how good they are.
>
>>
>> followed docstring/example on
>> http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ksone.html
>> -- got NaNs
>>
>> then given that
>>
>> In [44]: ksone.a, ksone.b
>> Out[44]: (0.0, inf)
>>
>> I still failed to get any sensible fit() for positive values or even for
>> its own creation, e.g.
>>
>> ss.ksone.fit(ss.ksone(5).rvs(size=100))
>
>>>> rv = stats.ksone(50).rvs(size=1000)
>>>> plt.hist(rv, bins=30, normed=True, cumulative=True)
>>>> x = np.linspace(0, rv.max(), 1000); plt.plot(x, stats.ksone(50).cdf(x))
>>>> plt.show()
>
>>>> stats.ksone.fit(rv, 100, loc=-0.01, scale=1)
>
> (181.94347728444751, -3.8554246919087482e-05, 1.9277121337713585)
>>>> stats.ksone.fit(rv, 10, loc=-0.01, scale=1)
> (13.999896396912176, -0.010783712808254388, 0.57818285700694405)
>
>>
>> results in bulk of warnings and then (1.0, nan, nan).
>>
>> Looking in detail -- rvs is happily generating NaNs (especially for small n's).
>>
>> b. Also the range of sensible values of the parameter n isn't specified
>> anywhere for KS test newbies like me, which I guess adds the confusion:
>>
>>> support of the sample would help. I have no idea about good starting
>>> values for the shape parameter (n is sample size for kstest)
>>
>> aga -- so the 'demo' value of 0.9 indeed makes no sense ;)  Might be
>> worth adjusting somehow?
>>
>> 2.
>>
>> BTW -- trying to familiarize myself with the distribution plotted its
>> pdf, e.g.:
>>
>> x = np.linspace(0, 3, 1000); plt.plot(x, ksone(10).pdf(x))
>>
>> and it looks weirdish: http://www.onerussian.com/tmp/ksone-ns.png in that it is
>> not smooth and my algebra-forgotten eyes do not see obvious points with
>> no 2nd derivative of cdf given on
>> http://en.wikipedia.org/wiki/Kolmogorov_Smirnov
>
> IIRC (no time to check again right now):
> ksone is, I think, a small sample distribution,
> kstwobign is the distribution of the max/sup of a Brownian Bridge,
> which is the asymptotic distribution for Kolmogorov-Smirnov

I needed to check the source: the C source says smirnov is the
distribution for the one-sided test, and I had forgotten that I had added
the one-sided option to kstest.
The same algorithm is also used by R: "The formula of Birnbaum & Tingey (1951) is
used for the one-sample one-sided case."
http://www.jstor.org/stable/2236929
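
For reference, a minimal pure-Python sketch of the Birnbaum & Tingey sum for
the one-sided, one-sample statistic D_n^+. The name smirnov_sf is made up for
this illustration; this is not the cephes code, and it is only meant for small
n (for n in the thousands the binomial coefficient no longer fits in a float).

```python
from math import comb, floor


def smirnov_sf(n, d):
    """P(D_n^+ >= d) via the Birnbaum & Tingey (1951) finite sum.

    n : sample size (positive integer)
    d : value of the one-sided KS statistic
    """
    if d <= 0.0:
        return 1.0
    if d >= 1.0:
        return 0.0
    total = 0.0
    # sum over j = 0 .. floor(n * (1 - d))
    for j in range(floor(n * (1.0 - d)) + 1):
        a = d + j / n
        # clip tiny negative values caused by floating point roundoff
        b = max(1.0 - d - j / n, 0.0)
        total += comb(n, j) * a ** (j - 1) * b ** (n - j)
    return d * total
```

For n = 1 the statistic is 1 - U with U uniform on (0, 1), so smirnov_sf(1, d)
should come out as 1 - d, which is a quick sanity check on the formula.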

ks_2samp is still missing the one-sided options
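
A rough sketch of what the one-sided two-sample statistics would look like,
in plain Python. The function name is hypothetical and this is not the
ks_2samp implementation; it only computes the two statistics, not p-values.

```python
from bisect import bisect_right


def ks_2samp_one_sided_stats(x, y):
    """Return (D+, D-) for two samples.

    D+ = sup_t (F_x(t) - F_y(t)), D- = sup_t (F_y(t) - F_x(t)),
    where F_x and F_y are the empirical cdfs of x and y.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d_plus = d_minus = 0.0
    # the empirical cdfs only change at observed points
    for t in xs + ys:
        diff = bisect_right(xs, t) / n - bisect_right(ys, t) / m
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    return d_plus, d_minus
```

The usual two-sided statistic is then max(D+, D-).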

Josef
Anderson-Darling is more powerful than KS most of the time
https://github.com/aarchiba/kuiper (no license)

>
> as distribution we are mainly interested in cdf and ppf (both look
> reasonably good in  a plot), and mainly in the right tail
> ksone looks like a piecewise approximation, where they didn't care
> much about the lower part.
>
> (I'm a bit rushed right now so there might be parts missing in my reply)
>
> Josef
>
>>
>> Also why ksone.b is inf -- shouldn't it be 1?
>>
>> --
>> =------------------------------------------------------------------=
>> Keep in touch                                     www.onerussian.com
>> Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev

