[SciPy-Dev] cephes_smirnov never returns on mips/sparc/...
Sat Mar 31 11:02:05 CDT 2012
On Sat, Mar 31, 2012 at 11:15 AM, Yaroslav Halchenko wrote:
> Probably you are right, Josef -- especially since I am only distantly familiar
> with the KS test -- but let's keep the dialog open a bit longer ;) :
>> But what's the point in fitting ksone?
> For me it was just that it has .fit() ;) You might recall (I believe I
> appeared on the list long ago with similar whining, and that is how we got
> introduced to each other) our evil/silly PyMVPA function
> match_distributions, which simply tries to choose the best-matching
> distribution for the given data -- that is how ksone got involved.
I remember, and if I remember correctly, I recommended using a
blacklist of distributions to avoid.
The last time I looked at the PyMVPA source, you fitted all
distributions and then reported the best-fitting ones. At
the bottom of this ranking there should be some distributions that
will (almost) never be a good match because fit doesn't work for them.
The only time you see how bad they are is in extreme cases like going
off to neverland.
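As a rough illustration of the blacklist idea, here is a minimal sketch that fits every candidate distribution except the blacklisted ones and ranks the fits by the Kolmogorov-Smirnov statistic. `rank_distributions`, the candidate list and the blacklist are hypothetical and are not PyMVPA's `match_distributions`; the sketch assumes a reasonably recent NumPy/SciPy (for `np.random.default_rng` and `stats.kstest` accepting a distribution name plus `args`). The KS p-values are optimistic here, because the parameters were estimated from the same data.

```python
import numpy as np
from scipy import stats


def rank_distributions(data, candidates, blacklist=()):
    """Hypothetical helper: fit each candidate scipy.stats distribution by
    maximum likelihood and rank the fits by the two-sided KS statistic
    (smaller is better)."""
    results = []
    for name in candidates:
        if name in blacklist:
            continue  # skip distributions whose fit() is known to misbehave
        dist = getattr(stats, name)
        with np.errstate(all="ignore"):
            params = dist.fit(data)  # (*shapes, loc, scale)
        stat, pvalue = stats.kstest(data, name, args=params)
        if np.isfinite(stat):
            results.append((stat, pvalue, name, params))
    results.sort(key=lambda r: r[0])
    return results


rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=2.0, size=1000)
ranking = rank_distributions(
    data, ["norm", "expon", "uniform", "ksone"], blacklist={"ksone"}
)
for stat, pvalue, name, params in ranking:
    print(f"{name:8s} D={stat:.3f} p={pvalue:.3g}")
```

Blacklisting `ksone` keeps a distribution whose fit can hang or return NaNs out of the loop entirely, instead of letting it sink to the bottom of the ranking.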
>> > if starting values are the most sensible -- then yeap -- them ;)
>> > if I ask to 'fit' something, getting some fit is better than getting no
>> > fit (as NaNs in output suggest)
>> Getting the starting values back doesn't mean that you have "some" fit.
>> If my brief experimentation today is correct, then the starting values
>> don't make sense: for example, you have points outside the support
>> of the distribution with the estimated parameters (if you have negative
>> values in the sample).
>> NaN would be better; then at least you know it doesn't make sense.
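To make the support argument concrete, here is a minimal sketch of the "NaN rather than meaningless starting values" idea. It relies on `rv_continuous.nnlf`, which returns an infinite negative log-likelihood when a sample point lies outside the support implied by the parameters. The helper name `validated_params` is hypothetical, and `expon` stands in for `ksone` because its support `[loc, inf)` is easy to reason about.

```python
import numpy as np
from scipy import stats


def validated_params(dist, data, params):
    """Hypothetical helper: return params unchanged if the sample lies inside
    the support they imply, otherwise a tuple of NaNs of the same length."""
    # rv_continuous.nnlf(theta, x) takes theta = (*shapes, loc, scale) and
    # returns inf when any point of x falls outside the implied support.
    if np.isfinite(dist.nnlf(params, np.asarray(data))):
        return tuple(params)
    return (np.nan,) * len(params)


sample = np.array([-0.5, 0.2, 1.3])
print(validated_params(stats.expon, sample, (0.0, 1.0)))   # loc=0 excludes -0.5
print(validated_params(stats.expon, sample, (-1.0, 1.0)))  # loc=-1 covers it
```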
> a. To me the big question became: what ARE the logical values here?
If you look at my second message above, you'll see some examples where
fit returns numbers.
I didn't check how good they are.
> I followed the docstring/example and got NaNs.
> then given that
> In : ksone.a, ksone.b
> Out: (0.0, inf)
> I still failed to get any sensible fit() for positive values, or even for
> samples drawn from ksone itself, e.g.
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from scipy import stats
>>> rv = stats.ksone(50).rvs(size=1000)
>>> plt.hist(rv, bins=30, density=True, cumulative=True)  # density= replaces the removed normed=
>>> x = np.linspace(0, rv.max(), 1000)
>>> plt.plot(x, stats.ksone(50).cdf(x))
>>> stats.ksone.fit(rv, 100, loc=-0.01, scale=1)  # 100 is the starting guess for n
(181.94347728444751, -3.8554246919087482e-05, 1.9277121337713585)
>>> stats.ksone.fit(rv, 10, loc=-0.01, scale=1)
(13.999896396912176, -0.010783712808254388, 0.57818285700694405)
> results in a bunch of warnings and then (1.0, nan, nan).
> Looking in more detail, rvs happily generates NaNs (especially for small n).
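A small sketch for checking how many draws are not finite before they are handed to `fit()`. `finite_ksone_draws` is a hypothetical helper; whether any NaNs appear at all depends on the SciPy version, since the NaNs reported above came from the SciPy of that time.

```python
import numpy as np
from scipy import stats


def finite_ksone_draws(n, size, seed=0):
    """Hypothetical helper: draw `size` samples from ksone(n) and split off
    any non-finite values (NaN/inf) before they reach fit()."""
    rv = stats.ksone(n).rvs(size=size, random_state=seed)
    finite = np.isfinite(rv)
    return rv[finite], int(np.count_nonzero(~finite))


for n in (2, 5, 10, 50):
    clean, n_bad = finite_ksone_draws(n, size=500)
    print(f"n={n:3d}: {n_bad} non-finite draws out of 500")
```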
> b. Also, the range of sensible values for the parameter n isn't specified
> anywhere for KS test newbies like me, which I guess adds to the confusion:
>> Using the support of the sample would help. I have no idea about good starting
>> values for the shape parameter (n is the sample size in kstest).
> Aha, so the 'demo' value of 0.9 indeed makes no sense ;) Might be
> worth adjusting somehow?
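To see what n means, here is a Monte Carlo sketch that assumes, as Josef says, that n is the sample size. It draws `reps` uniform samples of size n, computes the one-sided statistic D_n^+ = max_i(i/n - U_(i)) for each, and compares the empirical CDF with `stats.ksone(n).cdf`. If ksone is the distribution of D_n^+ for sample size n, the two columns should agree closely; `simulate_d_plus` is a hypothetical name. It also shows why a non-integer n such as 0.9 has no sample-size interpretation.

```python
import numpy as np
from scipy import stats


def simulate_d_plus(n, reps, seed=0):
    """Hypothetical helper: Monte Carlo draws of D_n^+ = max_i (i/n - U_(i))
    for Uniform(0, 1) samples of size n."""
    rng = np.random.default_rng(seed)
    u = np.sort(rng.uniform(size=(reps, n)), axis=1)  # order statistics, one row per sample
    i = np.arange(1, n + 1)
    return np.max(i / n - u, axis=1)


n = 10
d = simulate_d_plus(n, reps=20000)
for x in (0.1, 0.2, 0.3, 0.4):
    print(f"x={x}: empirical={np.mean(d <= x):.3f} ksone={stats.ksone(n).cdf(x):.3f}")
```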
> BTW, trying to familiarize myself with the distribution, I plotted its
> pdf, e.g.:
> x = np.linspace(0, 3, 1000); plt.plot(x, ksone(10).pdf(x))
> and it looks weird (http://www.onerussian.com/tmp/ksone-ns.png) in that it is
> not smooth, and my algebra-rusty eyes do not see any obvious points where
> the 2nd derivative of the cdf fails to exist.
IIRC (no time to check again right now):
ksone is, I think, a small-sample distribution;
kstwobign is the distribution of the max/sup of a Brownian bridge,
which is the asymptotic distribution for Kolmogorov-Smirnov.
For these distributions we are mainly interested in the cdf and ppf (both look
reasonably good in a plot), and mainly in the right tail.
ksone looks like a piecewise approximation, where they didn't care
much about the lower part.
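The exact small-sample tail of D_n^+ can be written as a finite sum (the Birnbaum-Tingey formula). The limiting two-sided distribution that kstwobign describes has the classical alternating-series tail 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 t^2). The sketch below evaluates both and compares them with `stats.ksone(n).sf` and `stats.kstwobign.sf`; the function names are hypothetical. One observation, not checked against the 2012 implementation: the number of terms in the finite sum changes at x = 1 - j/n, so even the exact CDF is only piecewise smooth. That could contribute to kinks in a pdf plot.

```python
from math import comb, exp, floor

from scipy import stats


def ksone_sf_exact(n, x):
    """Hypothetical helper: P(D_n^+ >= x) from the Birnbaum-Tingey finite sum,
    for 0 < x < 1. The number of terms, floor(n * (1 - x)) + 1, changes at
    x = 1 - j/n, which makes the exact CDF piecewise smooth."""
    total = 0.0
    for j in range(floor(n * (1 - x)) + 1):
        total += comb(n, j) * (1 - x - j / n) ** (n - j) * (x + j / n) ** (j - 1)
    return x * total


def kolmogorov_sf(t, terms=100):
    """Hypothetical helper: asymptotic P(sqrt(n) * D_n > t) for the two-sided
    statistic, i.e. the tail of the sup of a Brownian bridge. The truncated
    alternating series converges quickly unless t is close to 0."""
    return 2.0 * sum((-1) ** (k - 1) * exp(-2.0 * k * k * t * t) for k in range(1, terms + 1))


for x in (0.1, 0.25, 0.5):
    print(x, ksone_sf_exact(10, x), stats.ksone(10).sf(x))
for t in (0.5, 1.0, 1.5):
    print(t, kolmogorov_sf(t), stats.kstwobign.sf(t))
```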
(I'm a bit rushed right now so there might be parts missing in my reply)
> Also, why is ksone.b inf -- shouldn't it be 1?
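A one-line argument for the upper bound uses the usual order-statistic form of the one-sided statistic. It assumes a continuous hypothesized F, with U_(i) = F(X_(i)) the transformed order statistics:

```latex
D_n^+ = \sup_x \bigl(F_n(x) - F(x)\bigr)
      = \max_{1 \le i \le n}\Bigl(\frac{i}{n} - U_{(i)}\Bigr),
\qquad 0 \le U_{(1)} \le \dots \le U_{(n)} \le 1,

0 \le 1 - U_{(n)} \le D_n^+ \le \max_{1 \le i \le n} \frac{i}{n} = 1 .
```

So the support of D_n^+ is [0, 1]. Whether `ksone.b` reflects this depends on the SciPy version in use.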
> Keep in touch www.onerussian.com
> Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic