[SciPy-Dev] studentized range approximations
Roger Lew
rogerlew@gmail....
Thu Jun 2 13:15:52 CDT 2011
Hi Josef,
Thanks for the feedback. The "choice" of using cdf is more a carryover from
how the algorithm is described by Gleason. Perhaps it would be best to have
it match your intuition and accept the survival function?
Feel free to treat it like your own for statsmodels. I will definitely check
out some of your multcomp module so I'm not reinventing the wheel.
In the grand scheme, I could see these having a home in scipy.special (after
more extensive review of course). That is where I went to look for it when I
didn't find it in distributions.
Roger
On Thu, Jun 2, 2011 at 1:38 AM, <josef.pktd@gmail.com> wrote:
> On Thu, Jun 2, 2011 at 12:53 AM, Roger Lew <rogerlew@gmail.com> wrote:
> > Hi,
> > I have implemented some approximations for studentized range quantiles
> and
> > probabilities based on John R. Gleason's (1999) "An accurate,
> non-iterative
> > approximation for studentized range quantiles." Computational Statistics
> &
> > Data Analysis, (31), 147-158.
> > Probability approximations rely on scipy.optimize.fminbound. The
> functions
> > accept both scalars or array-like data thanks to numpy.vectorize. A fair
> > amount of validation and testing has been conducted on the code. More
> > details can be found here: http://code.google.com/p/qsturng-py/
> > I welcome any thoughts as to whether you all think this might be useful
> to
> > add to SciPy or make into a scikit. Any general comments would be helpful
> as
> > well. I should mention I'm a cognitive neuroscientist by trade, my use of
> > statistical jargon probably isn't that good.
>
> Hi Roger,
>
> I'm very interested in using this in scikits.statsmodels. The table
> that I am currently using is very limited
>
> http://statsmodels.sourceforge.net/devel/generated/scikits.statsmodels.sandbox.stats.multicomp.get_tukeyQcrit.html#scikits.statsmodels.sandbox.stats.multicomp.get_tukeyQcrit
>
> >From a quick look it looks very good.
> What I found a bit confusing is that qstrung takes the probability of
> the cdf and not of the survival function. Without reading the
> docstring carefully enough, I interpreted it as a p-value (upper tail)
> especially since pstrung returns the upper tail probability,
>
> >>> import scikits.statsmodels.sandbox.stats.multicomp as smmc
> >>> for i in range(3, 10):
> x = qsturng(0.95, i, 16)
> x, psturng(x, i, 16), smmc.get_tukeyQcrit(i, 16, 0.05),
> smmc.tukey_pvalues(x*np.ones(i), 16)[0]
>
>
> (3.647864471854692, 0.049999670839029453, array(3.6499999999999999),
> 0.050092818925981608)
> (4.0464124382823847, 0.050001178443752514, array(4.0499999999999998),
> 0.037164602483501508)
> (4.3332505094058114, 0.049999838126148499, array(4.3300000000000001),
> 0.029954033157223781)
> (4.5573603020371234, 0.049999276281813887, array(4.5599999999999996),
> 0.025276987281047769)
> (4.7410585998112742, 0.049998508166777755, array(4.7400000000000002),
> 0.022010630154416622)
> (4.8965400268915289, 0.04999983345598491, array(4.9000000000000004),
> 0.019614841752159107)
> (5.0312039650945257, 0.049999535359310343, array(5.0300000000000002),
> 0.017721848279719898)
>
> The last column is (in my interpretation) supposed to be 0.05. I was
> trying to get the pvalues for Tukeys range statistic through the
> multivariate t-distribution, but the unit test looks only at one point
> (and I ran out of time to work on this during Christmas break). Either
> there is a bug (it's still in the sandbox) or my interpretation is
> wrong.
>
> The advantage of the multivariate t-distribution is that it allows for
> arbitrary correlation, but it's not a substitute for pre-calculated
> tables for standard cases/distributions because it's much too slow.
>
> ------------
> As a bit of background on the multiple testing, multiple comparison
> status in statsmodels:
>
> The tukeyhsd test has one test case against R, but it has too many
> options (it allows unequal variances and unequal sample sizes, that
> still need to be checked.)
>
>
> http://statsmodels.sourceforge.net/devel/generated/scikits.statsmodels.sandbox.stats.multicomp.tukeyhsd.html#scikits.statsmodels.sandbox.stats.multicomp.tukeyhsd
>
> What I did manage to finish and verify against R
>
>
> http://statsmodels.sourceforge.net/devel/generated/scikits.statsmodels.sandbox.stats.multicomp.multipletests.html#scikits.statsmodels.sandbox.stats.multicomp.multipletests
>
> multiple testing for general linear models is very incomplete
>
> and as an aside: I'm not a statistician, and if the module in the
> statsmodels sandbox is still a mess then it's because I took me a long
> time and many functions to figure out what's going on.
> ----------
>
> scipy.special has a nice collection of standard distributions
> functions, but it would be very useful to have some additional
> distributions either in scipy or scikits.statsmodels available, like
> your studentized range statistic, (and maybe some others in multiple
> comparisons, like Duncan, Dunnet) and Anderson-Darling, and ...
>
> Thanks,
>
> Josef
>
>
> > Regards,
> > Roger
> > Roger Lew
> >
> > _______________________________________________
> > SciPy-Dev mailing list
> > SciPy-Dev@scipy.org
> > http://mail.scipy.org/mailman/listinfo/scipy-dev
> >
> >
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-dev/attachments/20110602/ed3d1d29/attachment.html
More information about the SciPy-Dev
mailing list