[SciPy-Dev] studentized range approximations
Fri Jun 3 08:37:32 CDT 2011
On 06/02/2011 01:15 PM, Roger Lew wrote:
> Hi Josef,
> Thanks for the feedback. The "choice" of using cdf is more a carryover
> from how the algorithm is described by Gleason. Perhaps it would be
> best to have it match your intuition and accept the survival function?
> Feel free to treat it like your own for statsmodels. I will definitely
> check out some of your multcomp module so I'm not reinventing the wheel.
> In the grand scheme, I could see these having a home in scipy.special
> (after more extensive review of course). That is where I went to look
> for it when I didn't find it in distributions.
> On Thu, Jun 2, 2011 at 1:38 AM, <email@example.com
> <mailto:firstname.lastname@example.org>> wrote:
> On Thu, Jun 2, 2011 at 12:53 AM, Roger Lew <email@example.com
> <mailto:firstname.lastname@example.org>> wrote:
> > Hi,
> > I have implemented some approximations for studentized range
> quantiles and
> > probabilities based on John R. Gleason's (1999) "An accurate,
> > approximation for studentized range quantiles." Computational
> Statistics &
> > Data Analysis, (31), 147-158.
> > Probability approximations rely on scipy.optimize.fminbound. The
> > accept both scalars or array-like data thanks to
> numpy.vectorize. A fair
> > amount of validation and testing has been conducted on the code.
> > details can be found here: http://code.google.com/p/qsturng-py/
> > I welcome any thoughts as to whether you all think this might be
> useful to
> > add to SciPy or make into a scikit. Any general comments would
> be helpful as
> > well. I should mention I'm a cognitive neuroscientist by trade,
> my use of
> > statistical jargon probably isn't that good.
> Hi Roger,
> I'm very interested in using this in scikits.statsmodels. The table
> that I am currently using is very limited
> >From a quick look it looks very good.
> What I found a bit confusing is that qstrung takes the probability of
> the cdf and not of the survival function. Without reading the
> docstring carefully enough, I interpreted it as a p-value (upper tail)
> especially since pstrung returns the upper tail probability,
> >>> import scikits.statsmodels.sandbox.stats.multicomp as smmc
> >>> for i in range(3, 10):
> x = qsturng(0.95, i, 16)
> x, psturng(x, i, 16), smmc.get_tukeyQcrit(i, 16, 0.05),
> smmc.tukey_pvalues(x*np.ones(i), 16)
> (3.647864471854692, 0.049999670839029453, array(3.6499999999999999),
> (4.0464124382823847, 0.050001178443752514, array(4.0499999999999998),
> (4.3332505094058114, 0.049999838126148499, array(4.3300000000000001),
> (4.5573603020371234, 0.049999276281813887, array(4.5599999999999996),
> (4.7410585998112742, 0.049998508166777755, array(4.7400000000000002),
> (4.8965400268915289, 0.04999983345598491, array(4.9000000000000004),
> (5.0312039650945257, 0.049999535359310343, array(5.0300000000000002),
> The last column is (in my interpretation) supposed to be 0.05. I was
> trying to get the pvalues for Tukeys range statistic through the
> multivariate t-distribution, but the unit test looks only at one point
> (and I ran out of time to work on this during Christmas break). Either
> there is a bug (it's still in the sandbox) or my interpretation is
> The advantage of the multivariate t-distribution is that it allows for
> arbitrary correlation, but it's not a substitute for pre-calculated
> tables for standard cases/distributions because it's much too slow.
> As a bit of background on the multiple testing, multiple comparison
> status in statsmodels:
> The tukeyhsd test has one test case against R, but it has too many
> options (it allows unequal variances and unequal sample sizes, that
> still need to be checked.)
> What I did manage to finish and verify against R
> multiple testing for general linear models is very incomplete
> and as an aside: I'm not a statistician, and if the module in the
> statsmodels sandbox is still a mess then it's because I took me a long
> time and many functions to figure out what's going on.
> scipy.special has a nice collection of standard distributions
> functions, but it would be very useful to have some additional
> distributions either in scipy or scikits.statsmodels available, like
> your studentized range statistic, (and maybe some others in multiple
> comparisons, like Duncan, Dunnet) and Anderson-Darling, and ...
> > Regards,
> > Roger
> > Roger Lew
> > _______________________________________________
> > SciPy-Dev mailing list
> > SciPy-Dev@scipy.org <mailto:SciPy-Dev@scipy.org>
> > http://mail.scipy.org/mailman/listinfo/scipy-dev
> SciPy-Dev mailing list
> SciPy-Dev@scipy.org <mailto:SciPy-Dev@scipy.org>
> SciPy-Dev mailing list
The problem I have is that this is still an approximation that probably
covers what most situations.
Do you have the Copenhaver & Holland (1988) approach or know any
BSD-licensed Python code for it?
Also, the code probably needs to conform to numpy/scipy standards (which
I don't remember the links).
You also have vary less desirable features:
1) hard coded numbers like '1e38' - numpy does define infinity (np.inf).
2) comparisons to 'inf' ('v==inf') that are not desirable.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the SciPy-Dev