[SciPy-Dev] studentized range approximations
Wed Mar 28 13:58:56 CDT 2012
On Thu, Jun 2, 2011 at 2:15 PM, Roger Lew <firstname.lastname@example.org> wrote:
> Hi Josef,
> Thanks for the feedback. The "choice" of using cdf is more a carryover from
> how the algorithm is described by Gleason. Perhaps it would be best to have
> it match your intuition and accept the survival function?
> Feel free to treat it like your own for statsmodels. I will definitely check
> out some of your multcomp module so I'm not reinventing the wheel.
> In the grand scheme, I could see these having a home in scipy.special (after
> more extensive review of course). That is where I went to look for it when I
> didn't find it in distributions.
I'm just on the way of including qstrung-py in statsmodels.
I needed to add an OrderedDict for python < 2.7 but everything looks
as good as in my first impression.
Thank you for making this available,
> On Thu, Jun 2, 2011 at 1:38 AM, <email@example.com> wrote:
>> On Thu, Jun 2, 2011 at 12:53 AM, Roger Lew <firstname.lastname@example.org> wrote:
>> > Hi,
>> > I have implemented some approximations for studentized range quantiles
>> > and
>> > probabilities based on John R. Gleason's (1999) "An accurate,
>> > non-iterative
>> > approximation for studentized range quantiles." Computational Statistics
>> > &
>> > Data Analysis, (31), 147-158.
>> > Probability approximations rely on scipy.optimize.fminbound. The
>> > functions
>> > accept both scalars or array-like data thanks to numpy.vectorize. A fair
>> > amount of validation and testing has been conducted on the code. More
>> > details can be found here: http://code.google.com/p/qsturng-py/
>> > I welcome any thoughts as to whether you all think this might be useful
>> > to
>> > add to SciPy or make into a scikit. Any general comments would be
>> > helpful as
>> > well. I should mention I'm a cognitive neuroscientist by trade, my use
>> > of
>> > statistical jargon probably isn't that good.
>> Hi Roger,
>> I'm very interested in using this in scikits.statsmodels. The table
>> that I am currently using is very limited
>> >From a quick look it looks very good.
>> What I found a bit confusing is that qstrung takes the probability of
>> the cdf and not of the survival function. Without reading the
>> docstring carefully enough, I interpreted it as a p-value (upper tail)
>> especially since pstrung returns the upper tail probability,
>> >>> import scikits.statsmodels.sandbox.stats.multicomp as smmc
>> >>> for i in range(3, 10):
>> x = qsturng(0.95, i, 16)
>> x, psturng(x, i, 16), smmc.get_tukeyQcrit(i, 16, 0.05),
>> smmc.tukey_pvalues(x*np.ones(i), 16)
>> (3.647864471854692, 0.049999670839029453, array(3.6499999999999999),
>> (4.0464124382823847, 0.050001178443752514, array(4.0499999999999998),
>> (4.3332505094058114, 0.049999838126148499, array(4.3300000000000001),
>> (4.5573603020371234, 0.049999276281813887, array(4.5599999999999996),
>> (4.7410585998112742, 0.049998508166777755, array(4.7400000000000002),
>> (4.8965400268915289, 0.04999983345598491, array(4.9000000000000004),
>> (5.0312039650945257, 0.049999535359310343, array(5.0300000000000002),
>> The last column is (in my interpretation) supposed to be 0.05. I was
>> trying to get the pvalues for Tukeys range statistic through the
>> multivariate t-distribution, but the unit test looks only at one point
>> (and I ran out of time to work on this during Christmas break). Either
>> there is a bug (it's still in the sandbox) or my interpretation is
>> The advantage of the multivariate t-distribution is that it allows for
>> arbitrary correlation, but it's not a substitute for pre-calculated
>> tables for standard cases/distributions because it's much too slow.
>> As a bit of background on the multiple testing, multiple comparison
>> status in statsmodels:
>> The tukeyhsd test has one test case against R, but it has too many
>> options (it allows unequal variances and unequal sample sizes, that
>> still need to be checked.)
>> What I did manage to finish and verify against R
>> multiple testing for general linear models is very incomplete
>> and as an aside: I'm not a statistician, and if the module in the
>> statsmodels sandbox is still a mess then it's because I took me a long
>> time and many functions to figure out what's going on.
>> scipy.special has a nice collection of standard distributions
>> functions, but it would be very useful to have some additional
>> distributions either in scipy or scikits.statsmodels available, like
>> your studentized range statistic, (and maybe some others in multiple
>> comparisons, like Duncan, Dunnet) and Anderson-Darling, and ...
>> > Regards,
>> > Roger
>> > Roger Lew
>> > _______________________________________________
>> > SciPy-Dev mailing list
>> > SciPy-Dev@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/scipy-dev
>> SciPy-Dev mailing list
> SciPy-Dev mailing list
More information about the SciPy-Dev