[SciPy-Dev] studentized range approximations

josef.pktd@gmai... josef.pktd@gmai...
Wed Mar 28 13:58:56 CDT 2012


On Thu, Jun 2, 2011 at 2:15 PM, Roger Lew <rogerlew@gmail.com> wrote:
> Hi Josef,
>
> Thanks for the feedback. The "choice" of using cdf is more a carryover from
> how the algorithm is described by Gleason. Perhaps it would be best to have
> it match your intuition and accept the survival function?
>
> Feel free to treat it like your own for statsmodels. I will definitely check
> out some of your multcomp module so I'm not reinventing the wheel.
>
> In the grand scheme, I could see these having a home in scipy.special (after
> more extensive review of course). That is where I went to look for it when I
> didn't find it in distributions.

Hi Roger,

I'm just on the way of including qstrung-py in statsmodels.

I needed to add an OrderedDict for python < 2.7 but everything looks
as good as in my first impression.

Thank you for making this available,

Josef

>
> Roger
>
>
> On Thu, Jun 2, 2011 at 1:38 AM, <josef.pktd@gmail.com> wrote:
>>
>> On Thu, Jun 2, 2011 at 12:53 AM, Roger Lew <rogerlew@gmail.com> wrote:
>> > Hi,
>> > I have implemented some approximations for studentized range quantiles
>> > and
>> > probabilities based on John R. Gleason's (1999) "An accurate,
>> > non-iterative
>> > approximation for studentized range quantiles." Computational Statistics
>> > &
>> > Data Analysis, (31), 147-158.
>> > Probability approximations rely on scipy.optimize.fminbound. The
>> > functions
>> > accept both scalars or array-like data thanks to numpy.vectorize. A fair
>> > amount of validation and testing has been conducted on the code. More
>> > details can be found here: http://code.google.com/p/qsturng-py/
>> > I welcome any thoughts as to whether you all think this might be useful
>> > to
>> > add to SciPy or make into a scikit. Any general comments would be
>> > helpful as
>> > well. I should mention I'm a cognitive neuroscientist by trade, my use
>> > of
>> > statistical jargon probably isn't that good.
>>
>> Hi Roger,
>>
>> I'm very interested in using this in scikits.statsmodels. The table
>> that I am currently using is very limited
>>
>> http://statsmodels.sourceforge.net/devel/generated/scikits.statsmodels.sandbox.stats.multicomp.get_tukeyQcrit.html#scikits.statsmodels.sandbox.stats.multicomp.get_tukeyQcrit
>>
>> >From a quick look it looks very good.
>> What I found a bit confusing is that qstrung takes the probability of
>> the cdf and not of the survival function. Without reading the
>> docstring carefully enough, I interpreted it as a p-value (upper tail)
>> especially since pstrung returns the upper tail probability,
>>
>> >>> import scikits.statsmodels.sandbox.stats.multicomp as smmc
>> >>> for i in range(3, 10):
>>        x = qsturng(0.95, i, 16)
>>        x, psturng(x, i, 16), smmc.get_tukeyQcrit(i, 16, 0.05),
>> smmc.tukey_pvalues(x*np.ones(i), 16)[0]
>>
>>
>> (3.647864471854692, 0.049999670839029453, array(3.6499999999999999),
>> 0.050092818925981608)
>> (4.0464124382823847, 0.050001178443752514, array(4.0499999999999998),
>> 0.037164602483501508)
>> (4.3332505094058114, 0.049999838126148499, array(4.3300000000000001),
>> 0.029954033157223781)
>> (4.5573603020371234, 0.049999276281813887, array(4.5599999999999996),
>> 0.025276987281047769)
>> (4.7410585998112742, 0.049998508166777755, array(4.7400000000000002),
>> 0.022010630154416622)
>> (4.8965400268915289, 0.04999983345598491, array(4.9000000000000004),
>> 0.019614841752159107)
>> (5.0312039650945257, 0.049999535359310343, array(5.0300000000000002),
>> 0.017721848279719898)
>>
>> The last column is (in my interpretation) supposed to be 0.05. I was
>> trying to get the pvalues for Tukeys range statistic through the
>> multivariate t-distribution, but the unit test looks only at one point
>> (and I ran out of time to work on this during Christmas break). Either
>> there is a bug (it's still in the sandbox) or my interpretation is
>> wrong.
>>
>> The advantage of the multivariate t-distribution is that it allows for
>> arbitrary correlation, but it's not a substitute for pre-calculated
>> tables for standard cases/distributions because it's much too slow.
>>
>> ------------
>> As a bit of background on the multiple testing, multiple comparison
>> status in statsmodels:
>>
>> The tukeyhsd test has one test case against R, but it has too many
>> options (it allows unequal variances and unequal sample sizes, that
>> still need to be checked.)
>>
>>
>> http://statsmodels.sourceforge.net/devel/generated/scikits.statsmodels.sandbox.stats.multicomp.tukeyhsd.html#scikits.statsmodels.sandbox.stats.multicomp.tukeyhsd
>>
>> What I did manage to finish and verify against R
>>
>>
>> http://statsmodels.sourceforge.net/devel/generated/scikits.statsmodels.sandbox.stats.multicomp.multipletests.html#scikits.statsmodels.sandbox.stats.multicomp.multipletests
>>
>> multiple testing for general linear models is very incomplete
>>
>> and as an aside: I'm not a statistician, and if the module in the
>> statsmodels sandbox is still a mess then it's because I took me a long
>> time and many functions to figure out what's going on.
>> ----------
>>
>> scipy.special has a nice collection of standard distributions
>> functions, but it would be very useful to have some additional
>> distributions either in scipy or scikits.statsmodels available, like
>> your studentized range statistic, (and maybe some others in multiple
>> comparisons, like Duncan, Dunnet) and Anderson-Darling, and ...
>>
>> Thanks,
>>
>> Josef
>>
>>
>> > Regards,
>> > Roger
>> > Roger Lew
>> >
>> > _______________________________________________
>> > SciPy-Dev mailing list
>> > SciPy-Dev@scipy.org
>> > http://mail.scipy.org/mailman/listinfo/scipy-dev
>> >
>> >
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-dev
>
>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>


More information about the SciPy-Dev mailing list