# [Nipy-devel] rm 2-way ANOVA

josef.pktd@gmai...
Mon Nov 16 21:14:38 CST 2009

```On Mon, Nov 16, 2009 at 8:45 PM, Christian Brodbeck
<christianmbrodbeck@gmail.com> wrote:
>> I'd be interested to see how you went about doing this, if you think
>> it is general enough to contribute to the project.  Did you primarily
>> use Josef's code?
>
> I wrote my own code; it depends very much on what your source data looks
> like and how your factors are coded (as Josef said). It is fairly simple
> in principle, though, and could be adapted. If in your data array the
> subjects alternate inside blocks, the fixed effect of subject can be
> added to the design matrix as (corrected version):
>
>     _sub___   _conditions...
> s1  1  0  0    1 ... \
> s2  0  1  0    1      \ condition 1
> s3  0  0  1    1      /
> s4 -1 -1 -1    1     /
> s1  1  0  0    0     \
> s2  0  1  0    0      \ condition 2
> ...
>
>> I don't think I understand the meaning of the -1 in s4.
>
> I think it's merely a matter of how you set up the contrasts between
> subjects: with -1s the predictor variables contrast each subject against
> the last subject, whereas with 0s the mean of the last subject would be
> predicted by the intercept while the other subjects have their own
> parameters. I tried substituting 0s for the -1s and got exactly the same
> results for my condition factors.

I still find ANOVA very confusing, except for the simple cases. I think I
partly see how this works, but my intuition is for manipulating the
columns (dummy variables) and not so much for changing observations
(rows).
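# [Editor's sketch, not part of the original thread: the two subject codings
# discussed above. With -1s in the s4 row (effects coding) and with 0s
# (dummy coding, s4 absorbed into the intercept) the design matrices span
# the same column space, so the fitted values and the condition estimate
# come out identical. All data values here are made up.]
import numpy as np

y = np.array([2.0, 3.0, 4.0, 5.0, 3.5, 4.5, 5.5, 6.5])  # toy responses

# subject columns for s1..s4, repeated for the two conditions
sub_eff = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [-1, -1, -1]] * 2, float)
sub_dum = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]] * 2, float)
cond = np.repeat([1.0, 0.0], 4)[:, None]   # condition 1 vs condition 2
const = np.ones((8, 1))

X_eff = np.hstack([sub_eff, cond, const])
X_dum = np.hstack([sub_dum, cond, const])
b_eff = np.linalg.lstsq(X_eff, y, rcond=None)[0]
b_dum = np.linalg.lstsq(X_dum, y, rcond=None)[0]

print(np.allclose(X_eff @ b_eff, X_dum @ b_dum))  # same fit: True
print(np.allclose(b_eff[3], b_dum[3]))            # same condition effect: True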

>> If I understand this correctly, then you have to loop 129 times?
>
> I am performing the ANOVA for each electrode at each sample point, so
> 129*150 = 19350 times.

Ok, that's a lot and some speedup would be good.
Just out of curiosity, is there still a way to interpret the results of so many
statistical estimations or tests?

>> Yes, this is right.  Using a trunk of scipy for block_diag you have
>>
>> import scikits.statsmodels as sm
>> import numpy as np
>> from scipy import linalg
>>
>> data1 = sm.datasets.longley.Load()
>> data2 = sm.datasets.longley.Load()
>> data1.exog = sm.add_constant(data1.exog)
>> data2.exog = sm.add_constant(data2.exog)
>>
>> # stack the two problems into one block-diagonal design
>> exog = linalg.block_diag(data1.exog, data2.exog)
>> endog = np.vstack((data1.endog[:, None], data2.endog[:, None]))
>> res = sm.OLS(endog, exog).fit()
>>
>> In [17]: res.params
>> Out[17]:
>> array([  1.50618723e+01,  -3.58191793e-02,  -2.02022980e+00,
>>         -1.03322687e+00,  -5.11041057e-02,   1.82915146e+03,
>>         -3.48225863e+06,   1.50618723e+01,  -3.58191793e-02,
>>         -2.02022980e+00,  -1.03322687e+00,  -5.11041057e-02,
>>          1.82915146e+03,  -3.48225863e+06])
>>
>> res2 = sm.OLS(data1.endog, data1.exog).fit()
>>
>> In [19]: res2.params
>> Out[19]:
>> array([  1.50618723e+01,  -3.58191793e-02,  -2.02022980e+00,
>>         -1.03322687e+00,  -5.11041057e-02,   1.82915146e+03,
>>         -3.48225863e+06])
>>
>> So if you are using OLS to do the anova, then you can use the block
>> diagonal.  You should probably use sparse matrices though if your
>> design is very big.  There is an example of building a block diagonal
>> matrix in our sandbox (haven't had time to clean this up, write tests,
>> or get a polished results class yet, though the results are correct
>> via the 'eyeball' test).  It's in the __init__ method for the SUR
>> class.
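# [Editor's sketch of the sparse suggestion above, using today's
# scipy.sparse.block_diag rather than the 2009 trunk under discussion;
# the design matrix here is a random stand-in, not the longley data.]
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(0)
X = np.hstack([rng.standard_normal((16, 6)), np.ones((16, 1))])  # one design
n_blocks = 3
bigX = sparse.block_diag([X] * n_blocks, format="csr")  # (48, 21), mostly zeros
y = rng.standard_normal(16 * n_blocks)

# direct solve of the normal equations; fine for a small, well-conditioned toy
beta = spsolve((bigX.T @ bigX).tocsc(), bigX.T @ y)
beta_one = np.linalg.lstsq(X, y[:16], rcond=None)[0]  # first block alone
print(np.allclose(beta[:7], beta_one))  # block-diagonal == separate fits: True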
>>
>> For the case when all the design matrices are the same, you could save
>> on replicating them in the block diagonal matrix. OLS parameter
>> estimation is easy for this case: linalg.lstsq allows a matrix endog;
>> the work is in getting all the anova and regression result statistics.
>
> Yes, all the design matrices are the same. So you are saying that with
> the present sm.OLS I could estimate all anovas at once, using my model
> as exog and putting the different endogs into a certain matrix form?

Sorry for the ambiguity, I meant ordinary least squares in general and not
sm.OLS. Actually, I don't know what would happen in sm.OLS if you gave it
a 2-dimensional endog, but I assume it would break with some
incorrect-dimension exception.

I meant that linalg.lstsq, and I think also linalg.pinv (which is used in
sm.OLS), do work for a 2d endog and essentially estimate the parameters of
many "parallel" regressions. However, it would be necessary to check which
result statistics still work in this case.
Sums of squares and the predicted endog should work easily. The p-values
for F-tests and t-tests should also work without problems; for these it
would just be necessary to keep track of the correct dimension in the
calculations (e.g. dot products).
The first real problem will be the variance-covariance matrix of the
parameter estimates, since it is already 2d. The raw covariance matrix
(X'X)^{-1} would be the same, but the estimate of the error variance
would differ for each regression/endog.
I don't know how easy it is to get vectorized F or t statistics for
different linear hypotheses (contrasts) in this case.

It might be possible to do it by rearranging the calculations, but sm.OLS
is designed to provide many result statistics for a single regression,
not for the case of several endogs.
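# [Editor's sketch of the "parallel regressions" point above, with made-up
# data: np.linalg.lstsq accepts a 2d endog, and the per-column error
# variances rescale one shared (X'X)^{-1} to give standard errors and
# t statistics for every regression at once.]
import numpy as np

rng = np.random.default_rng(1)
n, k, n_reg = 50, 3, 4                 # 4 regressions sharing one design
X = np.hstack([rng.standard_normal((n, k - 1)), np.ones((n, 1))])
Y = rng.standard_normal((n, n_reg))    # one endog per column

B = np.linalg.lstsq(X, Y, rcond=None)[0]      # (k, n_reg) parameters
resid = Y - X @ B
sigma2 = (resid ** 2).sum(axis=0) / (n - k)   # per-regression error variance
xtxi_diag = np.diag(np.linalg.inv(X.T @ X))   # shared raw covariance diagonal
bse = np.sqrt(xtxi_diag[:, None] * sigma2)    # (k, n_reg) standard errors
tvals = B / bse                               # vectorized t statistics

b0 = np.linalg.lstsq(X, Y[:, 0], rcond=None)[0]  # same as column-by-column
print(np.allclose(B[:, 0], b0))  # True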

Cheers,

Josef

>
> Cheers,
> --Christian
> _______________________________________________
> Nipy-devel mailing list
> Nipy-devel@neuroimaging.scipy.org
> http://mail.scipy.org/mailman/listinfo/nipy-devel

```