# [SciPy-User] ancova with optimize.curve_fit

josef.pktd@gmai... josef.pktd@gmai...
Mon Dec 6 18:55:04 CST 2010

```
On Mon, Dec 6, 2010 at 7:41 PM, Skipper Seabold <jsseabold@gmail.com> wrote:
> On Mon, Dec 6, 2010 at 7:31 PM, Peter Tittmann <ptittmann@gmail.com> wrote:
>> thanks both of you,
>> Josef, the data that I sent is only the first 100 rows of about 1500, there
>> should be sufficient sampling in each plot.
>> Skipper, I have attempted to deploy your suggestion for not linearizing the
>> data. It seems to work. I'm a little confused at your modification of the
>> getDiam function and I wonder if you could help me understand. The form of
>> the equation that is being fit is:
>> Y= a*X^b
>> your version of the getDiam function:
>>
>> def getDiam(ht, *b):
>>    return ht[:,0]**b[0] + np.sum(b[1:]*ht[:,1:], axis=1)
>>
>> I'm sorry if this is an obvious question but I don't understand how this
>> works as it seems that the "a" coefficient is missing.
>> Thanks again!
>
> Right.  I took out the 'a', because as I read it when I linearized (I
> might be misunderstanding ancova, I never recall the details), if you
> include 'a' and also all of the dummy variables for the plot, then you
> will have the problem of multicollinearity.  You could also include
> 'a' and drop one of the plot dummies, but then 'a' is just your
> reference category that you dropped.  So now b[0] is the nonlinear
> effect of your main variable and b[1:] contains linear shift effects
> of all the plots.  Hmm, thinking about it some more, though I think
> you could include 'a' in the non-linear version above (call it b[0]
> and shift everything else over by one), because now 'a' would be the
> effect when the current b[0] is zero.  I was just unsure how you meant
> 'a' when you had a*ht**b and were trying to include in ht the plot
> variable dummies.

As I understand it, the intention is to estimate equality of the slope
coefficients, so the continuous variable is multiplied with the dummy
variables. In this case, the constant should still be added. The
normalization question is whether to include all products of the dummies
with the continuous variable and drop the continuous variable itself, or
include the continuous variable and drop one of those products.

Unless there is a strong reason to avoid log-normality of errors, I
would work (first) with the linear version.

Josef

>
> Skipper
```
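Skipper's remark that 'a' can be kept in the non-linear version can be checked on made-up data. The following is only a sketch, not the code from the thread: `get_diam`, the simulated heights and the "true" parameter values are all assumptions chosen for illustration. It fits `a * ht**b` plus one additive shift per plot with `scipy.optimize.curve_fit`, passing the height and the plot dummies together as a 2-D `xdata` array.

```python
import numpy as np
from scipy.optimize import curve_fit


def get_diam(X, a, b, *shifts):
    """Hypothetical model: a * ht**b plus one additive shift per plot.

    X[:, 0] is the height; X[:, 1:] holds one 0/1 dummy column per plot.
    """
    return a * X[:, 0] ** b + X[:, 1:] @ np.asarray(shifts)


# Simulated data (assumption: 3 plots, 300 trees, small Gaussian noise).
rng = np.random.default_rng(0)
n_plots, n = 3, 300
plot = rng.integers(0, n_plots, size=n)
ht = rng.uniform(1.0, 10.0, size=n)
dummies = np.eye(n_plots)[plot]          # one dummy column per plot
X = np.column_stack([ht, dummies])

true_params = np.array([2.0, 1.5, 0.5, 1.0, -0.5])  # a, b, shift per plot
diam = get_diam(X, *true_params) + rng.normal(scale=0.05, size=n)

# Because get_diam takes *shifts, curve_fit cannot infer the number of
# parameters from the signature, so p0 has to be given explicitly.
p0 = np.array([1.0, 1.0] + [0.0] * n_plots)
params, cov = curve_fit(get_diam, X, diam, p0=p0)

print("a, b      :", params[:2])
print("plot shift:", params[2:])
```

Here 'a' and the full set of plot dummies are estimated together without collinearity, because 'a' multiplies `ht**b` rather than acting as a constant; the dummies alone play the role of the intercept.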
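The normalization question in Josef's reply can be illustrated with the linearized model log(Y) = log(a) + b*log(X), letting the slope b differ by plot. This is a minimal sketch on simulated data, using plain `numpy.linalg.lstsq` rather than any particular ANCOVA routine; the variable names and the "true" slopes are assumptions for illustration. It builds both normalizations, shows that they recover the same per-plot slopes, and shows that including the continuous variable together with all dummy products makes the design matrix rank deficient.

```python
import numpy as np

# Simulated data (assumption: 3 plots, common intercept, plot-specific slopes).
rng = np.random.default_rng(1)
n_plots, n = 3, 300
plot = rng.integers(0, n_plots, size=n)
log_ht = np.log(rng.uniform(1.0, 10.0, size=n))
dummies = np.eye(n_plots)[plot]

true_const = np.log(2.0)
true_slopes = np.array([1.2, 1.5, 1.8])
log_diam = (true_const + true_slopes[plot] * log_ht
            + rng.normal(scale=0.05, size=n))

const = np.ones((n, 1))
inter = dummies * log_ht[:, None]        # dummy * continuous products

# Normalization A: constant + all products, continuous variable dropped.
XA = np.hstack([const, inter])
# Normalization B: constant + continuous variable + products for all
# plots except the first, which becomes the reference level.
XB = np.hstack([const, log_ht[:, None], inter[:, 1:]])
# Over-parameterized: continuous variable AND all products.
X_over = np.hstack([const, log_ht[:, None], inter])

beta_a, *_ = np.linalg.lstsq(XA, log_diam, rcond=None)
beta_b, *_ = np.linalg.lstsq(XB, log_diam, rcond=None)

slopes_a = beta_a[1:]                                   # slopes directly
slopes_b = np.concatenate([[beta_b[1]], beta_b[1] + beta_b[2:]])

print("slopes (A):", slopes_a)
print("slopes (B):", slopes_b)
print("rank of over-parameterized design:",
      np.linalg.matrix_rank(X_over), "of", X_over.shape[1], "columns")
```

In normalization B the coefficients on the products are differences from the reference plot's slope, which is the form an equality-of-slopes comparison looks at directly; normalization A reports each plot's slope on its own.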