[SciPy-User] ancova with optimize.curve_fit

josef.pktd@gmai... josef.pktd@gmai...
Thu Dec 2 19:03:53 CST 2010


On Thu, Dec 2, 2010 at 7:57 PM, Skipper Seabold <jsseabold@gmail.com> wrote:
> On Thu, Dec 2, 2010 at 7:11 PM, Skipper Seabold <jsseabold@gmail.com> wrote:
>> On Thu, Dec 2, 2010 at 5:59 PM, Peter Tittmann <ptittmann@gmail.com> wrote:
>>> getDiam is a predictor to get dbh from height. It works with curve_fit to
>>> find coefficients a and b given datasetset of known dbh/height pairs. You
>>> are right, what I want is dummy variables for each plot. I'll see if I can
>>> get that worked out by revising getDiam..
>>> Thanks again
>>>
>>
>> I think it would be easier to create your dummy variables before you pass it in.
>>
>> You might find some of the tools in statsmodels to be helpful here.
>> We don't yet have an ANCOVA model, but you could definitely do
>> something like the following.  Not sure if it's exactly what you want,
>> but it should give you an idea.
>>
>> import numpy as np
>> import scikits.statsmodels as sm
>>
>> dta = np.genfromtxt('./db_out.csv', delimiter=",", names=True, dtype=None)
>> plot_dummies, col_map = sm.tools.categorical(dta['plot'], drop=True,
>> dictnames=True)
>>
>> plot_dummies will be dummy variables for all of the "plot" categories,
>> and col_map is a map from the column number to the plot just so you
>> can be sure you know what's what.
>>
>> I don't see how to use your objective function though with dummy
>> variables.  What happens if the effect of one of the plots is
>> negative, then you run into 0 ** -1.5 == inf.
>>
>
> If you want to do NLLS and not linearize then something like this
> might work and still keep the dummy variables as shift parameters
>
> def getDiam(ht, *b):
>    return ht[:,0]**b[0] + np.sum(b[1:]*ht[:,1:], axis=1)
>
> X = np.column_stack((indHtPlot, plot_dummies))
> Y = depDbh
> coefs, cov = optimize.curve_fit(getDiam, X, Y, p0= [0.]*X.shape[1])

In the sample file there are 11 levels of the `plot` that have only a
single observation each. I tried to use onewaygls, but statsmodels.OLS
doesn't work if y is a scalar.

I don't know whether curvefit or optimize.leastsq will converge in
this case, good starting values might be necessary.

Josef


>
>
>> You could linearize your objective function to be
>>
>> b*ln(ht)
>>
>> and do something like
>>
>> indHtPlot = dta['height']
>> depDbh = dta['dbh']
>> X = np.column_stack((np.log(indHtPlot), plot_dummies))
>> Y = np.log(depDbh)
>> res = sm.OLS(Y,X).fit()
>> res.params
>> array([ 0.98933264, -1.35239293, -1.0623305 , -0.99155293, -1.33675099,
>>       -1.30657011, -1.50933751, -1.28744779, -1.43937358, -1.33805883,
>>       -1.32744257, -1.42672539, -1.35239293, -1.60585046, -1.45239093,
>>       -1.45695112, -1.34811186, -1.32658794, -1.21721715, -1.32853084,
>>       -1.45775017, -1.44460388, -2.19065236, -1.3303631 , -1.20509831,
>>       -1.37341535, -1.25746105, -1.33954972, -1.33922709, -1.247304  ])
>>
>> Note that your coefficient on height is now an elasticity.  I'm sure
>> I'm missing something here, but that might help you along the way.
>>
>> Skipper
>>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>


More information about the SciPy-User mailing list