[SciPy-User] ancova with optimize.curve_fit
Thu Dec 2 19:03:53 CST 2010
On Thu, Dec 2, 2010 at 7:57 PM, Skipper Seabold <email@example.com> wrote:
> On Thu, Dec 2, 2010 at 7:11 PM, Skipper Seabold <firstname.lastname@example.org> wrote:
>> On Thu, Dec 2, 2010 at 5:59 PM, Peter Tittmann <email@example.com> wrote:
>>> getDiam is a predictor to get dbh from height. It works with curve_fit to
>>> find coefficients a and b given datasetset of known dbh/height pairs. You
>>> are right, what I want is dummy variables for each plot. I'll see if I can
>>> get that worked out by revising getDiam..
>>> Thanks again
>> I think it would be easier to create your dummy variables before you pass it in.
>> You might find some of the tools in statsmodels to be helpful here.
>> We don't yet have an ANCOVA model, but you could definitely do
>> something like the following. Not sure if it's exactly what you want,
>> but it should give you an idea.
>> import numpy as np
>> import scikits.statsmodels as sm
>> dta = np.genfromtxt('./db_out.csv', delimiter=",", names=True, dtype=None)
>> plot_dummies, col_map = sm.tools.categorical(dta['plot'], drop=True,
>> plot_dummies will be dummy variables for all of the "plot" categories,
>> and col_map is a map from the column number to the plot just so you
>> can be sure you know what's what.
>> I don't see how to use your objective function though with dummy
>> variables. What happens if the effect of one of the plots is
>> negative, then you run into 0 ** -1.5 == inf.
> If you want to do NLLS and not linearize then something like this
> might work and still keep the dummy variables as shift parameters
> def getDiam(ht, *b):
> return ht[:,0]**b + np.sum(b[1:]*ht[:,1:], axis=1)
> X = np.column_stack((indHtPlot, plot_dummies))
> Y = depDbh
> coefs, cov = optimize.curve_fit(getDiam, X, Y, p0= [0.]*X.shape)
In the sample file there are 11 levels of the `plot` that have only a
single observation each. I tried to use onewaygls, but statsmodels.OLS
doesn't work if y is a scalar.
I don't know whether curvefit or optimize.leastsq will converge in
this case, good starting values might be necessary.
>> You could linearize your objective function to be
>> and do something like
>> indHtPlot = dta['height']
>> depDbh = dta['dbh']
>> X = np.column_stack((np.log(indHtPlot), plot_dummies))
>> Y = np.log(depDbh)
>> res = sm.OLS(Y,X).fit()
>> array([ 0.98933264, -1.35239293, -1.0623305 , -0.99155293, -1.33675099,
>> -1.30657011, -1.50933751, -1.28744779, -1.43937358, -1.33805883,
>> -1.32744257, -1.42672539, -1.35239293, -1.60585046, -1.45239093,
>> -1.45695112, -1.34811186, -1.32658794, -1.21721715, -1.32853084,
>> -1.45775017, -1.44460388, -2.19065236, -1.3303631 , -1.20509831,
>> -1.37341535, -1.25746105, -1.33954972, -1.33922709, -1.247304 ])
>> Note that your coefficient on height is now an elasticity. I'm sure
>> I'm missing something here, but that might help you along the way.
> SciPy-User mailing list
More information about the SciPy-User