[SciPy-User] ancova with optimize.curve_fit

Skipper Seabold jsseabold@gmail....
Thu Dec 2 18:57:38 CST 2010


On Thu, Dec 2, 2010 at 7:11 PM, Skipper Seabold <jsseabold@gmail.com> wrote:
> On Thu, Dec 2, 2010 at 5:59 PM, Peter Tittmann <ptittmann@gmail.com> wrote:
>> getDiam is a predictor to get dbh from height. It works with curve_fit to
>> find coefficients a and b given datasetset of known dbh/height pairs. You
>> are right, what I want is dummy variables for each plot. I'll see if I can
>> get that worked out by revising getDiam..
>> Thanks again
>>
>
> I think it would be easier to create your dummy variables before you pass it in.
>
> You might find some of the tools in statsmodels to be helpful here.
> We don't yet have an ANCOVA model, but you could definitely do
> something like the following.  Not sure if it's exactly what you want,
> but it should give you an idea.
>
> import numpy as np
> import scikits.statsmodels as sm
>
> dta = np.genfromtxt('./db_out.csv', delimiter=",", names=True, dtype=None)
> plot_dummies, col_map = sm.tools.categorical(dta['plot'], drop=True,
> dictnames=True)
>
> plot_dummies will be dummy variables for all of the "plot" categories,
> and col_map is a map from the column number to the plot just so you
> can be sure you know what's what.
>
> I don't see how to use your objective function though with dummy
> variables.  What happens if the effect of one of the plots is
> negative, then you run into 0 ** -1.5 == inf.
>

If you want to do NLLS and not linearize then something like this
might work and still keep the dummy variables as shift parameters

def getDiam(ht, *b):
    return ht[:,0]**b[0] + np.sum(b[1:]*ht[:,1:], axis=1)

X = np.column_stack((indHtPlot, plot_dummies))
Y = depDbh
coefs, cov = optimize.curve_fit(getDiam, X, Y, p0= [0.]*X.shape[1])


> You could linearize your objective function to be
>
> b*ln(ht)
>
> and do something like
>
> indHtPlot = dta['height']
> depDbh = dta['dbh']
> X = np.column_stack((np.log(indHtPlot), plot_dummies))
> Y = np.log(depDbh)
> res = sm.OLS(Y,X).fit()
> res.params
> array([ 0.98933264, -1.35239293, -1.0623305 , -0.99155293, -1.33675099,
>       -1.30657011, -1.50933751, -1.28744779, -1.43937358, -1.33805883,
>       -1.32744257, -1.42672539, -1.35239293, -1.60585046, -1.45239093,
>       -1.45695112, -1.34811186, -1.32658794, -1.21721715, -1.32853084,
>       -1.45775017, -1.44460388, -2.19065236, -1.3303631 , -1.20509831,
>       -1.37341535, -1.25746105, -1.33954972, -1.33922709, -1.247304  ])
>
> Note that your coefficient on height is now an elasticity.  I'm sure
> I'm missing something here, but that might help you along the way.
>
> Skipper
>


More information about the SciPy-User mailing list