[SciPy-User] Generalized least square on large dataset

josef.pktd@gmai... josef.pktd@gmai...
Fri Mar 9 11:51:23 CST 2012


On Fri, Mar 9, 2012 at 12:43 PM, Peter Cimermančič
<peter.cimermancic@gmail.com> wrote:
>>
>>
>> No, it does not. If you are working with counts, the appropriate model
>> would usually be Poisson regression, i.e. a generalized linear model with
>> a log link function and a Poisson probability family. I have seen many
>> examples of microbiologists using linear regression when they should
>> actually use Poisson regression (e.g. counting genes) or logistic
>> regression (e.g. dose-response and titration curves).
>>
>> This will do it for you:
>>
>> MATLAB: glmfit from the statistics toolbox
>> R: glm
>> SAS: PROC GENMOD
>> Python: statsmodels (scikits.statsmodels)
>>
>> Another example of inappropriate use of linear regression in
>> microbiology is the Lineweaver-Burk plot as a substitute for non-linear
>> least squares (usually Levenberg-Marquardt) to fit a Michaelis-Menten
>> curve. Some microbiologists are aware of this, but they seem to prefer
>> all sorts of ad hoc trickery like linearizations and
>> variance-stabilizing transforms instead of "just doing it right".
>>
>> As for samples that are not independent, that will affect the final
>> likelihood. If you want to control for this by writing the
>> log-likelihood yourself, getting ML estimates by maximizing it is
>> easy with fmin_powell or fmin_bfgs from scipy.optimize. (Powell's
>> method does not even need the gradient.) And if you need the "p-value",
>> you can use either a likelihood-ratio test or Monte Carlo (e.g. a
>> permutation test).
>>
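
The maximum-likelihood route described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
analysis for the data in this thread: the data are synthetic, the names
neg_loglike and neg_score are made up for the example, and the
observations are treated as independent. Phylogenetic dependence would
have to be built into the likelihood itself, and the chi-square
reference distribution for the likelihood ratio is only asymptotic.

```python
import numpy as np
from scipy import optimize, stats
from scipy.special import gammaln


def neg_loglike(beta, X, y):
    """Negative Poisson log-likelihood with log link: E[y] = exp(X @ beta)."""
    eta = X @ beta
    return -np.sum(y * eta - np.exp(eta) - gammaln(y + 1.0))


def neg_score(beta, X, y):
    """Gradient of neg_loglike with respect to beta."""
    return -X.T @ (y - np.exp(X @ beta))


# Synthetic, independent counts (illustrative only; not real genome data).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 2.0, size=n)
X_full = np.column_stack([np.ones(n), x])   # intercept + one predictor
X_null = X_full[:, :1]                      # intercept only
y = rng.poisson(np.exp(0.5 + 0.8 * x))

# Full model: start from the intercept-only solution for numerical stability.
start = np.array([np.log(y.mean()), 0.0])
beta_hat = optimize.fmin_bfgs(neg_loglike, start, fprime=neg_score,
                              args=(X_full, y), disp=False)

# Null model: no dependence of the counts on x.
beta_null = optimize.fmin_bfgs(neg_loglike, np.zeros(1), fprime=neg_score,
                               args=(X_null, y), disp=False)

# Likelihood-ratio statistic; one extra parameter gives one degree of freedom.
lr_stat = 2.0 * (neg_loglike(beta_null, X_null, y)
                 - neg_loglike(beta_hat, X_full, y))
p_value = stats.chi2.sf(lr_stat, df=1)

print("slope estimate:", beta_hat[1])
print("LR statistic:", lr_stat, "p-value:", p_value)
```

Swapping fmin_bfgs for fmin_powell (and dropping fprime) gives the
gradient-free variant; the rest of the sketch stays the same.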
>
> Sturla, could you be more specific here? I don't know much about
> (bio)statistics, but that doesn't mean I don't want to do things right
> :). All I want to get out of this analysis is to be able to say whether the
> correlation between genome lengths and numbers of particular genes (which
> looks neat and obvious from the scatter plot) is statistically significant
> given that the data points are heavily phylogenetically biased. That's why I
> mentioned "p-values". Of course, I'm open to any better/more accurate way of
> getting there than initially planned.

Peter, could you post a scatter plot of your data (with axis ticks and
labels) so we get an idea of what your data look like?

I have no idea at all about the bio topic.

Josef

>
>
>
>
>>
>>
>> Sturla
>>
>>
>