[SciPy-User] Generalized least square on large dataset
Charles R Harris
Thu Mar 8 07:35:23 CST 2012
On Wed, Mar 7, 2012 at 9:25 PM, Peter Cimermančič wrote:
> To describe my problem in more detail: I have a list of ~1000 bacterial
> genome lengths and the number of certain genes in each one of them. I'd
> like to see whether there is any correlation between genome length and the
> number of these genes. It may look like an easy linear regression problem;
> however, one has to be a bit more careful, as the measurements aren't
> sampled independently. Bacteria whose genomes are similar also tend to
> contain similar numbers of these genes. This similarity is what the matrix
> V describes: it contains a similarity value for each pair of bacteria,
> ranging from 0 to 1.
> Has anybody encountered a similar problem already?
Ah, that sounds like a fairly common sort of problem, separating the
effects of two variables, but it is outside my area of experience. The
statisticians around here should be able to say something useful about it.
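The setup described above, a linear regression whose errors are correlated according to V, is what generalized least squares (GLS) addresses: the estimator is beta = (X' V^-1 X)^-1 X' V^-1 y. The sketch below is a minimal illustration, assuming the error covariance is proportional to V and that V is symmetric positive definite (a similarity matrix with entries in [0, 1] is not guaranteed to be; the Cholesky step will raise an error if it is not). The function names `gls_fit` and `cluster_similarity`, the block-structured V, and all data are hypothetical and made up for illustration. For ~1000 genomes, V is 1000x1000, and a dense Cholesky factorization of that size is inexpensive.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def gls_fit(x, y, V):
    """Fit y = b0 + b1*x by generalized least squares.

    Assumes the error covariance is sigma^2 * V, with V symmetric and
    positive definite (e.g. a pairwise similarity matrix with ones on
    the diagonal). Returns (beta, se): estimates [b0, b1] and their
    standard errors.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    X = np.column_stack([np.ones(n), x])      # intercept column + predictor

    # Cholesky-factor V once; raises LinAlgError if V is not positive definite.
    c = cho_factor(V)
    Vinv_X = cho_solve(c, X)                   # V^{-1} X without forming V^{-1}
    Vinv_y = cho_solve(c, y)                   # V^{-1} y

    # GLS normal equations: (X' V^{-1} X) beta = X' V^{-1} y
    A = X.T @ Vinv_X
    beta = np.linalg.solve(A, X.T @ Vinv_y)

    # Residual scale: sigma^2 = r' V^{-1} r / (n - p)
    r = y - X @ beta
    sigma2 = (r @ cho_solve(c, r)) / (n - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A)))
    return beta, se


def cluster_similarity(n_groups, group_size, rho):
    """Hypothetical V for illustration: bacteria in the same group have
    similarity rho, bacteria in different groups have similarity 0."""
    block = np.full((group_size, group_size), rho)
    np.fill_diagonal(block, 1.0)
    return np.kron(np.eye(n_groups), block)


# Synthetic example (made-up data, not the poster's): 20 groups of 10
# "genomes", with errors correlated within each group according to V.
rng = np.random.default_rng(0)
V = cluster_similarity(20, 10, 0.6)
x = rng.uniform(1.0, 10.0, size=V.shape[0])
noise = np.linalg.cholesky(V) @ rng.normal(scale=0.3, size=V.shape[0])
y = 2.0 + 0.5 * x + noise

beta, se = gls_fit(x, y, V)
print("intercept, slope:", beta)
print("standard errors: ", se)
```

With V equal to the identity this reduces to ordinary least squares, and it is equivalent to running ordinary least squares on data "whitened" by the inverse Cholesky factor of V. If statsmodels is available, its `GLS` class accepts such a covariance matrix through its `sigma` argument and reports the usual regression summary.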