[SciPy-User] Generalized least square on large dataset

Charles R Harris charlesr.harris@gmail....
Thu Mar 8 07:35:23 CST 2012

On Wed, Mar 7, 2012 at 9:25 PM, Peter Cimermančič <peter.cimermancic@gmail.com> wrote:

> To describe my problem in more detail: I have a list of ~1000 bacterial
> genome lengths and the number of certain genes in each genome. I'd like
> to see if there is any correlation between genome length and the number of
> those genes. It may look like an easy linear regression problem; however, one
> has to be a bit more careful, as the measurements aren't sampled independently.
> Bacteria whose genomes are similar also tend to contain similar numbers of
> those genes. Bacterial similarity is described by the matrix V: it
> contains a similarity value for each pair of bacteria, ranging from 0 to 1.
> Has anybody encountered a similar problem already?

Ah, that sounds like a fairly common sort of problem, separating
the effects of two variables, but it is outside my area of experience.
The statisticians around here should be able to say something useful about
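
For readers following the thread, the setup in the quoted question (regressing
gene count on genome length when the errors are correlated according to V) can
be sketched as generalized least squares. This is only a minimal sketch under
loud assumptions: V is taken to be symmetric positive definite and is treated
as the error covariance up to a scale factor, which the thread does not
confirm. The function name gls_fit and the synthetic data are hypothetical,
made up purely for illustration.

```python
import numpy as np


def gls_fit(x, y, V):
    """Fit y = b0 + b1 * x by generalized least squares.

    Assumption: V is symmetric positive definite and proportional to the
    covariance of the errors. Returns (coefficients, standard errors).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept + slope columns

    # Whiten with the Cholesky factor V = L @ L.T, then solve ordinary
    # least squares on the transformed data (avoids forming inv(V)).
    L = np.linalg.cholesky(np.asarray(V, dtype=float))
    Xw = np.linalg.solve(L, X)
    yw = np.linalg.solve(L, y)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)

    # Residual variance and coefficient covariance in the whitened space.
    resid = yw - Xw @ beta
    dof = len(y) - X.shape[1]
    sigma2 = (resid @ resid) / dof
    cov_beta = sigma2 * np.linalg.inv(Xw.T @ Xw)
    return beta, np.sqrt(np.diag(cov_beta))


# Synthetic illustration only (not the poster's data).
rng = np.random.default_rng(0)
n = 50
t = np.linspace(0.0, 5.0, n)                     # hypothetical "relatedness" axis
V = np.exp(-np.abs(t[:, None] - t[None, :]))     # similarities in (0, 1]
x = rng.uniform(1.0, 10.0, n)                    # stand-in for genome length
errors = np.linalg.cholesky(V) @ rng.standard_normal(n)
y = 5.0 + 2.0 * x + errors                       # stand-in for gene count

beta, se = gls_fit(x, y, V)
print("intercept, slope:", beta)
print("standard errors: ", se)
```

If V is the identity matrix, this reduces to ordinary least squares. For real
work, statsmodels provides a GLS class that takes the covariance through its
sigma argument; whether a raw similarity matrix is a reasonable stand-in for
that covariance is a modelling question for the statisticians.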

