[SciPy-User] Generalized least square on large dataset
Sat Mar 10 09:05:58 CST 2012
Den 10.03.2012 14:57, skrev email@example.com:
> He explained the between-sample correlation with the similarity (my
> analogy: autocorrelation in time series, or spatial correlation).
Look at his attachment ives.tiff.
If the categories are known in advance (right panel in
ives.tiff), I think what he actually needs is computing
the likelihood ratio between the model
log(lambda) = b[0] + b[1] * genome_length
    + np.dot(b[2:N+1], group[0:N-1])
and a reduced model
log(lambda) = b[0] + np.dot(b[1:N], group[0:N-1])
That is, adding genome length as a predictor should not
improve the fit given that the bacterial groups are already
in the model.
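A minimal sketch of such a likelihood ratio test, fitting both Poisson
log-linear models by maximum likelihood and comparing them on one degree
of freedom. The data here are simulated stand-ins (counts, genome
lengths, and group labels are all made up for illustration); only the
test construction itself is the point:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

rng = np.random.default_rng(0)

# Hypothetical data: a count per bacterium, its genome length,
# and a known group label (the "categories known in advance" case).
n, n_groups = 60, 3
genome_length = rng.uniform(1.0, 5.0, n)
group = rng.integers(0, n_groups, n)
dummies = np.eye(n_groups)[group]          # one-hot group indicators

# Simulate under the reduced model: groups matter, genome length does not.
true_eta = 0.5 + dummies @ np.array([0.0, 0.8, -0.4])
y = rng.poisson(np.exp(true_eta))

def neg_loglik(b, X):
    # Poisson negative log-likelihood, up to a constant in y
    eta = X @ b
    return np.sum(np.exp(eta) - y * eta)

def max_loglik(X):
    res = minimize(neg_loglik, np.zeros(X.shape[1]), args=(X,), method="BFGS")
    return -res.fun                        # maximized log-likelihood

# Full model: intercept + genome_length + group dummies (one dropped).
X_full = np.column_stack([np.ones(n), genome_length, dummies[:, 1:]])
# Reduced model: intercept + group dummies only.
X_red = np.column_stack([np.ones(n), dummies[:, 1:]])

lr = 2.0 * (max_loglik(X_full) - max_loglik(X_red))  # LR statistic
p_value = chi2.sf(lr, df=1)                          # one extra parameter
print(f"LR = {lr:.3f}, p = {p_value:.3f}")
```

A large p-value here would mean genome length adds nothing once the
groups are accounted for, which is the conclusion being tested.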
If he does not have groups, but some sort of dendrogram
(left panel in ives.tiff), perhaps he could preprocess the
data by clustering the bacteria based on his dendrogram?
A full dendrogram (e.g. used as nested log-linear model)
would overfit the data and explain it perfectly. So adding
genome length would always give zero improvement. But if
the dendrogram can be reduced into a few discrete categories,
he could use a likelihood ratio test for the genome length.
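One way to reduce a dendrogram to a few discrete categories is to cut
the hierarchical clustering tree at a fixed number of clusters with
`scipy.cluster.hierarchy.fcluster`. The trait matrix below is a made-up
stand-in for whatever distances underlie his dendrogram:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)

# Hypothetical per-bacterium feature matrix (20 bacteria, 5 traits),
# standing in for the distances behind the real dendrogram.
traits = rng.normal(size=(20, 5))

Z = linkage(pdist(traits), method="average")  # build the dendrogram
# Cut it into a handful of discrete groups rather than using the full
# tree, which would overfit and explain the data perfectly.
labels = fcluster(Z, t=4, criterion="maxclust")
print(labels)  # one group id per bacterium, usable as a categorical predictor
```

The resulting labels can then play the role of `group` in the
likelihood ratio test for the genome-length effect.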