[SciPy-user] Combinatoric/Statistics: length of longest sequence

A. M. Archibald peridot.faceted at gmail.com
Mon Oct 30 03:04:59 CST 2006

On 30/10/06, Dr. Hans Georg Krauthaeuser <hgk at et.uni-magdeburg.de> wrote:
> A. M. Archibald wrote:
> > On 29/10/06, Dr. Hans Georg Krauthaeuser <hgk at et.uni-magdeburg.de> wrote:
> >> Dear All,
> >>
> >> I have a 2d-array A of p-values (coming from pairwise statistical tests
> >> of N datasets). N is typically in the range of 100-1000. Now, given a
> >> certain threshold, I want to find the longest subset S of range(N), so
> >> that A[i][j]>threshold for all i,j from S.
> >
> > It's not quite clear, here, whether you want the longest *contiguous*
> > subset ({2,3,4,5,6}) or the largest arbitrary subset ({2,17,21,22}).
> Oh well, sorry for being not precise enough. I'm searching for the
> largest arbitrary subset.

Ah, then you are definitely looking at an NP-complete problem:

They suggest a heuristic, but I don't know how effective it is likely
to be; it probably depends on the structure of your graph (matrix).
It's possible, though unlikely, that a more continuous version of your
problem (one that took into account the actual values of A[i][j], for
example) would be more tractable - in fact, the eigenvector
corresponding to the maximal eigenvalue of A should be a node
weighting which has larger values for the "more connected" nodes (in a
particular sense).

> Thanks a lot for your hints and for the example code. I will check what
> I can do with it. I'll be back when I have some results.

It's just a quick hack, and can probably be significantly improved,
but good luck.

A. M. Archibald

More information about the SciPy-user mailing list