[Numpy-discussion] Overlapping ranges

josef.pktd@gmai... josef.pktd@gmai...
Mon Mar 16 17:31:29 CDT 2009

```
On Mon, Mar 16, 2009 at 5:29 PM, Robert Kern <robert.kern@gmail.com> wrote:
> 2009/3/16 Peter Saffrey <pzs@dcs.gla.ac.uk>:
>
>> At the moment, I'm using a fairly naive approach that finds roughly where in
>> the genome (which gene) each point might be and then checks it against the
>> bins in that gene. If I split the problem into chromosomes, I feel sure
>> there must be some super-fast matrix approach I can apply using numpy, but
>> I'm struggling a bit. Can anybody suggest something?
>
> You probably need something algorithmically better, like interval
> trees. There are a couple of C/Python implementations floating around.
>

If I understand your problem correctly, then for a smaller-scale
problem something like this should work:
{{{
import numpy as np

B = np.array([[1,3],[2,5],[7,10], [6,15],[14,20]]) # bins
P = np.c_[np.arange(1,16), 4+np.arange(1,16)]  # points

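# if the bin ended before the start of the point interval, then it is discarded
# if the bin started after the end of the point interval, then it is discarded
# (assumed reconstruction of the step that builds `indices`, following the two
# rules above and treating intervals as closed)
overlap = ~((B[:,1] < P[:,:1]) | (B[:,0] > P[:,1:]))  # (n_points, n_bins) mask
indices = [np.nonzero(row)[0] for row in overlap]  # overlapping bins per point
print(B)
print(P)
print(indices)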
}}}

However, it creates a result matrix of dimension (number of points)
times (number of bins). If this doesn't fit into memory, some looping
is necessary.

Tested on example only.

Josef
```
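
The looping mentioned in the post for the case where the full (number of points) times (number of bins) matrix does not fit into memory could look roughly like the sketch below. It is only an illustration, not code from the thread: the function name `overlapping_bins` and the `chunk` parameter are hypothetical, and intervals are assumed to be closed, so a bin and a point that merely touch at an endpoint count as overlapping. Only one block of points is compared against all bins at a time, which caps the temporary mask at `chunk` times (number of bins) entries.

```python
import numpy as np

def overlapping_bins(B, P, chunk=1000):
    """For each point interval in P, return the indices of the bins in B
    that overlap it.

    Hypothetical helper; assumes closed intervals [start, end] in both arrays.
    """
    B = np.asarray(B)
    P = np.asarray(P)
    result = []
    for start in range(0, len(P), chunk):
        block = P[start:start + chunk]
        # mask has shape (len(block), len(B)); an entry is True when the bin
        # does not end before the point starts and does not start after it ends
        mask = (B[:, 1] >= block[:, :1]) & (B[:, 0] <= block[:, 1:])
        result.extend(np.nonzero(row)[0] for row in mask)
    return result

B = np.array([[1, 3], [2, 5], [7, 10], [6, 15], [14, 20]])  # bins
P = np.c_[np.arange(1, 16), 4 + np.arange(1, 16)]           # points
indices = overlapping_bins(B, P, chunk=4)
```

The chunk size trades memory for the number of Python-level loop iterations. With many bins per chromosome, a sorted or tree-based structure, as suggested earlier in the thread, would probably scale better than this.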