[Numpy-discussion] Overlapping ranges

josef.pktd@gmai... josef.pktd@gmai...
Mon Mar 16 17:31:29 CDT 2009

On Mon, Mar 16, 2009 at 5:29 PM, Robert Kern <robert.kern@gmail.com> wrote:
> 2009/3/16 Peter Saffrey <pzs@dcs.gla.ac.uk>:
>> At the moment, I'm using a fairly naive approach that finds roughly where in
>> the genome (which gene) each point might be and then checks it against the
>> bins in that gene. If I split the problem into chromosomes, I feel sure
>> there must be some super-fast matrix approach I can apply using numpy, but
>> I'm struggling a bit. Can anybody suggest something?
> You probably need something algorithmically better, like interval
> trees. There are a couple of C/Python implementations floating around.

If I understand your problem correctly, then something like this should
work for a smaller-scale problem:

import numpy as np

B = np.array([[1,3],[2,5],[7,10], [6,15],[14,20]]) # bins
P = np.c_[np.arange(1,16), 4+np.arange(1,16)]  # points

# equivalent form: mask = (~(P[:,0:1]>B[:,1:2].T)) * (~(P[:,1:2]<B[:,0:1].T))
# if the bin ends before the start of the point interval, then it is discarded
# if the bin starts after the end of the point interval, then it is discarded
mask =  ~np.logical_or((P[:,0:1]>B[:,1:2].T), (P[:,1:2]<B[:,0:1].T))
# 1-based bin number where a point interval overlaps a bin, 0 elsewhere
indices = mask*np.arange(1, len(B)+1)
print(B)
print(P)
print(mask)
print(indices)

However, it creates a result matrix of shape (number of points) x
(number of bins). If this doesn't fit into memory, some looping
is necessary.
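
One way the looping could be done is to apply the same mask test to a block
of points at a time and keep only the overlapping (point, bin) index pairs.
This is only a sketch: the function name overlaps_chunked and the chunk_size
parameter are made up for illustration, and the chunk size would need to be
tuned to the available memory.

```python
import numpy as np

def overlaps_chunked(P, B, chunk_size=1000):
    """Yield (point_index, bin_index) arrays of overlapping pairs, one chunk of points at a time.

    Hypothetical helper, not part of numpy.
    """
    bin_start = B[:, 0]
    bin_end = B[:, 1]
    for lo in range(0, len(P), chunk_size):
        chunk = P[lo:lo + chunk_size]
        # same overlap test as the full mask, but the temporary array is
        # only (chunk_size x number of bins)
        mask = ~((chunk[:, 0:1] > bin_end) | (chunk[:, 1:2] < bin_start))
        pt, bn = np.nonzero(mask)
        # shift the point index back to its position in the full array
        yield pt + lo, bn

B = np.array([[1, 3], [2, 5], [7, 10], [6, 15], [14, 20]])  # bins
P = np.c_[np.arange(1, 16), 4 + np.arange(1, 16)]           # points

# one row per overlap: [point index, bin index], both 0-based
pairs = np.vstack([np.column_stack(pb) for pb in overlaps_chunked(P, B, chunk_size=4)])
print(pairs)
```

The peak temporary memory is then chunk_size times the number of bins, plus
the list of overlapping pairs, instead of the full points-by-bins matrix.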

Tested on example only.

