[Numpy-discussion] Numpify this?

Robert Kern robert.kern@gmail....
Sun May 18 04:11:39 CDT 2008


On Sun, May 18, 2008 at 4:02 AM, Matt Crane <matt@snapbug.geek.nz> wrote:
> On Sun, May 18, 2008 at 8:52 PM, Robert Kern <robert.kern@gmail.com> wrote:
>> It depends on the sizes.
> The sizes could range from 3 to 240000 with an average of around 5500.

A 240000x240000 boolean matrix will probably be too slow, and at one
byte per element it would need about 58 GB of memory.
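For comparison, here is a minimal sketch of the all-pairs approach that the boolean matrix refers to. The thread doesn't show that code, so this is an assumption about what was meant. It compares every key of a with every key of b by broadcasting, which means it builds len(a) * len(b) booleans. The function name match_by_broadcast is made up for illustration. It does not need sorted input, so it is a handy cross-check on small arrays.

```python
import numpy as np

def match_by_broadcast(a, b):
    """Pair a[:, 1] with b[:, 1] wherever the column-0 keys are equal.

    Builds a len(a) x len(b) boolean matrix, so it is only practical
    for small inputs.
    """
    # eq[i, j] is True when a[i, 0] == b[j, 0]
    eq = a[:, 0, np.newaxis] == b[np.newaxis, :, 0]
    # np.nonzero gives the (row, col) indices of the True entries in
    # row-major order, i.e. in the order of a's rows.
    ia, ib = np.nonzero(eq)
    return np.column_stack([a[ia, 1], b[ib, 1]])

a = np.array([[2, 10], [4, 20], [6, 30], [8, 40], [10, 50]])
b = np.array([[2, 60], [3, 70], [4, 80], [5, 90], [8, 100], [10, 110]])
print(match_by_broadcast(a, b).tolist())
# → [[10, 60], [20, 80], [40, 100], [50, 110]]
```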

>> Are there repeats?
> No, no repeats in the first column.

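Great! So let's use searchsorted() to find potential indices where the
first columns of the two arrays are equal. (searchsorted() assumes its
array is sorted, so this relies on the first columns being sorted, as
they are in the example.) We pull out the values at those indices and
actually do the comparison to get a boolean mask that is True where
there is an equality. Do both a.searchsorted(b) and b.searchsorted(a)
on the first columns to get the masks on b and a, respectively. Since
there are no repeats, the number of True elements will be the same for
both, and since both columns are sorted, the matching rows come out in
the same order. Now just apply the masks to the second columns.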
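To make the masking step concrete, here are the intermediate arrays for one direction (the mask on a), using the same example arrays as the session below. This is a sketch using only searchsorted() and fancy indexing. The variable names idx, candidates and mask_a are just illustrative.

```python
import numpy as np

a = np.array([[2, 10], [4, 20], [6, 30], [8, 40], [10, 50]])
b = np.array([[2, 60], [3, 70], [4, 80], [5, 90], [8, 100], [10, 110]])

# For each key in a, the position where it would be inserted into b's
# sorted keys; if the key is present in b, this is its index there.
idx = b[:, 0].searchsorted(a[:, 0])
# The keys of b at those positions: candidates only, not guaranteed matches.
candidates = b[idx, 0]
# True where the candidate really equals a's key.
mask_a = candidates == a[:, 0]

print(idx.tolist())         # [0, 2, 4, 4, 5]
print(candidates.tolist())  # [2, 4, 8, 8, 10]
print(mask_a.tolist())      # [True, True, False, True, True]
print(a[mask_a, 1].tolist())  # [10, 20, 40, 50]
```

The key 6 in a lands at index 4 of b, where the key is 8, so the equality test is what rejects it.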


In [20]: a = array([[2, 10], [4, 20], [6, 30], [8, 40], [10, 50]])

In [21]: b = array([[2, 60], [3, 70], [4, 80], [5, 90], [8, 100], [10, 110]])

In [22]: a[b[b[:,0].searchsorted(a[:,0]),0] == a[:,0], 1]
Out[22]: array([10, 20, 40, 50])

In [23]: b[a[a[:,0].searchsorted(b[:,0]),0] == b[:,0], 1]
Out[23]: array([ 60,  80, 100, 110])

In [24]: column_stack([Out[22], Out[23]])
Out[24]:
array([[ 10,  60],
       [ 20,  80],
       [ 40, 100],
       [ 50, 110]])
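One caveat with the one-liners above: searchsorted() returns len(b) for a key larger than every key in b (and likewise for a), so the indexing step raises IndexError whenever one array's largest key exceeds the other's. It doesn't happen here because both arrays end at 10. Below is a minimal sketch of a reusable version that clips the indices first. It assumes both first columns are sorted ascending, have no repeats, and are non-empty. match_sorted is a hypothetical name, not a NumPy function.

```python
import numpy as np

def match_sorted(a, b):
    """Pair a[:, 1] with b[:, 1] where the keys in column 0 are equal.

    Assumes a[:, 0] and b[:, 0] are each sorted ascending, have no
    repeats, and that both arrays are non-empty.
    """
    ka, kb = a[:, 0], b[:, 0]
    # searchsorted() returns len(kb) for keys larger than every key in kb;
    # clip so the lookup stays in bounds (the equality test rejects those).
    ia = np.clip(kb.searchsorted(ka), 0, len(kb) - 1)
    ib = np.clip(ka.searchsorted(kb), 0, len(ka) - 1)
    mask_a = kb[ia] == ka  # rows of a whose key also occurs in b
    mask_b = ka[ib] == kb  # rows of b whose key also occurs in a
    return np.column_stack([a[mask_a, 1], b[mask_b, 1]])

a = np.array([[2, 10], [4, 20], [6, 30], [8, 40], [10, 50]])
b = np.array([[2, 60], [3, 70], [4, 80], [5, 90], [8, 100], [10, 110]])
# The key 12 is beyond b's largest key; the unclipped one-liner would
# raise IndexError on this input.
a2 = np.array([[2, 10], [4, 20], [12, 99]])

print(match_sorted(a, b).tolist())   # [[10, 60], [20, 80], [40, 100], [50, 110]]
print(match_sorted(a2, b).tolist())  # [[10, 60], [20, 80]]
```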

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco

