[Numpy-discussion] intersect1d and setmember1d
Neil
neilcrighton@gmail....
Sat Feb 28 07:56:42 CST 2009
mudit sharma <mudit_19a <at> yahoo.com> writes:
> intersect1d and setmember1d don't give the expected results when there are
> duplicate values in either array, because they work by sorting the data and
> subtracting the previous value. Is there an alternative in numpy to get the
> indices of the intersected values?
>
> In [31]: p nonzero(setmember1d(v1.Id, v2.Id))[0]
> [ 0 1 2 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
> 26 27 28 29]
> <-- index 2 shouldn't be here; look at the data below.
>
> In [32]: p v1.Id[:10]
> [ 232. 232. 233. 233. 234. 234. 235. 235. 237. 237.]
>
> In [33]: p v2.Id[:10]
> [ 232. 232. 234. 234. 235. 235. 236. 236. 237. 237.]
>
As far as I know there isn't an obvious way to get the functionality of
setmember1d working on non-unique inputs. However, I've needed this operation
quite a lot, so here's a function I wrote that does it. It's only a few times
slower than numpy's setmember1d. You're welcome to use it.
import numpy as np

def ismember(a1, a2):
    """ Test whether items from a1 are in a2.

    This does the same thing as np.setmember1d, but works on
    non-unique arrays.

    Only a few (2-4) times slower than np.setmember1d, and a lot
    faster than [i in a2 for i in a1].

    An example that np.setmember1d gets wrong:

    >>> a1 = np.array([5,4,5,3,4,4,3,4,3,5,2,1,5,5])
    >>> a2 = [2,3,4]
    >>> mask = ismember(a1,a2)
    >>> a1[mask]
    array([4, 3, 4, 4, 3, 4, 3, 2])
    """
    a2 = set(a2)
    a1 = np.asarray(a1)
    if a1.size == 0:
        # nothing to test; also avoids indexing a1[0] below
        return np.zeros(a1.shape, dtype=bool)
    ind = a1.argsort()
    a1 = a1[ind]
    mask = []
    # need this bit because prev is not defined for first item
    item = a1[0]
    if item in a2:
        mask.append(True)
        a2.remove(item)
    else:
        mask.append(False)
    prev = item
    # main loop: a1 is sorted, so repeated items are adjacent and
    # simply reuse the result computed for the previous item
    for item in a1[1:]:
        if item == prev:
            mask.append(mask[-1])
        elif item in a2:
            mask.append(True)
            prev = item
            a2.remove(item)
        else:
            mask.append(False)
            prev = item
    # restore mask to original ordering of a1 and return
    mask = np.array(mask)
    return mask[ind.argsort()]
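
To get the indices asked for in the original question, the mask can be passed to np.nonzero. The block below is a minimal, self-contained sketch of that on the first ten values posted above. It does not reuse ismember; instead it uses a vectorized variant built on np.unique and np.searchsorted. The name ismember_vectorized is made up for this illustration, and I haven't timed it against the loop version. On newer NumPy (1.13 or later, as far as I know) np.isin should give the same mask directly.

```python
import numpy as np

def ismember_vectorized(a1, a2):
    """Boolean mask over a1: True where the item also occurs in a2.

    Hypothetical helper for illustration; duplicates are allowed
    in both inputs.
    """
    a1 = np.asarray(a1)
    a2 = np.unique(a2)  # sorted, duplicates removed
    if a2.size == 0:
        return np.zeros(a1.shape, dtype=bool)
    # position where each a1 item would be inserted into sorted a2
    pos = np.searchsorted(a2, a1)
    # items larger than everything in a2 get pos == a2.size; clip them
    pos = np.minimum(pos, a2.size - 1)
    # an item is a member only if a2 holds it at that position
    return a2[pos] == a1

# first ten values of v1.Id and v2.Id from the question
v1 = np.array([232., 232., 233., 233., 234., 234., 235., 235., 237., 237.])
v2 = np.array([232., 232., 234., 234., 235., 235., 236., 236., 237., 237.])

mask = ismember_vectorized(v1, v2)
idx = np.nonzero(mask)[0]
print(idx)  # indices 2 and 3 (the two 233. entries) are not included
```

The clipping step only keeps the lookup a2[pos] in bounds; a clipped item can never compare equal, since it is larger than every value in a2.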