[Numpy-discussion] extract elements of an array that are contained in another array?
Thu Jun 4 09:50:18 CDT 2009
On Thu, Jun 4, 2009 at 10:13 AM, Alan G Isaac <email@example.com> wrote:
>> On Thu, Jun 4, 2009 at 8:23 AM, Alan G Isaac <firstname.lastname@example.org> wrote:
> On 6/4/2009 8:35 AM email@example.com apparently wrote:
>> If b is large this creates a huge intermediate array
> True enough, but one could then use fromiter:
> setb = set(b)
> itr = (ai for ai in a if ai in setb)
> out = np.fromiter(itr, dtype=a.dtype)
> I suspect (?) that b would have to be pretty
> big relative to a for the repeated testing
> to be more costly than sorting a.
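A runnable version of the fromiter suggestion quoted above, as a minimal sketch. The sample arrays are made up for illustration; the result keeps both the order and the duplicates of a.

```python
import numpy as np

# Made-up sample data, for illustration only.
a = np.array([5, 1, 3, 3, 7, 1])
b = np.array([3, 1, 9, 1])

setb = set(b)                          # hash-based lookup instead of scanning b
itr = (ai for ai in a if ai in setb)   # lazy: no intermediate array is built
out = np.fromiter(itr, dtype=a.dtype)  # collect the surviving elements

print(out)  # elements of a that also occur in b, in a's original order
```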
I didn't look at this case very closely for speed. Note that setmember1d
and setmember1d_nu return a boolean array that can be used for indexing,
not the actual elements.
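To illustrate the difference between a boolean index array and the elements themselves, here is a small sketch. It assumes a current NumPy, where setmember1d no longer exists and np.isin plays the same role (it accepts non-unique input). The sample arrays are made up.

```python
import numpy as np

# Made-up sample data, for illustration only.
a = np.array([5, 1, 3, 3, 7, 1])
b = np.array([3, 1, 9, 1])

mask = np.isin(a, b)  # boolean array with the same shape as a
out = a[mask]         # indexing with the mask yields the actual elements

print(mask)
print(out)
```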
Your iterator runs in Python and could be pretty slow. I only ran
the performance script attached to the ticket, though, and the speed
differences between the different ways of doing it were pretty big for large
> Or if a stable order is not important (I don't
> recall if the OP specified), one could just
> np.intersect1d(a, np.unique(b))
This also requires that `a` has only unique elements.
intersect1d_nu doesn't require unique elements.
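A small sketch of the point about duplicates, assuming a current NumPy rather than the 2009 version discussed here: np.intersect1d now deduplicates its inputs by default (unless assume_unique=True is passed), and intersect1d_nu no longer exists. The result is sorted and unique, so neither the order nor the multiplicities of a survive; the mask-based line is shown only for contrast. The sample arrays are made up.

```python
import numpy as np

# Made-up sample data with repeated values, for illustration only.
a = np.array([3, 1, 3, 7, 1])
b = np.array([1, 3, 3, 9])

common = np.intersect1d(a, b)  # sorted, unique values present in both
kept = a[np.isin(a, b)]        # elements of a found in b, order and repeats kept

print(common)
print(kept)
```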
> On a different note, I think a name change
> is needed for your function. (Compare
> intersect1d_nu to see the potential
> confusion. And btw, what is the use case
> for intersect1d, which gives neither a
> set intersection nor a multiset intersection?)
intersect1d gives set intersection if both arrays have only unique
elements (i.e. are sets).
I thought the naming was pretty clear:
  intersect1d(a, b)      set intersection, for a and b with unique elements
  intersect1d_nu(a, b)   set intersection, for a and b with non-unique elements
  setmember1d(a, b)      boolean index array for a of the set intersection,
                         for a and b with unique elements
  setmember1d_nu(a, b)   boolean index array for a of the set intersection,
                         for a and b with non-unique elements
The new docs http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/
are a bit clearer.
However, I haven't used either of these functions much, and none of
them are *my* functions.
Of the arraysetops functions, I use unique1d most (because of the
I just keep track of these functions because of the use for
categorical and dummy variables.
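As a rough illustration of the categorical use mentioned above, here is one assumed typical pattern; it is not necessarily how these functions are used in this thread. It relies on a current NumPy, where unique1d has been folded into np.unique, and on the return_inverse option to obtain integer codes. The labels are made up.

```python
import numpy as np

# Made-up category labels, for illustration only.
labels = np.array(["b", "a", "c", "a", "b"])

# levels: sorted distinct labels; codes: position of each label in levels
levels, codes = np.unique(labels, return_inverse=True)
codes = codes.ravel()  # keep the codes 1-d regardless of NumPy version

# One column per level, 1 where the observation belongs to that level
dummies = (codes[:, None] == np.arange(len(levels))).astype(int)

print(levels)
print(dummies)
```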
> Alan Isaac