# [Numpy-discussion] Boolean arrays

Francesc Alted faltet@pytables....
Sat Aug 28 09:14:22 CDT 2010

```
2010/8/27, Robert Kern <robert.kern@gmail.com>:
> [~]
> |2> def kern_in(x, valid):
> ..>     mask = np.zeros(x.shape, dtype=bool)
> ..>     for good in valid:
> ..>         mask |= (x == good)
> ..>
>
> [~]
> |6> ar = np.random.randint(100, size=1000000)
>
> [~]
> |7> valid = np.arange(0, 100, 5)
>
> [~]
> |8> %timeit kern_in(ar, valid)
> 10 loops, best of 3: 115 ms per loop
>
> [~]
> |9> %timeit np.in1d(ar, valid)
> 1 loops, best of 3: 279 ms per loop

Another possibility is to use numexpr.  On a machine with 2 x E5520
quad-core processors (i.e. a total of 8 physical cores and, with
hyperthreading, 16 threads):

In [1]: import numpy as np

In [2]: def kern_in(x, valid):
...:     mask = np.zeros(x.shape, dtype=bool)
...:     for good in valid:
...:         mask |= (x == good)
...:

In [3]: ar = np.random.randint(100, size=10000000)

In [4]: valid = np.arange(0, 100, 5)

In [5]: timeit kern_in(ar, valid)
1 loops, best of 3: 1.21 s per loop

In [6]: sexpr = "|".join([ "(ar == %d)" % v for v in valid ])

In [7]: sexpr   # (ar == 0) | (ar == 1)  <==> (0,1) in ar
Out[7]: '(ar == 0)|(ar == 5)|(ar == 10)|(ar == 15)|(ar == 20)|(ar ==
25)|(ar == 30)|(ar == 35)|(ar == 40)|(ar == 45)|(ar == 50)|(ar ==
55)|(ar == 60)|(ar == 65)|(ar == 70)|(ar == 75)|(ar == 80)|(ar ==
85)|(ar == 90)|(ar == 95)'

In [9]: import numexpr as nx

In [10]: timeit nx.evaluate(sexpr)
10 loops, best of 3: 71.9 ms per loop

That's a speed-up of almost 17x with respect to the kern_in()
function, but not all of it is due to the use of the full 16 threads.
Using only one thread gives:

In [12]: timeit nx.evaluate(sexpr)
1 loops, best of 3: 586 ms per loop

which is about 2x faster than kern_in() on this machine.

It is not always possible to use numexpr, but in this case it seems to
work pretty well.

--
Francesc Alted
```
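
The comparison in the message above can be sketched as a self-contained script. This is only an illustration, not the code from the thread: `build_expr` is a hypothetical helper, the array is smaller than the one in the post, `np.isin` stands in as the reference result, and calling `nx.set_num_threads(1)` for the single-thread case is an assumption, since the post does not show how the thread count was changed. numexpr is treated as optional so the script still runs without it.

```python
import numpy as np


def kern_in(x, valid):
    """Loop-based membership test: OR together one equality mask per value."""
    mask = np.zeros(x.shape, dtype=bool)
    for good in valid:
        mask |= (x == good)
    return mask


def build_expr(valid, name="ar"):
    """Hypothetical helper: build the string '(ar == 0)|(ar == 5)|...'."""
    return "|".join("(%s == %d)" % (name, v) for v in valid)


rng = np.random.default_rng(0)
ar = rng.integers(100, size=100_000)   # smaller than the array in the post
valid = np.arange(0, 100, 5)

expected = np.isin(ar, valid)          # reference result from NumPy
loop_mask = kern_in(ar, valid)
sexpr = build_expr(valid)

try:
    import numexpr as nx
except ImportError:
    nx = None                          # numexpr is optional in this sketch

if nx is not None:
    # Multi-threaded evaluation (numexpr picks the thread count by default).
    nx_mask = nx.evaluate(sexpr, local_dict={"ar": ar})
    assert np.array_equal(nx_mask, expected)

    # Assumed way to reproduce the single-thread case: set_num_threads
    # returns the previous setting, which is restored afterwards.
    previous = nx.set_num_threads(1)
    nx_mask_single = nx.evaluate(sexpr, local_dict={"ar": ar})
    nx.set_num_threads(previous)
    assert np.array_equal(nx_mask_single, expected)
```

Wrapping each `nx.evaluate` call in `timeit` under both thread settings would separate the gain from threading from the gain numexpr gets by evaluating the whole expression in one pass; the numbers will depend on the machine.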