[Numpy-discussion] newbie question about boolean testing of array equality result
Wed Mar 23 11:15:08 CDT 2011
On Wed, Mar 23, 2011 at 10:29 AM, Jonathan Hartley <email@example.com> wrote:
> Hey people,
> I'm writing an application in which we evaluate user-supplied Python
> Sometimes, we want to do an equality test on the result of the evaluation,
> result = eval(...)
> if result == float('nan'):
> If the result is a numpy array, then this raises a ValueError, since the
> array equality operator returns a new numpy array, and the coercion of this
> to a boolean for the if predicate then explicitly raises. Presumably this is
> well-known? For us, it is undesirable.
> Am I right to understand that any code which might ever encounter a numpy
> array therefore can never use an unguarded 'if x == y:' construction? Is my
> workaround really to replace every instance of this with 'if not
> isinstance(x, numpy.ndarray) and x==y:' ? This pains me, because otherwise
> this code would have no dependency on numpy. (I can't just prepend
> 'isinstance(x, float)' because, unlike the example above, we don't always
> know the type of the equality RHS until runtime.)
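For reference, the behaviour described in the quoted question can be reproduced with a short sketch like the one below. The array contents are made up for illustration; the point is only that == on an array with more than one element gives back an array, and using that array as an if predicate raises.

```python
import numpy as np

# Illustrative stand-in for the result of eval(...); the values are arbitrary.
result = np.array([1.0, float('nan')])

# == is applied elementwise and returns a boolean array, not a single bool.
comparison = result == float('nan')
print(comparison.shape)

error_message = None
try:
    if comparison:  # coercing a multi-element array to bool raises
        pass
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```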
With floating point numbers you usually don't want to do 'if x == y'
anyway. It's pretty common for two numbers that should be
mathematically the same to differ because of rounding, e.g. 0.3 +
0.2 + 0.1 != 0.3 + (0.2 + 0.1) because the first is
0.59999999999999998 and the second is 0.60000000000000009.
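A quick way to see this for yourself, using only plain Python floats (the variable names are just for illustration):

```python
# Floating point addition is not associative: the grouping changes
# where rounding happens.
left = (0.3 + 0.2) + 0.1   # same evaluation order as 0.3 + 0.2 + 0.1
right = 0.3 + (0.2 + 0.1)

print(repr(left), repr(right))

# Exact equality fails even though the two sums are mathematically equal.
print(left == right)               # False

# The difference is tiny, so a threshold comparison succeeds.
print(abs(left - right) < 1e-10)   # True
```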
Typically you want to compare them by checking that their difference
is below some threshold, e.g. abs(x-y) < 1e-10. The threshold should
depend on the precision: for 32-bit floats, 1e-10 is too tight (e.g.
abs(np.float32(.3) - np.float32(.2) - np.float32(.1)) < 1e-10 is
False). The best threshold also depends on the magnitude of the
values, since absolute rounding errors are larger between large
numbers than between numbers close to zero.
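If you would rather not depend on numpy for scalar comparisons, one possible approach is a tolerance that scales with the size of the values. The helper below is a hypothetical sketch, not a library function; its name and the default tolerances rel_tol and abs_tol are assumptions you would want to tune for your data.

```python
def nearly_equal(x, y, rel_tol=1e-9, abs_tol=1e-12):
    """Hypothetical helper: compare two floats with a magnitude-aware tolerance.

    rel_tol scales with the larger of the two values; abs_tol is a floor
    so that values very close to zero can still compare as equal.
    """
    return abs(x - y) <= max(rel_tol * max(abs(x), abs(y)), abs_tol)


# Two large values that differ only by a tiny relative amount.
a = 1e10
b = a + 1e-5

print(abs(a - b) < 1e-10)     # False: a fixed threshold is too tight here
print(nearly_equal(a, b))     # True: the tolerance grows with the magnitude
print(nearly_equal(1.0, 1.1)) # False: a genuinely different value
```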
The easiest way to do this is to just go ahead and depend on numpy,
because it provides a function allclose that does what you want:
np.allclose(x,y) is true if all the elementwise differences between x
and y are within a small tolerance. Note that there are additional
arguments to allclose (rtol and atol) if you want to control the
tolerance, but for simple cases allclose(x,y) is probably fine. Also
be aware that it does apply numpy broadcasting, so that e.g.
np.allclose(x,1) checks if all elements of x are close to 1.
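Here is a small sketch of how that might look in practice. The arrays are invented for illustration; rtol and atol are the real keyword arguments of np.allclose, but the values passed in the last call are arbitrary.

```python
import numpy as np

# Values that are mathematically equal but differ by rounding.
x = np.array([0.1 + 0.2, 0.3 + (0.2 + 0.1)])
y = np.array([0.3, 0.6])

exact = bool((x == y).all())   # exact elementwise comparison
close = np.allclose(x, y)      # tolerance-based comparison
print(exact, close)

# Broadcasting: compare every element against the scalar 1.
ones = np.array([1.0, 1.0 + 1e-9, 1.0 - 1e-9])
close_to_one = np.allclose(ones, 1)
print(close_to_one)

# A much stricter tolerance makes the same comparison fail.
strict = np.allclose(ones, 1, rtol=0, atol=1e-12)
print(strict)
```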
You can also use numpy for nan testing: np.isnan(x).any() works if x
is a numpy array or if x is a python scalar, list, or tuple (or
anything else that can be passed into np.asarray).
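As a sketch of the nan check, assuming numeric input (np.isnan raises a TypeError for things like strings), something like the following should work. The helper name has_nan is made up for this example. Note that comparing against float('nan') with == never succeeds, because nan is not equal to anything, including itself.

```python
import numpy as np


def has_nan(x):
    """Hypothetical helper: True if any element of x is NaN."""
    return bool(np.isnan(x).any())


nan = float('nan')

print(nan == nan)                # False: equality can never detect nan
print(has_nan(nan))              # python scalar
print(has_nan([1.0, nan]))       # list
print(has_nan((1, 2, 3)))        # tuple with no nan
print(has_nan(np.array([[1.0, 2.0], [3.0, nan]])))  # 2-d array
```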
Other useful checks include np.isneginf, np.isposinf, and np.isinf for
checking for infinities, and also np.isfinite, where np.isfinite(x) is
equivalent to ~(np.isnan(x) | np.isinf(x)).
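A short sketch of these checks on a sample array (the values are chosen only to cover each case):

```python
import numpy as np

x = np.array([1.0, -np.inf, np.inf, np.nan, 0.0])

print(np.isneginf(x))   # True only for -inf
print(np.isposinf(x))   # True only for +inf
print(np.isinf(x))      # True for either infinity
print(np.isfinite(x))   # True for ordinary numbers only

# isfinite agrees with "neither nan nor inf" on every element.
same = np.array_equal(np.isfinite(x), ~(np.isnan(x) | np.isinf(x)))
print(same)
```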