# [SciPy-Dev] boolean / real-value distance metrics

Jacob VanderPlas vanderplas@astro.washington....
Fri Jan 6 00:37:52 CST 2012

```
Hi all,
I've been taking a closer look at the various metrics in
scipy.spatial.distance.  In particular, every metric designed for
boolean values behaves differently depending on whether the function is
used directly, or cdist/pdist is used (see the example below).
cdist/pdist first converts the float array to bool, then performs the
computation.  The calls to the metric functions work directly with the
floating point vectors and yield a different result.

I've poked around, and haven't found any documentation anywhere that
addresses this.  Is this a feature of scipy, or a bug?  Which behavior is
correct in this case?
Are these boolean metrics, when generalized to floating point, true
metrics?  That is, can it be shown that they satisfy the triangle inequality?

I'd like to work on the documentation to make all of this more clear,
but I don't know where to start...  Thanks
Jake

Example code:

In [1]: from scipy.spatial.distance import cdist, yule

In [2]: import numpy as np

In [3]: np.random.seed(0)

In [4]: x = np.random.random(100)

In [5]: x[x>0.5] = 0 # set ~half the entries to zero

In [6]: y = np.random.random(100)

In [7]: y[y>0.5] = 0 # set ~half the entries to zero

In [8]: yule(x, y)  # direct computation: this does not convert to bool
Out[8]: 0.96988390020367443

In [9]: cdist([x], [y], 'yule')[0, 0]  # cdist computation: this does convert to bool
Out[9]: 0.83211678832116787

```
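
The discrepancy described above can be sketched with NumPy alone. This is only an illustration, not SciPy's actual code: `yule_bool` and `yule_float` are hypothetical helper names, and the float version assumes that the direct function generalizes logical AND to a product and NOT to `1 - x`. That assumption is not stated in the post. The boolean version mirrors what the post says cdist/pdist do: cast to bool first, then count.

```python
import numpy as np


def yule_from_counts(ntt, ntf, nft, nff):
    """Yule dissimilarity from the four (possibly fractional) contingency terms."""
    return 2.0 * ntf * nft / (ntt * nff + ntf * nft)


def yule_bool(u, v):
    """Hypothetical helper: cast to bool (nonzero -> True), then count."""
    u = np.asarray(u).astype(bool)
    v = np.asarray(v).astype(bool)
    ntt = np.sum(u & v)    # both True
    ntf = np.sum(u & ~v)   # True in u, False in v
    nft = np.sum(~u & v)   # False in u, True in v
    nff = np.sum(~u & ~v)  # both False
    return yule_from_counts(ntt, ntf, nft, nff)


def yule_float(u, v):
    """Hypothetical helper: ASSUMED float generalization (AND -> product, NOT -> 1 - x)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    ntt = np.sum(u * v)
    ntf = np.sum(u * (1.0 - v))
    nft = np.sum((1.0 - u) * v)
    nff = np.sum((1.0 - u) * (1.0 - v))
    return yule_from_counts(ntt, ntf, nft, nff)


# Small float vectors with some exact zeros, in the spirit of the example above.
x = np.array([0.2, 0.0, 0.4, 0.0])
y = np.array([0.0, 0.3, 0.1, 0.0])

d_bool = yule_bool(x, y)    # each contingency count is 1 here, so this is 1.0
d_float = yule_float(x, y)  # uses the raw float values, so it differs

print(d_bool, d_float)
```

Under this assumption the two versions agree whenever the inputs contain only 0s and 1s, so casting with `x.astype(bool)` before calling the direct function would be one way to get matching results.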