[SciPy-user] [newbie] standardize a matrix
Noel O'Boyle
noel.oboyle2 at mail.dcu.ie
Tue Oct 11 03:37:15 CDT 2005
On Tue, 2005-10-11 at 10:02 +0200, Mauro Cherubini wrote:
> I have a symmetrical bi-dimensional array that contains distances
> between a certain number of points. The matrix diagonal is all zeros
> because of course the distance of a point from itself is zero.
That is a distance matrix. PyChem is developing a number of methods to deal
with distance matrices (and multivariate analysis in general). It is still
under active development, but I recommend forwarding your question to the
pychem users mailing list (www.sf.net/projects/pychem).
> I would love to standardize the matrix using one or all of these
> methods:
>
> a) divide each attribute distance value of a point by the maximum
> observed absolute distance value. This should restrict the values to
> lie between -1 and 1. Often the values are all positive, and thus,
> all transformed values will lie between 0 and 1.
(Shouldn't all distances be positive?)
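>>> from numpy import array, divide
>>> a = array([[0.1, 0.2], [0.5, 0.6]])
>>> print(a.ravel())   # flattened copy of the matrix
[0.1 0.2 0.5 0.6]
>>> print(max(a.flat))
0.6
>>> divide(a, max(a.flat), a)   # third argument: write the result back into a
array([[0.16666667, 0.33333333],
       [0.83333333, 1.        ]])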
> b) for each distance value subtract off the mean of the distances
> and then divide by the distances' standard deviation. If the
> distances are normally distributed then most distance values will
> lie between -1 and 1.
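You need to decide whether to include the diagonal elements (which are
all zero) when computing the mean and standard deviation. If so, use
mean(a.flat), and compute the standard deviation over a.flat in the same way.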
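Here is a minimal sketch of method (b), assuming a present-day NumPy install rather than Numeric. The function name standardize and its include_diagonal flag are made up for illustration, and the matrix is assumed to be square. It computes the mean and standard deviation either over every element or over the off-diagonal elements only:

```python
import numpy as np

def standardize(dist, include_diagonal=True):
    """Subtract the mean and divide by the standard deviation (z-score).

    Hypothetical helper: the statistics come from all elements, or only
    from the off-diagonal ones when include_diagonal is False.
    """
    dist = np.asarray(dist, dtype=float)
    if include_diagonal:
        values = dist.ravel()
    else:
        # Boolean mask that is False on the diagonal and True elsewhere.
        off_diag = ~np.eye(dist.shape[0], dtype=bool)
        values = dist[off_diag]
    # values.std() is the population standard deviation (ddof=0).
    return (dist - values.mean()) / values.std()

d = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])
z_all = standardize(d)
z_off = standardize(d, include_diagonal=False)
```

Note that with include_diagonal=False the diagonal entries are still transformed along with everything else, so they are no longer zero. Reset them afterwards if that matters for your analysis.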
> c) for each distance value subtract off the mean of the distances and
> divide by the distances' absolute deviation. Typically most distance
> values will lie between -1 and 1.
>
> I looked in the SciPy documentation and what I understood is that I
> can use an 'ufunc' to define one of these methods. Unfortunately my
> knowledge of Python, Numeric and SciPy is very low, so I could not
> figure out how. There are very few examples in the documentation at
> the moment.
> Can anyone point me to a possible implementation or where to look up?
> Thanks a lot in advance
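You should not need to define a ufunc of your own for any of these: ordinary arithmetic on arrays is already applied element by element. A rough sketch of method (c), assuming present-day NumPy and taking "absolute deviation" to mean the mean absolute deviation about the mean (the question does not say which definition is intended). The name mad_standardize is hypothetical:

```python
import numpy as np

def mad_standardize(dist):
    """Center on the mean, then scale by the mean absolute deviation.

    Hypothetical helper; uses all elements, including the diagonal.
    """
    dist = np.asarray(dist, dtype=float)
    centered = dist - dist.mean()
    # Mean absolute deviation about the mean (an assumed definition).
    mean_abs_dev = np.abs(centered).mean()
    return centered / mean_abs_dev

d = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])
s = mad_standardize(d)
```

If you would rather use the median absolute deviation, or leave out the diagonal, only the two statistics inside the function need to change.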