[SciPy-user] [newbie] standardize a matrix

Noel O'Boyle noel.oboyle2 at mail.dcu.ie
Tue Oct 11 03:37:15 CDT 2005


On Tue, 2005-10-11 at 10:02 +0200, Mauro Cherubini wrote:
> I have a symmetrical bi-dimensional array that contains distances
> between a certain number of points. The matrix diagonal is all zeros
> because of course the distance of a point from itself is zero.

A distance matrix - PyChem is developing a number of methods to deal
with distance matrices (and multivariate analysis in general). It is
still under active development, but I recommend forwarding your
question to the pychem users mailing list
(www.sf.net/projects/pychem).

> I would love to standardize the matrix using one or all of these  
> methods:
> 
> a) divide each attribute distance value of a point by the maximum  
> observed absolute distance value. This should restrict the values to  
> lie between -1 and 1. Often the values are all positive, and thus,  
> all transformed values will lie between 0 and 1.

(Shouldn't all distances be positive?)

>>> a = array([[0.1,0.2],[0.5,0.6]] )
>>> print a.flat
[0.1,0.2,0.5,0.6]
>>> print max(a.flat)
0.6
>>> divide(a,max(a.flat),a)  # third argument: write the result back into a
array([[ 0.16666667,  0.33333333],
       [ 0.83333333,  1.        ]])
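For readers on present-day NumPy (the successor to Numeric), method (a) can be sketched roughly as follows. This is only an illustration under stated assumptions: the 3x3 matrix `d` is made-up example data, and the names `d_max` and `scaled` are hypothetical, not from the original post.

```python
import numpy as np

# Hypothetical 3x3 distance matrix: symmetric, with a zero diagonal.
d = np.array([[0.0, 2.0, 4.0],
              [2.0, 0.0, 1.0],
              [4.0, 1.0, 0.0]])

# Largest absolute entry; for true (non-negative) distances abs() changes nothing.
d_max = np.abs(d).max()

# Divide into a new array so the original distances are preserved.
scaled = d / d_max
print(scaled)
```

Unlike the in-place divide above, this leaves `d` untouched. Whether that is preferable depends on whether the raw distances are needed later.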


> b) for each distance value subtract off the mean of the distances
> and then divide by the distances' standard deviation. If the
> distances are normally distributed then most distance values will
> lie between -1 and 1.

You need to decide whether you want to include the diagonal elements. If
so, then use mean(a.flat) and so on.
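Both choices for method (b) can be sketched in present-day NumPy along these lines. The matrix `d` is made-up example data and the names `z_all`, `off_diag` and `z_off` are hypothetical. Leaving the diagonal of `z_off` at zero is just a placeholder convention assumed here, not something the question specifies.

```python
import numpy as np

# Hypothetical 3x3 distance matrix: symmetric, with a zero diagonal.
d = np.array([[0.0, 2.0, 4.0],
              [2.0, 0.0, 1.0],
              [4.0, 1.0, 0.0]])

# Option 1: include the diagonal zeros in the mean and standard deviation.
# Note that std() here is the population standard deviation (ddof=0).
z_all = (d - d.mean()) / d.std()

# Option 2: compute the statistics from the off-diagonal entries only.
off_diag = ~np.eye(d.shape[0], dtype=bool)   # True everywhere except the diagonal
vals = d[off_diag]
z_off = np.zeros_like(d)                     # diagonal stays 0 as a placeholder
z_off[off_diag] = (vals - vals.mean()) / vals.std()

print(z_all)
print(z_off)
```

Including the diagonal pulls the mean towards zero, so the two options can give noticeably different results for small matrices.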


> c) for each distance value subtract off the mean of the distances
> and divide by the distances' absolute deviation. Typically most
> distance values will lie between -1 and 1.
> 
> I looked in the SciPy documentation and what I understood is that I  
> can use an 'ufunc' to define one of these methods. Unfortunately my  
> knowledge of Python, Numeric and SciPy is very low, so I could not  
> figure out how. There are very few examples in the documentation at  
> the moment.
> Can anyone point me to a possible implementation, or tell me where to look?
> Thanks a lot in advance
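For method (c), a minimal sketch in present-day NumPy might look like the following. It assumes that "absolute deviation" means the mean absolute deviation about the mean, which the question does not state explicitly. The matrix `d` is made-up example data and the names `mean`, `mad` and `scaled_c` are hypothetical. No custom ufunc is needed, since ordinary array arithmetic already works element by element.

```python
import numpy as np

# Hypothetical 3x3 distance matrix: symmetric, with a zero diagonal.
d = np.array([[0.0, 2.0, 4.0],
              [2.0, 0.0, 1.0],
              [4.0, 1.0, 0.0]])

# Mean over all entries (diagonal included; mask it out first if unwanted).
mean = d.mean()

# Assumed definition: mean absolute deviation about the mean.
mad = np.abs(d - mean).mean()

# Centre the distances and scale by the absolute deviation.
scaled_c = (d - mean) / mad
print(scaled_c)
```

As with method (b), the diagonal zeros are included here. The same off-diagonal masking idea applies if they should be left out.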


