# [SciPy-user] [newbie] standardize a matrix

Noel O'Boyle noel.oboyle2 at mail.dcu.ie
Tue Oct 11 08:10:54 CDT 2005

```
On Tue, 2005-10-11 at 15:04 +0200, Mauro Cherubini wrote:
> Hi Noel,
> Following your advice, I implemented this method inside one of my
> classes:
>
> # This method should standardize the matrix
>      def standardize_matrix(self, matrix, method='standard'):
>          # work on a float copy so the input stays untouched
>          stdmatrix = array(matrix, 'd')
>          if method == 'max':
>              max_x = max(matrix[:,1].flat)
>              divide(matrix[:,1], max_x, stdmatrix[:,1])
>              max_y = max(matrix[:,2].flat)
>              divide(matrix[:,2], max_y, stdmatrix[:,2])
>              return stdmatrix
>          if method == 'standard':
>              mean_x = mean(matrix[:,1].flat)
>              stddev_x = std(matrix[:,1].flat)
>              subtract(matrix[:,1], mean_x, stdmatrix[:,1])
>              # scale the centered values, not the original column
>              divide(stdmatrix[:,1], stddev_x, stdmatrix[:,1])
>              mean_y = mean(matrix[:,2].flat)
>              stddev_y = std(matrix[:,2].flat)
>              subtract(matrix[:,2], mean_y, stdmatrix[:,2])
>              divide(stdmatrix[:,2], stddev_y, stdmatrix[:,2])
>              return stdmatrix
>
> When I call it from the main it throws this error:
>
>      mean_x = mean(matrix[:,1].flat)
> NameError: global name 'mean' is not defined
>
> I tried to work out which package you are importing, as I initially
> thought it was the common scipy base. Apparently I was able to find
> a mean() method only inside scipy.stats.stats. However, I am not able
> to use it. Where can I find the methods mean() and std()?
>
> I am a bit confused about the difference between NumPy and SciPy,
> because I could not find this information in the SciPy documentation.

Try:

    from scipy import *

mean should then work (it's actually in stats, but some of the stats
functions are also imported into the scipy namespace).

>
> Thanks
>
>
> Mauro
>
>
> On Oct 11, 2005, at 10:37 , Noel O'Boyle wrote:
>
> > On Tue, 2005-10-11 at 10:02 +0200, Mauro Cherubini wrote:
> >
> >> I have a symmetrical two-dimensional array that contains the
> >> distances between a certain number of points. The matrix diagonal
> >> is all zeros because, of course, the distance of a point from
> >> itself is zero.
> >>
> >
> > A distance matrix - PyChem is developing a number of methods to deal
> > with distance matrices (and multivariate analysis in general). It is
> > under active development at the moment, but I recommend that you
> > forward your question to the pychem users mailing list
> > (www.sf.net/projects/pychem).
> >
> >
> >> I would love to standardize the matrix using one or all of these
> >> methods:
> >>
> >> a) divide each attribute distance value of a point by the maximum
> >> observed absolute distance value. This should restrict the values to
> >> lie between -1 and 1. Often the values are all positive, and thus,
> >> all transformed values will lie between 0 and 1.
> >>
> >
> > (Shouldn't all distances be positive?)
> >
> >
> > >>> a = array([[0.1,0.2],[0.5,0.6]])
> > >>> print a.flat
> > [0.1,0.2,0.5,0.6]
> > >>> print max(a.flat)
> > 0.6
> > >>> divide(a,0.6,a)
> > >>> print a
> > array([[ 0.16666667,  0.33333333],
> >        [ 0.83333333,  1.        ]])
> >
> >
> >
> >> b) for each distance value subtract off the mean of the distances
> >> and then divide by the distances' standard deviation. If the
> >> distances are normally distributed then most distance values will
> >> lie between -1 and 1.
> >>
> >
> > You need to decide whether you want to include the diagonal
> > elements. If so, then use mean(a.flat) and so on.
> >
> >
> >
> >> c) for each distance value subtract off the mean of the distances
> >> and divide by the distances' absolute deviation. Typically most
> >> distance values will lie between -1 and 1.
> >>
> >> I looked in the SciPy documentation and what I understood is that I
> >> can use a 'ufunc' to define one of these methods. Unfortunately my
> >> knowledge of Python, Numeric and SciPy is very limited, so I could
> >> not figure out how. There are very few examples in the
> >> documentation at the moment.
> >> Can anyone point me to a possible implementation, or tell me where
> >> to look?
> >> Thanks a lot in advance
> >>
> >
> > _______________________________________________
> > SciPy-user mailing list
> > SciPy-user at scipy.net
> > http://www.scipy.net/mailman/listinfo/scipy-user
> >
>

```
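
For readers on a current NumPy, the three scalings discussed in the thread (a: divide by the maximum absolute value, b: subtract the mean and divide by the standard deviation, c: subtract the mean and divide by the mean absolute deviation) could be sketched roughly as below. This is only an illustration, not code from the thread: the function name `standardize`, its `method` strings and the `include_diagonal` flag are hypothetical, and it assumes the whole matrix is scaled at once rather than column by column. `mean` and `std` are taken from NumPy here (`numpy.mean`, `numpy.std`).

```python
import numpy as np


def standardize(dist, method="standard", include_diagonal=True):
    """Return a scaled copy of a square distance matrix.

    Hypothetical helper, not from the thread. `method` is one of
    "max", "standard" (z-score) or "absdev" (mean absolute deviation).
    """
    dist = np.asarray(dist, dtype=float)
    if include_diagonal:
        values = dist.ravel()
    else:
        # Leave the zero self-distances out of the statistics.
        values = dist[~np.eye(dist.shape[0], dtype=bool)]

    if method == "max":
        # a) divide by the largest absolute value
        return dist / np.abs(values).max()

    centered = dist - values.mean()
    if method == "standard":
        # b) subtract the mean, divide by the standard deviation
        return centered / values.std()
    if method == "absdev":
        # c) subtract the mean, divide by the mean absolute deviation
        return centered / np.abs(values - values.mean()).mean()
    raise ValueError("unknown method: %r" % (method,))


a = np.array([[0.1, 0.2], [0.5, 0.6]])
print(standardize(a, method="max"))
```

With the array from Noel's example, the `"max"` call reproduces the scaled values shown in the thread. Passing `include_diagonal=False` is one way to act on his remark about deciding whether the diagonal elements should count towards the mean and the deviation.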