# [SciPy-user] [newbie] standardize a matrix

Mauro Cherubini martigan at gmail.com
Tue Oct 11 08:04:47 CDT 2005

```
Hi Noel,
Following your advice, I implemented this method inside one of my
classes:

# This method should standardize the matrix
def standardize_matrix(self, matrix, method='standard'):
    if method == 'max':
        max_x = max(matrix[:,1].flat)
        divide(matrix[:,1], max_x, stdmatrix[:,1])
        max_y = max(matrix[:,2].flat)
        divide(matrix[:,2], max_y, stdmatrix[:,2])
        return stdmatrix
    if method == 'standard':
        mean_x = mean(matrix[:,1].flat)
        stddev_x = std(matrix[:,1].flat)
        subtract(matrix[:,1], mean_x, stdmatrix[:,1])
        divide(matrix[:,1], stddev_x, stdmatrix[:,1])
        mean_y = mean(matrix[:,2].flat)
        stddev_y = std(matrix[:,2].flat)
        subtract(matrix[:,2], mean_y, stdmatrix[:,2])
        divide(matrix[:,2], stddev_y, stdmatrix[:,2])
        return stdmatrix

When I call it from my main program, it raises this error:

mean_x = mean(matrix[:,1].flat)
NameError: global name 'mean' is not defined

I tried to work out which package you are importing, as I initially
thought it was the common scipy base. The only mean() function I was
able to find is inside scipy.stats.stats. However, I am not able to
use it. Where can I find these mean() and std() functions?

I am a bit confused about the difference between NumPy and SciPy,
because I could not find this information in the SciPy documentation.

Thanks

Mauro

On Oct 11, 2005, at 10:37 , Noel O'Boyle wrote:

> On Tue, 2005-10-11 at 10:02 +0200, Mauro Cherubini wrote:
>
>> I have a symmetric two-dimensional array that contains the distances
>> between a certain number of points.  The diagonal elements of the
>> matrix are all zero, because of course the distance of a point from
>> itself is zero.
>>
>
> A distance matrix - PyChem is developing a number of methods to deal
> with distance matrices (and multivariate analysis in general). It is
> still under active development, but I recommend forwarding your
> question to the pychem users mailing list
> (www.sf.net/projects/pychem).
>
>
>> I would love to standardize the matrix using one or all of these
>> methods:
>>
>> a) divide each attribute distance value of a point by the maximum
>> observed absolute distance value. This should restrict the values to
>> lie between -1 and 1. Often the values are all positive, and thus,
>> all transformed values will lie between 0 and 1.
>>
>
> (Shouldn't all distances be positive?)
>
>
> >>> a = array([[0.1,0.2],[0.5,0.6]] )
> >>> print a.flat
> [0.1,0.2,0.5,0.6]
> >>> print max(a.flat)
> 0.6
> >>> divide(a,0.6,a)
> >>> print a
> array([[ 0.16666667,  0.33333333],
>        [ 0.83333333,  1.        ]])
>
>
>
>> b) for each distance value subtract off the mean of the distances
>> and then divide by the distances' standard deviation. If the
>> distances are normally distributed then most distance values will
>> lie between -1 and 1.
>>
>
> You need to decide whether you want to include the diagonal
> elements. If so, then use mean(a.flat) and so on.
>
>
>
>> c) for each distance value subtract off the mean of the distances
>> and divide by the distances' absolute deviation. Typically most
>> distance values will lie between -1 and 1.
>>
>> I looked in the SciPy documentation and what I understood is that I
>> can use a 'ufunc' to define one of these methods. Unfortunately my
>> knowledge of Python, Numeric and SciPy is very limited, so I could
>> not figure out how. There are very few examples in the documentation
>> at the moment.
>> Can anyone point me to a possible implementation, or tell me where
>> to look?
>> Thanks a lot in advance
>>

```
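The three standardizations discussed in the thread (a, b, c) can be sketched as follows. This is a minimal sketch, not the code from the thread: it assumes a present-day NumPy installation imported as `np` rather than the Numeric/SciPy setup of 2005, and the function name `standardize`, the method label `'absdev'`, and the `include_diagonal` flag are hypothetical names chosen for illustration. Referring to `np.mean` and `np.std` through the module name also avoids the kind of `NameError` reported above, which occurs when a bare `mean` is used without being imported into the current namespace.

```python
import numpy as np


def standardize(dist, method="standard", include_diagonal=True):
    """Return a standardized copy of a square distance matrix.

    method: 'max'      -> divide by the largest absolute value (a)
            'standard' -> subtract the mean, divide by the std. dev. (b)
            'absdev'   -> subtract the mean, divide by the mean absolute
                          deviation (c)
    include_diagonal: whether the zero diagonal takes part in the
    statistics (hypothetical flag, following Noel's remark).
    """
    dist = np.asarray(dist, dtype=float)

    # Choose which entries the statistics are computed from.
    if include_diagonal:
        values = dist.ravel()
    else:
        off_diagonal = ~np.eye(dist.shape[0], dtype=bool)
        values = dist[off_diagonal]

    if method == "max":
        return dist / np.abs(values).max()

    center = values.mean()
    if method == "standard":
        scale = values.std()
    elif method == "absdev":
        scale = np.abs(values - center).mean()
    else:
        raise ValueError("unknown method: %r" % (method,))

    # Center first, then scale the centered values.
    return (dist - center) / scale


# Small symmetric distance matrix with a zero diagonal.
d = np.array([[0.0, 3.0, 4.0],
              [3.0, 0.0, 5.0],
              [4.0, 5.0, 0.0]])

scaled_by_max = standardize(d, method="max")
z_scores = standardize(d, method="standard", include_diagonal=False)
```

The function returns a new array instead of writing into a preallocated output, so no `stdmatrix` has to exist beforehand, and the division is applied to the already centered values rather than to the original matrix.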