[SciPy-user] [newbie] standardize a matrix
martigan at gmail.com
Tue Oct 11 08:04:47 CDT 2005
Thanks for the reply.
Following your advice, I implemented this method inside one of my classes:
# This method should standardize the matrix
def standardize_matrix(self, matrix, method='standard'):
    # float copy of the input that receives the rescaled columns
    stdmatrix = matrix.astype(float)
    if method == 'max':
        max_x = max(matrix[:,1].flat)
        divide(matrix[:,1], max_x, stdmatrix[:,1])
        max_y = max(matrix[:,2].flat)
        divide(matrix[:,2], max_y, stdmatrix[:,2])
    elif method == 'standard':
        mean_x = mean(matrix[:,1].flat)
        stddev_x = std(matrix[:,1].flat)
        subtract(matrix[:,1], mean_x, stdmatrix[:,1])
        # divide the centered values, not the original ones
        divide(stdmatrix[:,1], stddev_x, stdmatrix[:,1])
        mean_y = mean(matrix[:,2].flat)
        stddev_y = std(matrix[:,2].flat)
        subtract(matrix[:,2], mean_y, stdmatrix[:,2])
        divide(stdmatrix[:,2], stddev_y, stdmatrix[:,2])
    return stdmatrix
When I call it from my main program, it throws this error:
mean_x = mean(matrix[:,1].flat)
NameError: global name 'mean' is not defined
I tried to work out which package you are importing, as I initially
thought it was the common scipy base. The only mean() function I could
find is in scipy.stats.stats, but I am not able to import it. Could you
please help me understand how to call mean() and std()?
I am a bit confused about the difference between NumPy and SciPy,
because I could not find this information in the SciPy documentation.
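A minimal sketch of a working version, assuming a current NumPy installation (where mean() and std() are available directly as numpy.mean and numpy.std, so nothing has to be imported from scipy.stats.stats). It keeps the original choice of rescaling columns 1 and 2, but is written as a plain function; the name standardize_columns and its columns parameter are hypothetical.

```python
import numpy as np

def standardize_columns(matrix, method='standard', columns=(1, 2)):
    """Return a float copy of `matrix` with the given columns rescaled.

    method='max'      divides each column by its largest value
    method='standard' subtracts the column mean and divides by the
                      column standard deviation (z-scores)
    """
    result = np.array(matrix, dtype=float)  # copy, so the input is untouched
    for col in columns:
        values = result[:, col]  # a view: in-place ufuncs write into result
        if method == 'max':
            np.divide(values, values.max(), out=values)
        elif method == 'standard':
            mean = np.mean(values)
            std = np.std(values)
            # Center first, then divide the *centered* values
            np.subtract(values, mean, out=values)
            np.divide(values, std, out=values)
        else:
            raise ValueError("unknown method: %r" % method)
    return result
```

Note that np.std computes the population standard deviation by default (ddof=0); pass ddof=1 if the sample estimate is wanted.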
On Oct 11, 2005, at 10:37, Noel O'Boyle wrote:
> On Tue, 2005-10-11 at 10:02 +0200, Mauro Cherubini wrote:
>> I have a symmetric two-dimensional array that contains distances
>> between a certain number of points. The diagonal elements of the
>> matrix are all zero, because of course the distance of a point from
>> itself is zero.
> A distance matrix: PyChem is developing a number of methods to deal
> with distance matrices (and multivariate analysis in general). It is
> under active development at the moment, but I recommend forwarding
> your question to the PyChem users mailing list.
>> I would love to standardize the matrix using one or all of these methods:
>> a) divide each attribute distance value of a point by the maximum
>> observed absolute distance value. This should restrict the values to
>> lie between -1 and 1. Often the values are all positive, and thus,
>> all transformed values will lie between 0 and 1.
> (Shouldn't all distances be positive?)
> >>> from numpy import array
> >>> a = array([[0.1, 0.2], [0.5, 0.6]])
> >>> print(a.ravel())
> [0.1 0.2 0.5 0.6]
> >>> print(max(a.flat))
> 0.6
> >>> a = a / max(a.flat)
> >>> a
> array([[0.16666667, 0.33333333],
>        [0.83333333, 1.        ]])
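A sketch of option (a) for a whole matrix, assuming NumPy: divide every entry by the largest absolute value. Using np.abs covers the negative values that (a) allows for; for a true distance matrix it is simply the maximum. The function name scale_by_max_abs and the example matrix d are hypothetical.

```python
import numpy as np

def scale_by_max_abs(d):
    """Option (a): divide every entry by the largest absolute value.

    Results lie in [-1, 1]; for non-negative distances, in [0, 1].
    """
    d = np.asarray(d, dtype=float)
    largest = np.abs(d).max()
    if largest == 0:
        return d.copy()  # all-zero matrix: nothing to scale
    return d / largest

# hypothetical 3-point distance matrix (symmetric, zero diagonal)
d = np.array([[0.0, 3.0, 4.0],
              [3.0, 0.0, 5.0],
              [4.0, 5.0, 0.0]])
print(scale_by_max_abs(d))
```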
>> b) for each distance value, subtract the mean of the distances and
>> then divide by the distances' standard deviation. If the distances
>> are normally distributed, most distance values will lie between -1
>> and 1.
> You need to decide whether you want to include the diagonal elements.
> If so, then use mean(a.flat) and so on.
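A sketch of option (b) that makes the diagonal choice explicit, assuming NumPy and a symmetric matrix with a zero diagonal as described above. When the diagonal is excluded, the statistics come from the strict upper triangle (np.triu_indices with k=1), so each pair of points is counted once and the zero self-distances do not pull the mean down. The function name zscore_distances and its include_diagonal flag are hypothetical.

```python
import numpy as np

def zscore_distances(d, include_diagonal=False):
    """Option (b): subtract the mean distance, divide by the std deviation.

    include_diagonal=True uses every element (like mean(a.flat));
    otherwise only the strict upper triangle feeds the statistics.
    The transform itself is applied to every entry of the matrix.
    """
    d = np.asarray(d, dtype=float)
    if include_diagonal:
        values = d.ravel()
    else:
        rows, cols = np.triu_indices(d.shape[0], k=1)
        values = d[rows, cols]
    return (d - values.mean()) / values.std()
```

With include_diagonal=False the diagonal entries become -mean/std rather than zero; whether to reset them afterwards is left to the caller.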
>> c) for each distance value, subtract the mean of the distances and
>> then divide by the distances' mean absolute deviation. Typically most
>> distance values will lie between -1 and 1.
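A sketch of option (c), assuming NumPy and reading "absolute deviation" as the mean absolute deviation of the distances from their mean. As in the previous sketch, the statistics use each off-diagonal pair once. The function name mad_scale_distances is hypothetical.

```python
import numpy as np

def mad_scale_distances(d):
    """Option (c): subtract the mean distance, then divide by the mean
    absolute deviation of the distances from that mean.

    Statistics use each off-diagonal pair once (strict upper triangle).
    """
    d = np.asarray(d, dtype=float)
    rows, cols = np.triu_indices(d.shape[0], k=1)
    values = d[rows, cols]
    center = values.mean()
    mad = np.mean(np.abs(values - center))
    return (d - center) / mad
```

SciPy also provides scipy.stats.median_abs_deviation, but that is the median-based statistic, which is a different quantity from the one used here.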
>> I looked in the SciPy documentation, and what I understood is that I
>> can use a 'ufunc' to implement one of these methods. Unfortunately my
>> knowledge of Python, Numeric and SciPy is very limited, so I could not
>> figure out how. There are very few examples in the documentation at
>> the moment.
>> Can anyone point me to a possible implementation, or tell me where to look?
>> Thanks a lot in advance
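On the ufunc question, a sketch assuming NumPy: the arithmetic used by these methods already maps onto built-in ufuncs (np.subtract, np.divide), and those broadcast, so no new ufunc has to be defined. With keepdims=True the per-column statistics keep shape (1, n_columns) and broadcast against every row. The function name standardize_all_columns is hypothetical.

```python
import numpy as np

def standardize_all_columns(data):
    """Z-score every column of a 2-D array in one vectorized step."""
    data = np.asarray(data, dtype=float)
    # keepdims=True gives shape (1, n_columns), which broadcasts
    # against the (n_rows, n_columns) array row by row
    mean = data.mean(axis=0, keepdims=True)
    std = data.std(axis=0, keepdims=True)
    return np.divide(np.subtract(data, mean), std)
```

The explicit np.subtract and np.divide calls are equivalent to writing (data - mean) / std; they are spelled out only to show which ufuncs do the work.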
> SciPy-user mailing list
> SciPy-user at scipy.net