[Numpy-discussion] Large symmetrical matrix
Wed Jun 11 12:07:11 CDT 2008
You could do something like this, where MyDIstanceFunc is written with weave
and some intelligent caching. This only computes the upper diagonal
elements in a sparse matrix:
self.NumDatapoints = len(self.DataPoints)
self.Distances = sparse.lil_matrix((self.NumDatapoints,self.NumDatapoints))
for index, x in enumerate(self.DataPoints):
self.Distances[index,0:index] = array(map(lambda y:self.MyDistanceFunc(
x, y ), self.DataPoints[0:index] ))
If an approximation is ok, you can save space by only calculating entries
for random samples of the rows and doing a matrix factorization like NMF on
the matrix with missing elements. Depending on the dataset and accuracy
needed, this can be a big space savings.
On Wed, Jun 11, 2008 at 8:55 AM, Charles R Harris <email@example.com>
> On Wed, Jun 11, 2008 at 9:33 AM, Simon Palmer <firstname.lastname@example.org>
>> Pretty simple. I don't do any transformations. It is a euclidean
>> distance matrix between n vectors in my data space, so I use it for lookup
>> of minima and I recalculate portions of it during the processing of my
>> It is the single largest limitation of my code, both in terms of
>> performance and scalability. A fast and efficient solution to this issue
>> would make a huge difference to me.
> So are you doing hierarchical clustering?
> Numpy-discussion mailing list
Peter N. Skomoroch
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Numpy-discussion