[Numpy-discussion] Large symmetrical matrix

Peter Skomoroch peter.skomoroch@gmail....
Wed Jun 11 12:07:11 CDT 2008

You could do something like this, where MyDistanceFunc is written with weave
and some intelligent caching.  This only computes the lower-triangular
elements (the matrix is symmetric) and stores them in a sparse matrix:

from numpy import array
from scipy import sparse

self.NumDatapoints = len(self.DataPoints)
self.Distances = sparse.lil_matrix((self.NumDatapoints, self.NumDatapoints))

# Fill only the entries below the diagonal; by symmetry the rest is redundant.
for index, x in enumerate(self.DataPoints):
    self.Distances[index, 0:index] = array(
        [self.MyDistanceFunc(x, y) for y in self.DataPoints[0:index]])
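As a self-contained illustration (MyDistanceFunc and DataPoints are specific to
the original poster's code, so this sketch substitutes a plain Euclidean
distance and a standalone function):

```python
import numpy as np
from scipy import sparse

def pairwise_lower_triangle(points, dist=None):
    """Store only the lower-triangular pairwise distances in a sparse matrix."""
    if dist is None:
        # default: plain Euclidean distance between two vectors
        dist = lambda a, b: np.linalg.norm(a - b)
    n = len(points)
    d = sparse.lil_matrix((n, n))
    for i, x in enumerate(points):
        for j in range(i):          # j < i, so only entries below the diagonal
            d[i, j] = dist(x, points[j])
    return d

points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = pairwise_lower_triangle(points)
# D[1, 0] == 5.0, D[2, 0] == 10.0; the upper triangle stays unset (zero)
```

Lookups for a pair (i, j) with i < j then read D[j, i] instead of D[i, j].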

If an approximation is ok, you can save space by only calculating entries
for random samples of the rows and doing a matrix factorization like NMF on
the matrix with missing elements.  Depending on the dataset and accuracy
needed, this can be a big space savings.
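A minimal sketch of that idea, assuming nothing beyond numpy: fit a
nonnegative low-rank factorization W @ H to a matrix using only an observed
(masked) subset of its entries, via projected gradient descent.  This is a
simple stand-in for a proper NMF-with-missing-data solver, not a tuned
implementation; all names here are illustrative:

```python
import numpy as np

def masked_low_rank(M, mask, rank=2, iters=3000, lr=0.01, seed=0):
    """Fit M ~= W @ H using only entries where mask is True.

    Projected gradient descent on the squared error over observed
    entries; clipping keeps the factors nonnegative, NMF-style.
    """
    rng = np.random.default_rng(seed)
    n, m = M.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(iters):
        R = (W @ H - M) * mask          # residual on observed entries only
        W -= lr * (R @ H.T)
        H -= lr * (W.T @ R)
        np.clip(W, 0.0, None, out=W)    # project back onto W >= 0
        np.clip(H, 0.0, None, out=H)
    return W, H

# demo: recover a nonnegative rank-2 matrix from ~70% of its entries
rng = np.random.default_rng(1)
M = rng.random((8, 2)) @ rng.random((2, 8))
mask = rng.random(M.shape) < 0.7
W, H = masked_low_rank(M, mask, rank=2)
obs_err = np.abs((W @ H - M)[mask]).mean()    # fit on observed entries
baseline = np.abs(M[mask]).mean()             # error of predicting all zeros
```

The unobserved entries are then approximated by the corresponding entries of
W @ H, so only the sampled distances ever need to be computed or stored.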

On Wed, Jun 11, 2008 at 8:55 AM, Charles R Harris <charlesr.harris@gmail.com>
wrote:
> On Wed, Jun 11, 2008 at 9:33 AM, Simon Palmer <simon.palmer@gmail.com>
> wrote:
>> Pretty simple.  I don't do any transformations.  It is a euclidean
>> distance matrix between n vectors in my data space, so I use it for lookup
>> of minima and I recalculate portions of it during the processing of my
>> algorithm.
>> It is the single largest limitation of my code, both in terms of
>> performance and scalability.  A fast and efficient solution to this issue
>> would make a huge difference to me.
> So are you doing hierarchical clustering?
> Chuck
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion

Peter N. Skomoroch
