[Numpy-discussion] Large symmetrical matrix

Peter Skomoroch peter.skomoroch@gmail....
Wed Jun 11 12:07:11 CDT 2008


You could do something like this, where MyDistanceFunc is written with weave
and some intelligent caching.  This only computes one triangle of the matrix
(the entries below the diagonal, since the matrix is symmetric), stored in a
sparse matrix:

self.NumDatapoints = len(self.DataPoints)
self.Distances = sparse.lil_matrix((self.NumDatapoints, self.NumDatapoints))

for index, x in enumerate(self.DataPoints):
    self.Distances[index, 0:index] = array(
        [self.MyDistanceFunc(x, y) for y in self.DataPoints[0:index]])
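For reference, here is a minimal self-contained version of the same idea,
with a plain Euclidean distance standing in for MyDistanceFunc (which in my
case is a weave function, not shown here):

```python
import numpy as np
from scipy import sparse

def lower_triangle_distances(points, dist):
    # Store only the strictly lower-triangular entries (j < i) of the
    # symmetric distance matrix in a sparse LIL matrix.
    n = len(points)
    D = sparse.lil_matrix((n, n))
    for i, x in enumerate(points):
        if i:
            D[i, 0:i] = np.array([dist(x, y) for y in points[:i]])
    return D

pts = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = lower_triangle_distances(pts, lambda a, b: np.linalg.norm(a - b))
print(D[1, 0], D[2, 0], D[2, 1])  # 5.0 10.0 5.0
```

To look up the distance between i and j you just index with (max(i, j),
min(i, j)), since only one triangle is stored.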

If an approximation is acceptable, you can save further space by calculating
entries only for a random sample of the rows and then running a matrix
factorization such as NMF on the resulting matrix with missing elements.
Depending on the dataset and the accuracy needed, this can be a big space
savings.
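A rough sketch of what I mean by "NMF with missing elements": weight the
standard multiplicative updates by a 0/1 mask so that only the observed
entries contribute. This is just an illustration of the idea, not the exact
method or code I use:

```python
import numpy as np

def masked_nmf(V, M, rank, iters=500, eps=1e-9):
    # Multiplicative-update NMF fitted only on observed entries:
    # M[i, j] == 1 where V[i, j] is known, 0 where it is missing.
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    for _ in range(iters):
        WH = W @ H
        H *= (W.T @ (M * V)) / (W.T @ (M * WH) + eps)
        WH = W @ H
        W *= ((M * V) @ H.T) / ((M * WH) @ H.T + eps)
    return W, H  # W @ H approximates V on the observed entries
```

The unobserved entries of W @ H then serve as estimates of the distances you
never computed.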





On Wed, Jun 11, 2008 at 8:55 AM, Charles R Harris <charlesr.harris@gmail.com>
wrote:

>
>
> On Wed, Jun 11, 2008 at 9:33 AM, Simon Palmer <simon.palmer@gmail.com>
> wrote:
>
>> Pretty simple.  I don't do any transformations.  It is a euclidean
>> distance matrix between n vectors in my data space, so I use it for lookup
>> of minima and I recalculate portions of it during the processing of my
>> algorithm.
>>
>> It is the single largest limitation of my code, both in terms of
>> performance and scalability.  A fast and efficient solution to this issue
>> would make a huge difference to me.
>>
>
> So are you doing hierarchical clustering?
>
> Chuck
>
>
>
>
>


-- 
Peter N. Skomoroch
peter.skomoroch@gmail.com
http://www.datawrangling.com
http://del.icio.us/pskomoroch
