[SciPy-dev] Implementing a distance matrix between two sets of vectors concept

David Cournapeau david@ar.media.kyoto-u.ac...
Tue Jul 3 01:16:38 CDT 2007


Hi,

    For my machine learning toolbox, I need the concept of a distance 
matrix: given two sets of vectors u and v (N vectors u and M vectors v), 
all of dimension d, I want to compute the M x N matrix D such that 
D(i,j) = distance(v_i, u_j). This is easy to do in numpy, but for big 
datasets it becomes difficult without a significant loss of efficiency 
or large memory consumption.
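
To make the trade-off concrete, here is a minimal numpy sketch for the Euclidean case only. The function names are hypothetical and chosen for illustration. The first version is the obvious broadcasting one and allocates an (M, N, d) temporary; the second relies on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, so it only needs (M, N) temporaries, at the cost of some numerical accuracy when vectors are close to each other.

```python
import numpy as np


def distance_matrix_broadcast(v, u):
    """Euclidean distance matrix, D[i, j] = ||v[i] - u[j]||.

    v has shape (M, d), u has shape (N, d). The intermediate `diff`
    has shape (M, N, d), which is what hurts for big datasets.
    """
    diff = v[:, np.newaxis, :] - u[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_matrix_lowmem(v, u):
    """Same matrix, computed with only (M, N) temporaries."""
    v_sq = (v ** 2).sum(axis=1)[:, np.newaxis]   # shape (M, 1)
    u_sq = (u ** 2).sum(axis=1)[np.newaxis, :]   # shape (1, N)
    sq = v_sq + u_sq - 2.0 * np.dot(v, u.T)      # shape (M, N)
    # Rounding can produce tiny negative values; clip before the sqrt.
    np.maximum(sq, 0.0, out=sq)
    return np.sqrt(sq)
```

Both functions should agree up to rounding. Neither generalizes directly to other distances, which is part of the motivation for a dedicated implementation.
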
    So I am thinking about implementing it in C. I think the overall 
concept is useful for other people, so before implementing something, I 
was wondering whether other people would need/use it, and what they would need:
    - several distances (Euclidean, Mahalanobis, etc.), each of which would be 
a separate object to handle different sets of parameters.
    - C Api ?
    - datatypes ? Layout ? Contiguity ?
    - handling NaN ?
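
As a rough illustration of the "distance as a separate object" idea from the list above, here is a pure numpy sketch. It is not a proposed API: the class names, the `distance_matrix` function, and the choice to convert every input to contiguous float64 are assumptions made for the example. Each distance object carries its own parameters (the covariance for Mahalanobis, nothing for Euclidean), and the entry point only deals with dtype, layout and shape checks.

```python
import numpy as np


class Euclidean:
    """Hypothetical distance object with no parameters."""

    def __call__(self, v, u):
        diff = v[:, np.newaxis, :] - u[np.newaxis, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))


class Mahalanobis:
    """Hypothetical distance object parameterized by a covariance matrix."""

    def __init__(self, cov):
        # Store the inverse covariance once; it is reused for every pair.
        self.inv_cov = np.linalg.inv(np.asarray(cov, dtype=float))

    def __call__(self, v, u):
        diff = v[:, np.newaxis, :] - u[np.newaxis, :, :]
        # Quadratic form diff^T . inv_cov . diff for each pair (i, j).
        sq = np.einsum('ijk,kl,ijl->ij', diff, self.inv_cov, diff)
        return np.sqrt(np.maximum(sq, 0.0))


def distance_matrix(v, u, metric):
    """Return D with D[i, j] = metric distance between v[i] and u[j].

    Inputs are converted to C-contiguous float64 arrays (an assumption
    of this sketch; a C implementation might accept more layouts).
    """
    v = np.ascontiguousarray(v, dtype=float)
    u = np.ascontiguousarray(u, dtype=float)
    if v.ndim != 2 or u.ndim != 2 or v.shape[1] != u.shape[1]:
        raise ValueError("v and u must be 2-d with the same dimension d")
    return metric(v, u)
```

For example, `distance_matrix(v, u, Mahalanobis(np.eye(d)))` should give the same result as `distance_matrix(v, u, Euclidean())`. NaN handling is left out on purpose, since that is one of the open questions.
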

     cheers,

    David

