[Numpy-discussion] Broadcasting question

Olivier Grisel olivier.grisel@ensta....
Thu Dec 4 11:59:19 CST 2008


2008/12/4 Charles R Harris <charlesr.harris@gmail.com>:
>
>
> On Thu, Dec 4, 2008 at 8:26 AM, Olivier Grisel <olivier.grisel@ensta.org>
> wrote:
>>
>> Hi list,
>>
>> Suppose I have array a with dimensions (d1, d3) and array b with
>> dimensions (d2, d3). I want to compute array c with dimensions (d1,
>> d2) holding the squared euclidean norms of vectors in a and b with
>> size d3.
>
> Just to clarify the problem a bit, it looks like you want to compute the
> squared euclidean distance between every vector in a and every vector in b,
> i.e., a distance matrix. Is that correct? Also, how big are d1,d2,d3?

Yes, that is correct. I would target d1 >> d2 ~ d3, with d1 as large as
possible while still fitting in memory, and d2 and d3 on the order of a
few hundred to a few thousand for a start.
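
For sizes like these, the (d1, d2, d3) temporary created by plain
broadcasting is the memory bottleneck. A minimal sketch of one way to bound
it, assuming NumPy and processing a in row blocks (the function name and the
block_rows parameter are made up for illustration, not taken from the code
linked in this message):

```python
import numpy as np

def sq_dists_blockwise(a, b, block_rows=1024):
    """Squared Euclidean distances between rows of a (d1, d3) and b (d2, d3).

    Hypothetical helper: only a (block_rows, d2, d3) temporary is alive
    at any time, instead of the full (d1, d2, d3) array.
    """
    d1, d2 = a.shape[0], b.shape[0]
    out = np.empty((d1, d2), dtype=np.result_type(a, b))
    for start in range(0, d1, block_rows):
        stop = min(start + block_rows, d1)
        # (rows, 1, d3) - (1, d2, d3) broadcasts to (rows, d2, d3)
        diff = a[start:stop, np.newaxis, :] - b[np.newaxis, :, :]
        out[start:stop] = (diff ** 2).sum(axis=-1)
    return out
```

The block size trades temporary memory against Python loop overhead, so a
suitable value depends on d2, d3 and the available RAM.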

> If you *are* looking to compute the distance matrix I suspect your end goal
> is something beyond that. Could you describe what you are trying to do?

My end goal is to compute the activation of an array of Radial Basis
Function units where the activation of unit with center b_j for data
vector a_i is given by:

f(a_i, b_j) = exp(-||a_i - b_j|| ** 2 / (2 * sigma))
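
As a rough illustration of the formula above, here is a minimal NumPy sketch
that evaluates it for every pair (a_i, b_j) at once. It relies on the
identity ||x - y||**2 = ||x||**2 + ||y||**2 - 2 * x.y so that no
(d1, d2, d3) temporary is needed. The function name is made up for this
example, sigma is used exactly as written in the formula, and this is not
the implementation in the scalar.py or sse.c files linked in this message:

```python
import numpy as np

def rbf_activations(a, b, sigma):
    """Return the (d1, d2) array exp(-||a_i - b_j||**2 / (2 * sigma)).

    Hypothetical helper; a has shape (d1, d3), b has shape (d2, d3).
    """
    a_sq = (a * a).sum(axis=1)[:, np.newaxis]  # (d1, 1)
    b_sq = (b * b).sum(axis=1)[np.newaxis, :]  # (1, d2)
    # Broadcasting (d1, 1) + (1, d2) against the (d1, d2) dot product
    sq_dist = a_sq + b_sq - 2.0 * np.dot(a, b.T)
    # Rounding can leave tiny negative values; clip them to zero
    np.maximum(sq_dist, 0.0, out=sq_dist)
    return np.exp(-sq_dist / (2.0 * sigma))
```

The expansion is less accurate than subtracting the vectors directly when
a_i and b_j are nearly equal, which may or may not matter for a given use.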

More broadly, I want building blocks made of parameterized arrays of
homogeneous units (linear, sigmoid and RBF), along with their gradients
in parameter space, so as to build machine learning algorithms such as
multilayer perceptrons with various training strategies such as
Stochastic Gradient Descent. That code might be integrated into the
Modular toolkit for Data Processing (MDP) project [1] at some point.

The current state of the Python code is here:

http://www.bitbucket.org/ogrisel/oglab/src/186eab341408/simdkernel/src/simdkernel/scalar.py

You can find an SSE optimized C implementation wrapped with ctypes here:

http://www.bitbucket.org/ogrisel/oglab/src/186eab341408/simdkernel/src/simdkernel/sse.py
http://www.bitbucket.org/ogrisel/oglab/src/186eab341408/simdkernel/src/simdkernel/sse.c

> It could be that scipy.spatial or scipy.cluster are what you should look at.

I'll have a look at those, thanks for the pointer.

[1] http://mdp-toolkit.sourceforge.net/

-- 
Olivier

