# [SciPy-User] 4-D gaussian mixture model.

Éric Depagne eric@depagne....
Fri Nov 26 12:13:27 CST 2010

```On Friday 26 November 2010 18:51:34, Gael Varoquaux wrote:
> On Fri, Nov 26, 2010 at 05:00:10PM +0100, Éric Depagne wrote:
> > I have a data set made of 4 parameters: x, y, dx and dy.
> >
> > I'd like to classify this set in the following way: put together all
> > (x, y) that have similar (dx, dy).
>
> OK, so you have a learning task with a multivariate output, is that
> right?
>
yes.

> > I've had a look at Gaussian mixture models implementation in scikit, and
> > it seems to be what I need. But the examples I've found here:
> > http://scikit-learn.sourceforge.net/0.5/auto_examples/gmm/plot_gmm.html#
> > only fit y vs x.
>
> Yes, standard Gaussian mixture models do not model multivariate output.
>
ok.

> > In my case for instance, all my (x,y) would be in red, but some of the
> > (dx, dy) would point towards you, and some would point away from you,
> > and I'd like to sort the data according to this "parameter": the
> > pointing direction.
>
> Can you extract the 'parameter' that makes the most sense in your context?
> This would make the problem much better posed, as the method would not
> have to learn the relevant structure of the output space.
>
I'm not sure I understand what you mean by "extract". I cannot treat this
parameter alone, without knowing what the other two are.

> > How can I modify the example so that it fits 2 dims, keeping the first
> > two as input ?
>
> You can't. Not with the Gaussian mixture models in the scikit.

ok.
>
> > And does it make sense to use this kind of method, my knowledge in
> > statistics is quite limited.
>
> I am not an expert in structured output learning, but I would say that
> GMM is probably not an excellent choice for that. On the other hand, if
> you are interested in a clustering method, all the methods I know work on
> unstructured output. The GMM could probably be adapted, from a
> theoretical standpoint, to your problem, but that would mean redoing the
> probabilistic model and the update laws used in the computation.
>
> For structured output, latent factor models that learn from both spaces,
> such as canonical correlation analysis, are well-posed. But you would
> need to formulate your problem in a way that fits in these frameworks.
>
> What is your end problem? Do you want to classify or cluster? Can you
> define the quantity that you are interested in?
I have a series of stars with their coordinates (x, y) and their proper
motion (dx, dy).
My problem is to find the stars that belong to the same cluster (astronomically
speaking) and to list the stars that are in the same region as the cluster only
by chance (because their motion happens to have put them there now).

If the stars are physically linked together, they will not only have
coordinates that are close, but their proper motions will also point towards
roughly the same point. If they are there by chance, they have similar
coordinates, but their proper motions will be different.

Éric.
>
> HTH,
>
> Gael

--
Un clavier azerty en vaut deux
----------------------------------------------------------
Éric Depagne                            eric@depagne.org
```
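
One way to explore the membership question raised in this message is to treat each star as a single 4-D point (x, y, dx, dy) and fit a two-component Gaussian mixture to those points, one component for the cluster and one for the field stars. The sketch below is only an illustration of that idea, not something proposed in the thread. It uses a hand-written EM loop with diagonal covariances in plain Python. The function name `fit_diag_gmm`, the synthetic star catalogue, and the hand-picked starting means are all assumptions made for the example, and a real catalogue would need a more careful initialisation (for instance several restarts) and probably full covariances.

```python
import math
import random


def fit_diag_gmm(points, init_means, n_iter=50, min_var=1e-6):
    """Fit a Gaussian mixture with diagonal covariances by plain EM.

    Returns (weights, means, variances, labels); labels[i] is the index of
    the most probable component for points[i].
    """
    n, dim, k = len(points), len(points[0]), len(init_means)
    means = [list(m) for m in init_means]
    # Every component starts with the per-dimension variance of the whole sample.
    centre = [sum(p[d] for p in points) / n for d in range(dim)]
    spread = [sum((p[d] - centre[d]) ** 2 for p in points) / n + min_var
              for d in range(dim)]
    variances = [list(spread) for _ in range(k)]
    weights = [1.0 / k] * k

    def log_density(p, j):
        # log(weight_j * N(p | mean_j, diag(variance_j)))
        return math.log(weights[j]) - 0.5 * sum(
            math.log(2.0 * math.pi * variances[j][d])
            + (p[d] - means[j][d]) ** 2 / variances[j][d]
            for d in range(dim))

    def responsibilities():
        resp = []
        for p in points:
            logs = [log_density(p, j) for j in range(k)]
            top = max(logs)  # subtract the maximum for numerical stability
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        return resp

    for _ in range(n_iter):
        resp = responsibilities()                  # E-step
        for j in range(k):                         # M-step
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            for d in range(dim):
                means[j][d] = sum(r[j] * p[d]
                                  for r, p in zip(resp, points)) / nj
                variances[j][d] = min_var + sum(
                    r[j] * (p[d] - means[j][d]) ** 2
                    for r, p in zip(resp, points)) / nj

    resp = responsibilities()
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


# Synthetic catalogue (invented for illustration): 40 cluster stars sharing
# a common proper motion, then 60 field stars scattered over the same region.
random.seed(0)
cluster = [(random.gauss(0, 1), random.gauss(0, 1),
            random.gauss(5, 0.2), random.gauss(-3, 0.2)) for _ in range(40)]
field = [(random.uniform(-3, 3), random.uniform(-3, 3),
          random.gauss(0, 1), random.gauss(0, 1)) for _ in range(60)]
stars = cluster + field

# Rough hand-made guesses for (x, y, dx, dy) of each component (an assumption).
guesses = [(0, 0, 4, -2), (0, 0, 0, 0)]
weights, means, variances, labels = fit_diag_gmm(stars, guesses)

print("stars assigned to component 0:", labels.count(0))
print("stars assigned to component 1:", labels.count(1))
```

Because all four columns enter the same likelihood, two stars end up in the same component only if both their positions and their proper motions are compatible with it, which is the criterion described in the message. The diagonal covariance keeps the code short but ignores correlations between the columns.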