[Numpy-discussion] Condensing array...

Olivier Grisel olivier.grisel@ensta....
Fri Feb 25 04:52:29 CST 2011

2011/2/25 Gael Varoquaux <gael.varoquaux@normalesup.org>:
> On Fri, Feb 25, 2011 at 10:36:42AM +0100, Fred wrote:
>> I have a big array (44 GB) I want to decimate.
>> But this array has a lot of NaN (only 1/3 has value, in fact, so 2/3 of
>> NaN).
>> If I "basically" decimate it (a la NumPy, ie data[::nx, ::ny, ::nz], for
>> instance), the decimated array will also have a lot of NaN.
>> What I would like to have in one cell of the decimated array is the
>> nearest (for instance) value in the big array. This is what I call a
>> "condensated array".
> What exactly do you mean by 'decimating'? To me it seems that you are
> looking for matrix factorization or matrix completion techniques, which
> are trendy topics in machine learning currently.
> They are, however, a bit challenging, and I fear that you will have to
> read the papers and do some implementation yourself, unless you have a
> clear application in mind that allows simple tricks to solve it.
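
If a block-wise reading of Fred's question is enough (one output cell per nx x ny x nz block, filled with the non-NaN value closest to the sample that plain decimation would have picked), a minimal NumPy sketch could look like the following. The function name `condense`, the choice of the block origin as the reference point, and the requirement that the shape be divisible by the block size are all assumptions made for illustration; nothing here comes from the thread itself. For a 44 GB array this would have to be applied chunk by chunk (for example on a memory-mapped array), since the transpose makes a copy.

```python
import numpy as np

def condense(data, nx, ny, nz):
    """Hypothetical helper: block-wise decimation that skips NaN.

    Each (nx, ny, nz) block is reduced to the non-NaN value nearest
    to the block origin, i.e. the sample data[::nx, ::ny, ::nz] would
    have picked. Blocks that contain only NaN stay NaN.
    """
    X, Y, Z = data.shape
    if X % nx or Y % ny or Z % nz:
        raise ValueError("array shape must be divisible by the block size")

    # Split each axis into (block index, offset inside block).
    blocks = data.reshape(X // nx, nx, Y // ny, ny, Z // nz, nz)
    # Put the three in-block offsets last and flatten them into one axis.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    blocks = blocks.reshape(X // nx, Y // ny, Z // nz, nx * ny * nz)

    # Order the in-block offsets by squared distance to the block origin.
    ii, jj, kk = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz),
                             indexing="ij")
    order = np.argsort((ii**2 + jj**2 + kk**2).ravel(), kind="stable")
    blocks = blocks[..., order]

    # Index of the first non-NaN candidate per block (0 if there is none,
    # in which case the selected value is NaN anyway).
    first = (~np.isnan(blocks)).argmax(axis=-1)
    return np.take_along_axis(blocks, first[..., None], axis=-1)[..., 0]
```

Usage would be along the lines of `small = condense(chunk, 4, 4, 4)` on each chunk of the big array. Taking the nearest value to the block origin is only one possible rule; `np.nanmean` over the last axis of `blocks` would give a block average instead.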

Indeed. In the following paper by G. Martinsson there is also a
section on matrix summarization:


The scikit-learn randomized SVD implementation comes from this paper.
It's pretty useful in practice.
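
To give an idea of what a randomized SVD does, here is a minimal NumPy sketch of the general technique (random projection, QR to get an orthonormal basis, then an exact SVD of the small projected matrix). It is not scikit-learn's code: the function name `randomized_svd_sketch` and its parameters are made up for illustration, there are no power iterations, and it assumes a dense 2D array without NaN, so it would not apply to Fred's data as-is.

```python
import numpy as np

def randomized_svd_sketch(A, k, n_oversamples=10, seed=0):
    """Hypothetical helper: approximate rank-k SVD of a dense matrix A."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]

    # Random test matrix with a few extra columns for accuracy.
    omega = rng.standard_normal((n, k + n_oversamples))

    # Sample the range of A and orthonormalize the samples.
    Q, _ = np.linalg.qr(A @ omega)

    # Project A onto that basis and take the SVD of the small matrix.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)

    # Map the left singular vectors back to the original space.
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k]
```

The appeal is that the only operations touching the full matrix are two matrix products, while the exact SVD runs on a matrix with just `k + n_oversamples` rows.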

http://twitter.com/ogrisel - http://github.com/ogrisel
