# [Numpy-discussion] NumPy-Discussion Digest, Vol 42, Issue 85

Kevin Dunn kgdunn@gmail....
Sun Mar 28 19:18:25 CDT 2010

On Sun, Mar 28, 2010 at 20:12, Kevin Dunn <kgdunn@gmail.com> wrote:
>> Date: Sun, 28 Mar 2010 00:24:01 +0000
>> From: Andrea Gavana <andrea.gavana@gmail.com>
>> Subject: [Numpy-discussion] Interpolation question
>> To: Discussion of Numerical Python <numpy-discussion@scipy.org>
>> Message-ID:
>>        <d5ff27201003271724o6c82ec75v225d819c84140b46@mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Hi All,
>>
>>    I have an interpolation problem and I am having some difficulties
>> in tackling it. I hope I can explain myself clearly enough.
>>
>> Basically, I have a whole bunch of 3D fluid flow simulations (close to
>> 1000), and they are a result of different combinations of parameters.
>> I was planning to use the Radial Basis Functions in scipy, but for the
>> moment let's assume, to simplify things, that I am dealing only with
>> one parameter (x). In 1000 simulations, this parameter x has 1000
>> values, obviously. The problem is, the outcome of every single
>> simulation is a vector of oil production over time (let's say 40
>> values per simulation, one per year), and I would like to be able to
>> interpolate my x parameter (1000 values) against all the simulations
>> (1000x40) and get an approximating function that, given another x
>> parameter (of size 1x1) will give me back an interpolated production
>> profile (of size 1x40).
>
> Andrea, may I suggest an alternative approach to RBFs?
>
> Realize that the vector of 40 values in each row of Y is not
> independent; the entries will be correlated.  First perform a
> principal component analysis (PCA) on this 1000 x 40 matrix to
> reduce it to a 1000 x A matrix, called your scores matrix, where A
> is the number of independent components. A is chosen so that it
> adequately summarizes Y without over-fitting, and you will typically
> find A << 40, maybe 2 or 3. There are tools, such as
> cross-validation, that do this well enough.
>
> Then you can relate your single column of X to these A independent
> score columns using a tool such as least squares: one least-squares
> model per column in the scores matrix.  This works because each
> column in the scores matrix is independent of (contains information
> totally orthogonal to) the others.  But I would be surprised if this
> works well enough, unless A = 1.
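
The two steps above (PCA to reduce Y to a few score columns, then one
least-squares fit per column) can be sketched roughly as follows. This
is a hypothetical illustration, not Andrea's actual data: the shapes,
the synthetic x/Y values, and the choice A = 2 are all assumptions.

```python
import numpy as np

# Hypothetical stand-in data: 1000 runs of one parameter x, each with
# a 40-value production profile in Y (Andrea's real data would go here).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(1000, 1))
Y = np.sin(x * np.arange(1, 41)) + 0.01 * rng.standard_normal((1000, 40))

# Step 1: PCA via SVD on the mean-centered Y.
Y_mean = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - Y_mean, full_matrices=False)
A = 2                                   # assumed number of components
scores = U[:, :A] * s[:A]               # 1000 x A scores matrix
loadings = Vt[:A]                       # A x 40 loadings

# Step 2: one least-squares model per score column (intercept + slope).
X_design = np.hstack([np.ones_like(x), x])        # 1000 x 2
coeffs, *_ = np.linalg.lstsq(X_design, scores, rcond=None)  # 2 x A

def predict_profile(x_new):
    """Predict the scores for a new x, then map back to 40 values."""
    t = np.array([1.0, x_new]) @ coeffs           # predicted scores, (A,)
    return Y_mean + t @ loadings                  # interpolated profile, (40,)

profile = predict_profile(0.5)
print(profile.shape)  # (40,)
```

In a real application A would be chosen by cross-validation rather
than fixed up front, and the linear fit per score column could be
replaced by any other one-dimensional regression.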
>
> But it sounds like you don't just have a single column in your
> X-variables (you hinted that the single column was just a
> simplification).  In that case, I would build a projection to latent
> structures (PLS) model: a single latent-variable model that
> simultaneously models the X-matrix and the Y-matrix while maximizing
> the covariance between the two.

Oops, that got sent before I finished. I was about to end by saying
that if you need some references and an outline of code, I can
readily provide them.

This is a standard problem with data from spectroscopic instruments
and with batch processes.  They produce hundreds, sometimes thousands,
of values per row. PCA and PLS are very effective at summarizing these
down to a much smaller number of independent columns, very often just
a handful, and relating them (i.e. building a predictive model) to
other data matrices.

Kevin Dunn

>> Something along these lines:
>>
>> import numpy as np
>> from scipy.interpolate import Rbf
>>
>> # x.shape = (1000, 1)
>> # y.shape = (1000, 40)
>>
>> rbf = Rbf(x, y)
>>
>> # New result with xi.shape = (1, 1) --> fi.shape = (1, 40)
>> fi = rbf(xi)
>>
>>
>> Does anyone have a suggestion on how I could implement this? Sorry if
>> it sounds confusing... Please feel free to correct any wrong
>> assumptions I have made, or to propose other approaches if you think
>> RBFs are not suitable for this kind of problem.
>>