[SciPy-User] kriging module

Gael Varoquaux gael.varoquaux@normalesup....
Sun Nov 21 03:07:18 CST 2010

On Sat, Nov 20, 2010 at 10:28:15PM -0600, Joe Kington wrote:
>    Not to be argumentative, but this is why it may not make a ton of sense to
>    wrap "kriging" into a module that implements more general Gaussian process
>    regression methods.

Well, that's a question of point of view. If you are trying to build a
package specific to geostatistics, then it may not make much sense.
However, I personally think that establishing barriers between fields,
with different codes solving different variants of the same problem, does
not help scientific and technical progress. On the other hand, it is clear
that people come in with different vocabularies and expectations, and
thus 'swiss-army-knife' codes may not do much good either.

We thought that a Gaussian process regression code could fit well in
scikit-learn because it is a problem that is well identified by the
machine learning community and receives ongoing research from this
community. As a result, such code can benefit from other algorithms
implemented in the scikit, for instance to do sparse Gaussian process
regression, a technique which can make Gaussian process regression both
faster and more stable on high-dimensional data.

>    People who are looking for a package to interpolate data using kriging are
>    going to expect to:
>    a) specify which type of covariance function they're using from a number
>    of commonly used ones,
>    b) fit this function from the observed data,
>    c) review the fit of this function and have manual control over it,
>    d) have a covariance function that varies depending on azimuth (Or at
>    least a way to test for the degree and direction of anisotropy in the
>    observed data and use this when interpolating),
>    e) use other related methods (such as co-kriging to incorporate multiple
>    variables, or stochastic simulation using the same covariance functions,
>    etc)
>    f) have lots of control over the search window used when interpolating
>    (which is a bit of a different topic)

Thanks a lot for the details, this is useful. I can see that to do
Kriging you are adding a set of assumptions to the Gaussian process
regression. Are you suggesting that it would be worth having separate
Kriging objects as subclasses of the GaussianProcess objects?
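
To make the subclassing question concrete, here is a minimal sketch of
what such a layering could look like. Everything in it is hypothetical:
the class names, the parameter names (sill, corr_range, nugget, mean) and
the single exponential covariance model are assumptions chosen for
illustration, not the API of the scikit-learn pull request or of any
existing package. It only covers "simple kriging" (known constant mean),
which coincides with plain Gaussian process regression on the residuals.

```python
import numpy as np


class GaussianProcess:
    """Hypothetical minimal GP regressor: zero mean, fixed covariance."""

    def __init__(self, kernel, noise=1e-10):
        self.kernel = kernel  # callable: (n, d), (m, d) -> (n, m) covariances
        self.noise = noise    # small value added to the diagonal

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        K = self.kernel(self.X_, self.X_) + self.noise * np.eye(len(y))
        self.alpha_ = np.linalg.solve(K, y)  # weights K^{-1} y
        return self

    def predict(self, X_new):
        K_star = self.kernel(np.asarray(X_new, dtype=float), self.X_)
        return K_star @ self.alpha_  # posterior mean


def exponential_covariance(sill, corr_range):
    """Isotropic exponential model C(h) = sill * exp(-h / corr_range)."""
    def kernel(A, B):
        h = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
        return sill * np.exp(-h / corr_range)
    return kernel


class SimpleKriging(GaussianProcess):
    """Hypothetical subclass exposing geostatistical vocabulary."""

    # Named covariance models a kriging user would pick from (only one here).
    MODELS = {"exponential": exponential_covariance}

    def __init__(self, model="exponential", sill=1.0, corr_range=1.0,
                 nugget=1e-10, mean=0.0):
        super().__init__(self.MODELS[model](sill, corr_range), noise=nugget)
        self.mean = mean  # simple kriging assumes a known constant mean

    def fit(self, X, y):
        return super().fit(X, np.asarray(y, dtype=float) - self.mean)

    def predict(self, X_new):
        return super().predict(X_new) + self.mean


# Toy, made-up observations on the corners of a unit square.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([1.0, 2.0, 3.0, 4.0])
krig = SimpleKriging(sill=1.0, corr_range=2.0, mean=2.5).fit(X, z)
z_hat = krig.predict(X)                           # reproduces the data
z_far = krig.predict(np.array([[100.0, 100.0]]))  # reverts to the mean
```

With a near-zero nugget the predictor interpolates the observations, and
far from the data it falls back to the assumed mean. Variogram fitting,
anisotropy and search windows from the list above are not covered by this
sketch.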

>    I'm not trying to say that it's a bad thing to combine similar code, just
>    be aware that the first thing that someone's going to think when they hear
>    "kriging" is "How do I build and fit a variogram with this module?".

Thank you. I was certainly not aware of this (I am neither a Kriging nor
a Gaussian process expert). I have no clue what a variogram is. It does
seem that any code that wants to cater to 'Kriging' users will need some
Kriging-specific functionality.
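
For readers who share that gap: as commonly defined in geostatistics, the
empirical semivariogram at lag h is half the mean squared difference
between values at pairs of points separated by roughly h, i.e.
gamma(h) = 1 / (2 N(h)) * sum (z_i - z_j)^2 over the N(h) pairs in the
lag bin. The sketch below is an assumption-laden illustration, not code
from any package discussed in this thread: the function name, the
bin_edges argument and the toy data are all made up.

```python
import numpy as np


def empirical_semivariogram(coords, values, bin_edges):
    """Return (gamma, counts) per lag bin; gamma is NaN for empty bins."""
    coords = np.asarray(coords, dtype=float)   # shape (n, d)
    values = np.asarray(values, dtype=float)   # shape (n,)
    i, j = np.triu_indices(len(values), k=1)   # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq_diff = 0.5 * (values[i] - values[j]) ** 2
    # Bin b holds pairs with bin_edges[b] <= dist < bin_edges[b + 1];
    # pairs outside the edges get index -1 or n_bins and are ignored.
    which_bin = np.digitize(dist, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        mask = which_bin == b
        counts[b] = mask.sum()
        if counts[b] > 0:
            gamma[b] = half_sq_diff[mask].mean()
    return gamma, counts


# Toy transect with a made-up linear trend: z = x at x = 0, 1, 2, 3.
coords = np.array([[0.0], [1.0], [2.0], [3.0]])
values = np.array([0.0, 1.0, 2.0, 3.0])
gamma, counts = empirical_semivariogram(coords, values,
                                        bin_edges=[0.5, 1.5, 2.5, 3.5])
# gamma  -> [0.5, 2.0, 4.5]
# counts -> [3, 2, 1]
```

"Building and fitting a variogram" then usually means choosing a
parametric model and adjusting its parameters until it matches such
binned values, which is the step a Kriging-specific layer would need to
expose.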

If people are (still) interested in the effort underway in
scikit-learn[*], it might be great to contribute a Kriging-specific
module that uses the more general-purpose Gaussian process code to
achieve what geostatisticians call Kriging. If there is some
freely-downloadable geostatistics data, it would be great to make an
example (similar to the one in PyMC) that ensures that common tasks in
geostatistics can easily be done.

As a side note, now that I am having a closer look at the PyMC GP
documentation, there seems to be some really nice and fancy code in
there, and it is very well documented.


[*] https://github.com/scikit-learn/scikit-learn/pull/14
