[SciPy-User] scipy.stats: Sampling from an arbitrary probability distribution

Christoph Deil Deil.Christoph@googlemail....
Tue Jun 5 04:05:39 CDT 2012


On Jun 4, 2012, at 3:21 PM, Sturla Molden wrote:

> On 03.06.2012 13:20, Daniel Sabinasz wrote:
>> Hi all,
>> 
>> I need to sample a random number from a distribution whose probability
>> density function I specify myself. Is that possible using scipy.stats?
> 
> Sampling a general distribution is typically an MCMC problem, that e.g. 
> can be solved with the Metropolis-Hastings sampler.
> 
> http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm
> 
> Because of its recursive nature, a Markov chain like this is better 
> written in Cython, or you can use NumPy to run multiple chains in 
> parallel. (I depends on how many samples you need, of course, anything 
> below a million should be fast enough in Python.)
> 
> You might also take a look at PyMCMC:
> https://github.com/rdenham/pymcmc
> 
> 
> Sturla
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user


If you are willing to look outside of scipy, there are nice methods to generate random numbers from arbitrary distributions in ROOT,  a C++ physics data analysis package with python bindings:
http://root.cern.ch/drupal/content/pyroot

import ROOT
# Define the function and limits you want:
# TF1::TF1(const char* name, const char* formula, Double_t xmin = 0, Double_t xmax = 1)
f = ROOT.TF1("my_pdf", "x * x / 10.", -1, 1)
# Generate 100 random numbers from that distribution
[f.GetRandom() for _ in range(100)]

You can sample from an arbitrary 2D distribution as well:
# TF2::TF2(const char* name, const char* formula, Double_t xmin = 0, Double_t xmax = 1, Double_t ymin = 0, Double_t ymax = 1)
f2 = ROOT.TF2("my_pdf2", "x * x / 10. + pow(y, 4)", -1, 1, 3, 4)
x, y = ROOT.Double(), ROOT.Double()
f2.GetRandom2(x, y)

If you only want a histogram of values, not the array, you can avoid the python call overhead:
# TH1D::TH1D(const char* name, const char* title, Int_t nbinsx, Double_t xlow, Double_t xup)
h = ROOT.TH1D("my_hist", "my_hist", 1000, -1, 1)
# void TH1::FillRandom(const char* fname, Int_t ntimes = 5000)
In [49]: %timeit h.FillRandom("my_pdf", int(1e6))                                                                                                                             
10 loops, best of 3: 171 ms per loop
In [48]: %timeit [f.GetRandom() for _ in range(int(1e6))]                                                                                                                     
1 loops, best of 3: 2.62 s per loop

Here you can see the method used (parabolic approximations):
http://root.cern.ch/root/html/src/TF1.cxx.html#gYdi6C
Even if most users don't want to install ROOT, it might be worth comparing the accuracy / speed to the method in scipy.

ROOT also contains the UNURAN package, which implements several methods to sample from arbitrary one- or multi-dimensional distributions.
http://root.cern.ch/root/html/MATH_UNURAN_Index.html
http://statmath.wu.ac.at/unuran/
Unfortunately it's GPL and doesn't have python bindings itself as far as I know.

Christoph


More information about the SciPy-User mailing list